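This is a sample extracted from my database. I am working on a visualization of collaborations between authors, so based on this sample I need to keep only one relationship for each pair of authors. For example, I have to drop either Brian Norton - Maria Roo Ons or Maria Roo Ons - Brian Norton so that each relationship is unique.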
-------------------------------------------------------------------------------------------------
| article_title | author_name | coauthor_name |
-------------------------------------------------------------------------------------------------
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | Brian Norton
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann | Brian Norton
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu | Brian Norton
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Sarah McCormack | Brian Norton
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Sarah McCormack | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Sarah McCormack | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Sarah McCormack | S. Shynu
-------------------------------------------------------------------------------------------------
The desired final output looks like this:
-------------------------------------------------------------------------------------------------
| article_title | author_name | coauthor_name |
-------------------------------------------------------------------------------------------------
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu | Sarah McCormack
In this case I want to keep only one row for each pair. How can I do this in R or Python? Many thanks for your help.
Answered on 2017-11-24 02:30:09
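I assume you have a separate database and are connecting to it from Python.
Possible approaches:
1) You can add a row number based on the article column and then de-duplicate. See this for how to do it in SQL.
You can then run the query through a Python DB connector.
2) You can pull the records into a pandas DataFrame and do the analysis there. pandas is good at processing and manipulating data.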
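Approach 1 could be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database, the table name coauthors is hypothetical, and the row number is partitioned by the article plus the unordered author pair (not by the article alone) so that the two directions of a pair land in the same group. Window functions need SQLite 3.25 or newer; other databases would need LEAST/GREATEST or similar in place of the two-argument MIN/MAX.

```python
import sqlite3
from itertools import permutations

title = "A Metal Plate Solar Antenna for UMTS Pico-cell Base Station"
authors = ["Brian Norton", "Maria Roo Ons", "Max Ammann",
           "S. Shynu", "Sarah McCormack"]

# Rebuild the 20-row sample from the question: every ordered pair of authors.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE coauthors "  # hypothetical table name
    "(article_title TEXT, author_name TEXT, coauthor_name TEXT)"
)
conn.executemany(
    "INSERT INTO coauthors VALUES (?, ?, ?)",
    [(title, a, b) for a, b in permutations(authors, 2)],
)

# Number the rows inside each (article, unordered pair) group and keep row 1.
# In SQLite, MIN(x, y) and MAX(x, y) with two arguments are scalar functions.
query = """
SELECT article_title, author_name, coauthor_name
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY article_title,
                            MIN(author_name, coauthor_name),
                            MAX(author_name, coauthor_name)
               ORDER BY author_name
           ) AS rn
    FROM coauthors
)
WHERE rn = 1
ORDER BY author_name, coauthor_name
"""
unique_pairs = conn.execute(query).fetchall()

for row in unique_pairs:
    print(" | ".join(row))
```

Because each group is ordered by author_name, the surviving row is the one whose author_name sorts first, which gives the same 10 rows as the desired output in the question.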
Answered on 2017-11-24 06:50:18
I assume your data frame looks like the one below, since you did not share what other cases might occur.
article author1 author2
A a b
A b a
A a a
A b b
In R, this is how I would get the rows you are looking for. I assume your data frame is df1.
# This will create a new dataframe df2 with only those rows where author1 and author2 are different
df2 <- df1[df1$author1 != df1$author2, ]
The output is similar to the one you gave in the question.
article author1 author2
A a b
A b a
Let me know if this is what you need.
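Note that the filter above removes only the self-pairs; the reversed pair (a b and b a) is still present in its output. A minimal Python sketch with pandas, assuming the same toy data frame, that applies the same filter and then also collapses reversed pairs could look like this. The helper column name pair is introduced here purely for illustration.

```python
import pandas as pd

# Same toy data as the data frame shown above.
df1 = pd.DataFrame({
    "article": ["A", "A", "A", "A"],
    "author1": ["a", "b", "a", "b"],
    "author2": ["b", "a", "a", "b"],
})

# Equivalent of the R filter: drop rows where both columns name the same author.
df2 = df1[df1["author1"] != df1["author2"]]

# Build an order-independent key so that (a, b) and (b, a) compare equal.
keyed = df2.assign(
    pair=[tuple(sorted(p)) for p in zip(df2["author1"], df2["author2"])]
)

# Keep the first row seen for each (article, unordered pair), then drop the helper.
unique_pairs = (
    keyed.drop_duplicates(subset=["article", "pair"])
         .drop(columns="pair")
)

print(unique_pairs.to_string(index=False))
```

Sorting the two names inside each row is what makes the comparison direction-independent; drop_duplicates then keeps whichever direction appears first in the data.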
https://stackoverflow.com/questions/47461443