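I have a dataframe whose columns contain runs of repeated values. In those columns I want to keep only the first occurrence and leave the repeats blank.
I tried df = df.groupby(['author', 'key']), but I don't know how to get all the rows back from it. Calling .first() on the result returns only the first row of each group.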
import pandas as pd
lst = [
['juli', 'JIRA-1', 'assignee'],
['juli', 'JIRA-1', 'assignee'],
['nick', 'JIRA-1', 'timespent'],
['nick', 'JIRA-3', 'status'],
['nick', 'JIRA-3', 'assignee'],
['tom', 'JIRA-1', 'comment'],
['tom', 'JIRA-1', 'assignee'],
['tom', 'JIRA-2', 'status']]
df = pd.DataFrame(lst, columns =['author', 'key', 'field'])
#df = df.sort_values(by=['author', 'key'])
>>> df
  author     key      field
0   juli  JIRA-1   assignee
1   juli  JIRA-1   assignee
2   nick  JIRA-1  timespent
3   nick  JIRA-3     status
4   nick  JIRA-3   assignee
5    tom  JIRA-1    comment
6    tom  JIRA-1   assignee
7    tom  JIRA-2     status
What I get:
>>> df.groupby(['author', 'key']).first()
                   field
author key
juli   JIRA-1   assignee
nick   JIRA-1  timespent
       JIRA-3     status
tom    JIRA-1    comment
       JIRA-2     status
What I want:
juli  JIRA-1  assignee
              assignee
nick  JIRA-1  timespent
      JIRA-3  status
              assignee
tom   JIRA-1  comment
              assignee
      JIRA-2  status
Posted on 2019-07-24 15:40:32
It looks like you need df.duplicated() to find the duplicates and df.loc[] to assign blanks to them:
df.loc[df.duplicated(['author','key']),['author','key']]=''
print(df)
  author     key      field
0   juli  JIRA-1   assignee
1                  assignee
2   nick  JIRA-1  timespent
3   nick  JIRA-3     status
4                  assignee
5    tom  JIRA-1    comment
6                  assignee
7    tom  JIRA-2     status
https://stackoverflow.com/questions/57186587
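Note that the output above blanks author and key only on rows that repeat a full (author, key) pair, whereas the desired output in the question also blanks a repeated author on rows with a new key (rows 3 and 7). A minimal sketch of one way to get that, assuming the rows are already ordered by author and key as in the example: compute one duplicate mask per column and blank each column separately. The names dup_author and dup_key are illustrative, not part of the original answer.

```python
import pandas as pd

lst = [
    ['juli', 'JIRA-1', 'assignee'],
    ['juli', 'JIRA-1', 'assignee'],
    ['nick', 'JIRA-1', 'timespent'],
    ['nick', 'JIRA-3', 'status'],
    ['nick', 'JIRA-3', 'assignee'],
    ['tom', 'JIRA-1', 'comment'],
    ['tom', 'JIRA-1', 'assignee'],
    ['tom', 'JIRA-2', 'status'],
]
df = pd.DataFrame(lst, columns=['author', 'key', 'field'])

# Compute both masks before modifying anything, so that blanking one
# column cannot affect the duplicate check for the other.
dup_author = df.duplicated(['author'])        # author already seen above
dup_key = df.duplicated(['author', 'key'])    # (author, key) pair already seen above

# Blank each column independently; 'field' is left untouched.
df.loc[dup_key, 'key'] = ''
df.loc[dup_author, 'author'] = ''

print(df)
```

Because duplicated() marks every occurrence after the first regardless of position, this relies on equal values being adjacent; for unsorted data, sorting by ['author', 'key'] first would be needed.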