I have a dataset:
>>> all_transcripts
ID Type Name
1 Guest Hugo
1 Guest Hugo
1 Boss Boss
1 Boss Boss
2 Boss Boss
2 Guest Calvin
2 Guest Calvin
3 Guest Klein
3 Boss Boss
Now I want to create a column named nameGuest that holds, on every row, the guest name belonging to that row's ID. My desired output is:
>>> all_transcripts
ID Type Name nameGuest
1 Guest Hugo Hugo
1 Guest Hugo Hugo
1 Boss Boss Hugo
1 Boss Boss Hugo
2 Boss Boss Calvin
2 Guest Calvin Calvin
2 Guest Calvin Calvin
3 Guest Klein Klein
3 Boss Boss Klein
How can I do this?
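For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the sample table as a pandas DataFrame. The column values are copied from the listing above; the alias `df` is an assumption, added only because the answers refer to the frame by that name.

```python
import pandas as pd

# Sample data copied from the question's listing.
all_transcripts = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Type": ["Guest", "Guest", "Boss", "Boss", "Boss",
             "Guest", "Guest", "Guest", "Boss"],
    "Name": ["Hugo", "Hugo", "Boss", "Boss", "Boss",
             "Calvin", "Calvin", "Klein", "Boss"],
})

# Assumed alias: the answers call the frame `df`.
df = all_transcripts
print(df)
```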
Posted on 2019-04-30 08:45:04
Use Series.map with a helper Series (built with boolean indexing, DataFrame.drop_duplicates and DataFrame.set_index) to get the first Guest name for each ID:
s = df[df['Type'] == 'Guest'].drop_duplicates('ID').set_index('ID')['Name']
df['nameGuest'] = df['ID'].map(s)
print (df)
ID Type Name nameGuest
0 1 Guest Hugo Hugo
1 1 Guest Hugo Hugo
2 1 Boss Boss Hugo
3 1 Boss Boss Hugo
4 2 Boss Boss Calvin
5 2 Guest Calvin Calvin
6 2 Guest Calvin Calvin
7 3 Guest Klein Klein
8 3 Boss Boss Klein
Posted on 2019-04-30 08:53:40
Groupby.first
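Filter on Type == 'Guest' first, then group by ID and take the first Name in each group.
This gives a Series of guest names indexed by ID, which we can map back onto the dataframe to create the new column: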
names = df[df['Type'] == 'Guest'].groupby('ID')['Name'].first()
df['nameGuest'] = df['ID'].map(names)
print(df)
ID Type Name nameGuest
0 1 Guest Hugo Hugo
1 1 Guest Hugo Hugo
2 1 Boss Boss Hugo
3 1 Boss Boss Hugo
4 2 Boss Boss Calvin
5 2 Guest Calvin Calvin
6 2 Guest Calvin Calvin
7 3 Guest Klein Klein
8 3 Boss Boss Klein
Output of names:
print(names)
ID
1 Hugo
2 Calvin
3 Klein
Name: Name, dtype: object
https://stackoverflow.com/questions/55917020
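As a rough end-to-end check, the sketch below rebuilds the sample data and runs both lookups from the answers side by side. The extra row with ID 4 is hypothetical (it is not in the question) and is there only to show what `map` does when an ID has no Guest row: the result is NaN rather than an error.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Type": ["Guest", "Guest", "Boss", "Boss", "Boss",
             "Guest", "Guest", "Guest", "Boss"],
    "Name": ["Hugo", "Hugo", "Boss", "Boss", "Boss",
             "Calvin", "Calvin", "Klein", "Boss"],
})

guests = df[df["Type"] == "Guest"]

# Lookup 1: keep the first Guest row per ID, then index the names by ID.
s = guests.drop_duplicates("ID").set_index("ID")["Name"]

# Lookup 2: group the Guest rows by ID and take the first name per group.
names = guests.groupby("ID")["Name"].first()

# Both lookups give the same column on this data.
via_dedup = df["ID"].map(s)
via_groupby = df["ID"].map(names)
same = (via_dedup == via_groupby).all()

df["nameGuest"] = via_groupby
print(df)

# Hypothetical extra row: ID 4 has a Boss but no Guest.
extra = pd.DataFrame({"ID": [4], "Type": ["Boss"], "Name": ["Boss"]})
with_extra = pd.concat([df[["ID", "Type", "Name"]], extra], ignore_index=True)
with_extra["nameGuest"] = with_extra["ID"].map(names)
print(with_extra.tail(1))  # nameGuest is NaN for ID 4
```

One difference worth knowing: `groupby(...).first()` returns the first non-null value in each group, while `drop_duplicates` keeps the first row as it is, so the two can diverge if a Guest row has a missing Name.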