I have a DataFrame that looks like this:
df = pd.DataFrame({'ID':[1,1,2,2,3,4],'Name':['John Doe','Jane Doe','John Smith','Jane Smith','Jack Hill','Jill Hill']})
ID Name
0 1 John Doe
1 1 Jane Doe
2 2 John Smith
3 2 Jane Smith
4 3 Jack Hill
5 4 Jill Hill
I then added another column by grouping on ID and taking the unique values of Name:
df['Multi Name'] = df.groupby('ID')['Name'].transform('unique')
ID Name Multi Name
0 1 John Doe [John Doe, Jane Doe]
1 1 Jane Doe [John Doe, Jane Doe]
2 2 John Smith [John Smith, Jane Smith]
3 2 Jane Smith [John Smith, Jane Smith]
4 3 Jack Hill [Jack Hill]
5 4 Jill Hill [Jill Hill]
How can I remove the brackets from Multi Name?
I tried:
df['Multi Name'] = df['Multi Name'].str.strip('[]')
ID Name Multi Name
0 1 John Doe NaN
1 1 Jane Doe NaN
2 2 John Smith NaN
3 2 Jane Smith NaN
4 3 Jack Hill NaN
5 4 Jill Hill NaN
Expected output:
ID Name Multi Name
0 1 John Doe John Doe, Jane Doe
1 1 Jane Doe John Doe, Jane Doe
2 2 John Smith John Smith, Jane Smith
3 2 Jane Smith John Smith, Jane Smith
4 3 Jack Hill Jack Hill
5 4 Jill Hill Jill Hill
Posted on 2018-05-16 20:14:51
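A note on why the attempt in the question returns NaN: each cell of Multi Name holds an array of names, not a string, and the brackets are only part of how that array is printed. The .str methods return NaN for elements that are not strings, so there is nothing for strip to remove. The sketch below illustrates this on the question's data. The names multi, first_cell, stripped and joined are illustrative only, and the arrays are built with map rather than transform('unique') on the assumption that newer pandas releases may reject 'unique' as a transform name.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3, 4],
    'Name': ['John Doe', 'Jane Doe', 'John Smith',
             'Jane Smith', 'Jack Hill', 'Jill Hill'],
})

# One array of unique names per row, built with map (assumption: used here
# instead of transform('unique') so the sketch also runs on newer pandas).
multi = df['ID'].map(df.groupby('ID')['Name'].unique())

# The cells are array-like objects, not strings; the brackets are only
# part of how they are printed.
first_cell = multi.iloc[0]
print(type(first_cell))

# .str.strip finds no string in any cell, so every row comes back as NaN.
stripped = multi.str.strip('[]')
print(stripped.isna().all())

# Joining the elements of each cell gives the bracket-free string.
joined = multi.map(', '.join)
print(joined.iloc[0])
```

Joining the elements, rather than stripping characters, is what the approaches in this thread have in common.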
transform
df.join(df.groupby('ID').Name.transform('unique').rename('Multi Name'))
ID Name Multi Name
0 1 John Doe [John Doe, Jane Doe]
1 1 Jane Doe [John Doe, Jane Doe]
2 2 John Smith [John Smith, Jane Smith]
3 2 Jane Smith [John Smith, Jane Smith]
4 3 Jack Hill [Jack Hill]
5 4 Jill Hill [Jill Hill]
df.join(df.groupby('ID').Name.transform('unique').str.join(', ').rename('Multi Name'))
ID Name Multi Name
0 1 John Doe John Doe, Jane Doe
1 1 Jane Doe John Doe, Jane Doe
2 2 John Smith John Smith, Jane Smith
3 2 Jane Smith John Smith, Jane Smith
4 3 Jack Hill Jack Hill
5 4 Jill Hill Jill Hill
map
df.join(df.ID.map(df.groupby('ID').Name.unique().str.join(', ')).rename('Multi Name'))
ID Name Multi Name
0 1 John Doe John Doe, Jane Doe
1 1 Jane Doe John Doe, Jane Doe
2 2 John Smith John Smith, Jane Smith
3 2 Jane Smith John Smith, Jane Smith
4 3 Jack Hill Jack Hill
5 4 Jill Hill Jill Hill
itertools.groupby
from itertools import groupby
d = {
    k: ', '.join(x[1] for x in v)
    for k, v in groupby(sorted(set(zip(df.ID, df.Name))), key=lambda x: x[0])
}
df.join(df.ID.map(d).rename('Multi Name'))
ID Name Multi Name
0 1 John Doe Jane Doe, John Doe
1 1 Jane Doe Jane Doe, John Doe
2 2 John Smith Jane Smith, John Smith
3 2 Jane Smith Jane Smith, John Smith
4 3 Jack Hill Jack Hill
5 4 Jill Hill Jill Hill
Posted on 2018-05-16 20:13:58
unique seems to be the wrong choice of function here. I recommend a custom lambda function that uses str.join:
df['Multi Name'] = df.groupby('ID')['Name'].transform(lambda x: ', '.join(set(x)))
df
ID Name Multi Name
0 1 John Doe John Doe, Jane Doe
1 1 Jane Doe John Doe, Jane Doe
2 2 John Smith Jane Smith, John Smith
3 2 Jane Smith Jane Smith, John Smith
4 3 Jack Hill Jack Hill
5 4 Jill Hill Jill Hill
Posted on 2018-05-16 20:17:09
Use map and join:
df['Multi Name'] = df.groupby('ID')['Name'].transform('unique').map(', '.join)
Output:
ID Name Multi Name
0 1 John Doe John Doe, Jane Doe
1 1 Jane Doe John Doe, Jane Doe
2 2 John Smith John Smith, Jane Smith
3 2 Jane Smith John Smith, Jane Smith
4 3 Jack Hill Jack Hill
5 4 Jill Hill Jill Hill
https://stackoverflow.com/questions/50379203
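One caveat about the set-based lambda in the answer above: a set does not keep insertion order, so the joined names can come out in a different order from one group (or one run) to the next, as the mixed ordering in that answer's output suggests. If first-appearance order matters, a possible variant is to deduplicate with dict.fromkeys, which keeps the first occurrence of each key in order. This is a minimal sketch on the question's data; the helper name join_unique is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3, 4],
    'Name': ['John Doe', 'Jane Doe', 'John Smith',
             'Jane Smith', 'Jack Hill', 'Jill Hill'],
})


def join_unique(names):
    """Join the distinct names of one group, keeping first-appearance order."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    return ', '.join(dict.fromkeys(names))


# transform broadcasts the single string returned per group to every row
# of that group, so the result lines up with the original index.
df['Multi Name'] = df.groupby('ID')['Name'].transform(join_unique)
print(df)
```

With this variant the result should match the expected output in the question, since each group's names are listed in the order they first appear.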