I have a DataFrame df:
Name Reagent
0 Experiment1 water
1 Experiment1 oil
2 Experiment1 water
3 Experiment1 milk
4 Experiment1 water
5 Experiment1 tea
6 Experiment1 water
7 Experiment1 coffee
8 Experiment2 water
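9 Experiment2 coffee

I want to rename reagent names that are repeated within the same experiment so that each one becomes distinct. In this example, only water is repeated within a given experiment.

For example, the desired output: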
Name Reagent
0 Experiment1 water1
1 Experiment1 oil
2 Experiment1 water2
3 Experiment1 milk
4 Experiment1 water3
5 Experiment1 tea
6 Experiment1 water4
7 Experiment1 coffee
8 Experiment2 water
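9 Experiment2 coffee

Thanks for your help.

Posted on 2019-04-03 20:11:19

Solution: use GroupBy.cumcount as a counter and append it to every value (replacing the 0 values with an empty string, so that the first occurrence in each group is left unchanged):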
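# cumcount numbers the rows of each (Name, Reagent) group from 0; '0' becomes '' so the first row keeps its name
df['Reagent'] += df.groupby(['Name','Reagent']).cumcount().astype(str).replace('0','')
print(df)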
Name Reagent
0 Experiment1 water
1 Experiment1 oil
2 Experiment1 water1
3 Experiment1 milk
4 Experiment1 water2
5 Experiment1 tea
6 Experiment1 water3
7 Experiment1 coffee
8 Experiment2 water
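9 Experiment2 coffee

If you need to number every duplicate, including the first occurrence, filter the rows with DataFrame.duplicated on both columns (with keep=False) and then add 1 to the counter: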
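# keep=False marks every row whose (Name, Reagent) pair occurs more than once
mask = df.duplicated(['Name','Reagent'], keep=False)
# add(1) makes the counter start at 1 instead of 0
df.loc[mask, 'Reagent'] += df[mask].groupby(['Name','Reagent']).cumcount().add(1).astype(str)
print(df)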
Name Reagent
0 Experiment1 water1
1 Experiment1 oil
2 Experiment1 water2
3 Experiment1 milk
4 Experiment1 water3
5 Experiment1 tea
6 Experiment1 water4
7 Experiment1 coffee
8 Experiment2 water
9 Experiment2 coffee

https://stackoverflow.com/questions/55494824
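
To try both variants side by side, here is a minimal self-contained sketch. It rebuilds the frame from the question and wraps each one-liner in a small function that works on a copy, so the original df is not modified. The function names suffix_repeats and suffix_all_duplicates are hypothetical helpers introduced only for this illustration; they are not part of pandas or of the original answer.

```python
import pandas as pd

# Rebuild the frame shown in the question.
df = pd.DataFrame({
    "Name": ["Experiment1"] * 8 + ["Experiment2"] * 2,
    "Reagent": ["water", "oil", "water", "milk", "water", "tea",
                "water", "coffee", "water", "coffee"],
})


def suffix_repeats(frame):
    """Hypothetical helper: keep the first occurrence, append 1, 2, ... to later repeats."""
    out = frame.copy()
    # Position of each row inside its (Name, Reagent) group, starting at 0.
    counter = out.groupby(["Name", "Reagent"]).cumcount()
    # Turn the counter into text and blank out the exact value '0'.
    out["Reagent"] += counter.astype(str).replace("0", "")
    return out


def suffix_all_duplicates(frame):
    """Hypothetical helper: append 1, 2, ... to every row of a repeated (Name, Reagent) pair."""
    out = frame.copy()
    # keep=False flags all members of a duplicated pair, not only the later ones.
    mask = out.duplicated(["Name", "Reagent"], keep=False)
    counter = out[mask].groupby(["Name", "Reagent"]).cumcount().add(1)
    # Only the flagged rows receive a suffix; unique rows are untouched.
    out.loc[mask, "Reagent"] += counter.astype(str)
    return out


first = suffix_repeats(df)
second = suffix_all_duplicates(df)
print(first)
print(second)
```

Series.replace('0', '') here matches whole values only, so a counter such as '10' is left intact. The second variant is the one that reproduces the desired output from the question (water1 through water4 in Experiment1, plain water in Experiment2).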