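I have a DataFrame with many rows and some low-frequency values. For each column, I need to count how often each value occurs, then replace any value whose frequency is less than 3.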
DF - input
Col1 Col2 Col3 Col4
1 apple tomato apple
1 apple potato nan
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 grape tomato banana
1 pear tomato banana
1 lemon tomato burger

DF - output
Col1 Col2 Col3 Col4
1 apple tomato Other
1 apple Other nan
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 Other tomato banana
1 Other tomato banana
1 Other tomato Other

Posted on 2018-01-30 22:10:41
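To reproduce the example, here is a minimal sketch that builds the question's input frame and prints the per-column counts the replacement rule is based on. It keeps Col1 as an ordinary column, as in the question's table; the answer's printed output shows Col1 as the index, so the answerer apparently called set_index('Col1') first (an assumption). Series.value_counts skips NaN by default, so the missing cell in Col4 is not counted.

```python
import numpy as np
import pandas as pd

# Input frame from the question (Col1 kept as an ordinary column)
df = pd.DataFrame({
    "Col1": [1] * 9,
    "Col2": ["apple"] * 6 + ["grape", "pear", "lemon"],
    "Col3": ["tomato", "potato"] + ["tomato"] * 7,
    "Col4": ["apple", np.nan] + ["banana"] * 6 + ["burger"],
})

# Frequency of each value within its own column; NaN is not counted by default
for col in ["Col2", "Col3", "Col4"]:
    print(col, df[col].value_counts().to_dict())
```

Every value with a count below 3 in these dictionaries (grape, pear, lemon, potato, apple in Col4, burger) is what the expected output replaces with Other.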
Combine where with per-column value counts (computed here with groupby(...).transform('count')):
df.where(df.apply(lambda x: x.groupby(x).transform('count') > 2), 'Other')

Output:
Col2 Col3 Col4
Col1
1 apple tomato Other
1 apple Other banana
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 Other tomato banana
1 Other tomato banana
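1 Other tomato Other

Update: handling the NaN in the original data file. groupby drops NaN keys, so their counts come back as NaN, the comparison is False, and the version above would replace NaN with 'Other'. The fix below sets the mask to NaN wherever the count is missing, and astype(bool) turns that NaN into True, so missing cells are kept as they are: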
d = df.apply(lambda x: x.groupby(x).transform('count'))
df.where(d.gt(2.0).where(d.notnull()).astype(bool), 'Other')

Output:
Col2 Col3 Col4
Col1
1 apple tomato Other
1 apple Other NaN
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 Other tomato banana
1 Other tomato banana
1 Other tomato Other

https://stackoverflow.com/questions/48531236
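A variant that actually uses value_counts, as the answer's heading suggests, is sketched below. It swaps groupby/transform for Series.map over Series.value_counts and uses DataFrame.mask instead of where; replace_rare, threshold and other are hypothetical names chosen for illustration. Because value_counts drops NaN, NaN cells map to a NaN count, and NaN < 3 is False, so they are left alone without the extra where/astype(bool) step.

```python
import numpy as np
import pandas as pd

# Input frame from the question (Col1 kept as an ordinary column)
df = pd.DataFrame({
    "Col1": [1] * 9,
    "Col2": ["apple"] * 6 + ["grape", "pear", "lemon"],
    "Col3": ["tomato", "potato"] + ["tomato"] * 7,
    "Col4": ["apple", np.nan] + ["banana"] * 6 + ["burger"],
})


def replace_rare(frame: pd.DataFrame, threshold: int = 3, other: str = "Other") -> pd.DataFrame:
    """Hypothetical helper: replace values seen fewer than `threshold` times in their column."""
    # Map each cell to the frequency of its value within the column.
    # value_counts() drops NaN by default, so NaN cells map to NaN.
    counts = frame.apply(lambda s: s.map(s.value_counts()))
    # NaN < threshold is False, so missing cells are not replaced.
    return frame.mask(counts.lt(threshold), other)


result = replace_rare(df)
print(result)
```

Col1 holds the single value 1 nine times, so it is never replaced; the other three columns should match the question's expected output.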