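I have a pandas DataFrame with two columns. report_tags holds comma-separated words, and t_f is a yes/no flag (1 or 0). I want to split the comma-separated words into individual tags and group them by t_f, then put the number of occurrences of each tag/t_f pair into a new column named count.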
df
report_tags t_f
0 bec,eac,fbi,ic3,scam 1
1 dlink,router,wifi 0
2 adobe 0
3 bec, fbi 1
4 bec, fbi, scam 0
Expected output:
df2
tag t_f count
0 bec 1 2
1 eac 1 1
2 fbi 1 2
3 ic3 1 1
4 scam 1 1
5 dlink 0 1
6 router 0 1
7 wifi 0 1
8 adobe 0 1
9 bec 0 1
10 fbi 0 1
11 scam 0 1
Posted on 2019-09-20 18:59:06
Use str.split + explode:
k = dict(sort=False)
(df.set_index('t_f')['report_tags']
.str.split(r',\s*').explode()
.groupby(level=0, **k).value_counts(**k)
.rename('count').reset_index())
t_f report_tags count
0 1 bec 2
1 1 eac 1
2 1 fbi 2
3 1 ic3 1
4 1 scam 1
5 0 adobe 1
6 0 bec 1
7 0 dlink 1
8 0 fbi 1
9 0 router 1
10 0 scam 1
11 0 wifi 1
https://stackoverflow.com/questions/58033792
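For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same split-then-explode idea in separate steps. It is a variation, not the answer's exact code: it splits on a plain comma and strips whitespace afterwards instead of using the `,\s*` regex, and it counts with `groupby(...).size()` so the columns come out as tag, t_f, count like the requested df2. The intermediate name `long_df` is made up for illustration, and `DataFrame.explode` assumes pandas 0.25 or newer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "report_tags": [
        "bec,eac,fbi,ic3,scam",
        "dlink,router,wifi",
        "adobe",
        "bec, fbi",
        "bec, fbi, scam",
    ],
    "t_f": [1, 0, 0, 1, 0],
})

# Split each string into a list of tags, then give every tag its own row.
# reset_index(drop=True) removes the duplicated index labels explode leaves behind.
long_df = (
    df.assign(tag=df["report_tags"].str.split(","))
      .explode("tag")
      .reset_index(drop=True)
)

# Some rows use ", " as the separator, so trim the leftover spaces.
long_df["tag"] = long_df["tag"].str.strip()

# Count rows per (tag, t_f) pair; sort=False avoids alphabetical re-sorting.
df2 = (
    long_df.groupby(["tag", "t_f"], sort=False)
           .size()
           .reset_index(name="count")
)

print(df2)
```

With sort=False the groups should come out in order of first appearance rather than sorted. Drop that argument if a sorted result is preferred.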