
Q: Pandas: create a new dataframe that counts how often keywords/phrases from a list occur in a column

Stack Overflow user

Asked 2018-07-21 07:23:51

Answers: 2 | Views: 33 | Followers: 0 | Votes: 1

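I have the following list of words:

list = ['clogged drain', 'right wing', 'horse', 'bird', 'collision light']

I have the following data frame (note that the spacing may be irregular):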

ID  TEXT               
1   you have   clogged   drain     
2   the dog   has a right wing   clogged drain     
3   the  bird flew  into collision light       
4   the horse is here to horse   around   
5   bird    bird bird

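I want to create a table that shows each keyword together with a count of how often it appears in the TEXT field. However, if a keyword appears more than once in the same row of the TEXT column, it should be counted only once.

Desired output: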

keywords         count
clogged drain    2
right wing       1
horse            1
bird             2
collision light  1
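
For reproducibility, the input above can be rebuilt roughly as follows. This is only a sketch: the name `keywords` (used instead of `list`), the exact runs of spaces, and the plain-Python reference count `expected` are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Keyword list from the question; named `keywords` here (an assumption)
# so the built-in `list` is not shadowed.
keywords = ["clogged drain", "right wing", "horse", "bird", "collision light"]

# Sample frame with irregular spacing, as shown in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "TEXT": [
        "you have   clogged   drain     ",
        "the dog   has a right wing   clogged drain     ",
        "the  bird flew  into collision light       ",
        "the horse is here to horse   around   ",
        "bird    bird bird",
    ],
})

# Plain-Python reference: collapse whitespace, then count the rows in which
# each keyword occurs at least once (repeats within a row count once).
rows = [" ".join(text.split()) for text in df["TEXT"]]
expected = {kw: sum(kw in row for row in rows) for kw in keywords}
print(expected)
```

The dictionary should match the desired output table above, which gives a simple baseline for checking any pandas-based answer.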

I have searched all over Stack Overflow but could not find my specific case.

Tags: count, frequency, list, pandas

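2 Answers

Stack Overflow user

Answered 2018-07-21 07:41:24

First, I would reformat the TEXT column using str.split() and str.join() to get rid of the irregular whitespace. Then use str.contains for each of your keywords and take the sum of the resulting booleans (True wherever a row contains the keyword):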

import pandas as pd

# Reformat text, splitting wherever you have one or more whitespace characters
df['formatted_text'] = df.TEXT.str.split(r'\s+').str.join(' ')

# Create your output dataframe
df2 = pd.DataFrame(my_list, columns=['keywords'])

# Count occurrences: the number of rows in which each keyword appears at least once
df2['count'] = df2['keywords'].apply(lambda x: df.formatted_text.str.contains(x).sum())

The result is:

>>> df2
          keywords  count
0    clogged drain      2
1       right wing      1
2            horse      1
3             bird      2
4  collision light      1

Note that I renamed the variable list to my_list so that it does not shadow the built-in Python type.

Votes: 0

Stack Overflow user

Answered 2018-07-21 08:47:00

You can use extractall:
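
As a self-contained sketch of the approach above, the following rebuilds the sample data and runs the same two steps. Two details are assumptions that differ from the snippet: `str.split()` is called without a pattern (it then splits on any run of whitespace and drops empty pieces), and `regex=False` is passed to `str.contains` so each keyword is treated as a literal substring rather than a regular expression.

```python
import pandas as pd

my_list = ["clogged drain", "right wing", "horse", "bird", "collision light"]

# Sample data from the question, including the irregular spacing.
df = pd.DataFrame({
    "TEXT": [
        "you have   clogged   drain     ",
        "the dog   has a right wing   clogged drain     ",
        "the  bird flew  into collision light       ",
        "the horse is here to horse   around   ",
        "bird    bird bird",
    ],
})

# Collapse every run of whitespace into a single space.
formatted = df["TEXT"].str.split().str.join(" ")

df2 = pd.DataFrame({"keywords": my_list})

# str.contains yields one boolean per row, so summing counts each row at most once.
df2["count"] = [
    int(formatted.str.contains(kw, regex=False).sum()) for kw in my_list
]
print(df2)
```

Because this is substring matching, a keyword such as `bird` would also match inside a longer word like `birds`; if that matters, a word-boundary regular expression would be needed instead.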

df.TEXT.str.extractall(r'({})'.format('|'.join(list)))[0].str.get_dummies().groupby(level=0).sum().gt(0).astype(int).sum()
Out[225]: 
bird               2
clogged drain      2
collision light    1
horse              1
right wing         1
dtype: int64
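
The one-liner above can be unpacked into steps, as in the sketch below. Several details here are assumptions rather than part of the answer: the whitespace is collapsed first (with the spacing shown in the question, a literal pattern `clogged drain` would not match `clogged   drain`), the keywords are passed through `re.escape`, and the result is reindexed to the original keyword order.

```python
import re
import pandas as pd

keywords = ["clogged drain", "right wing", "horse", "bird", "collision light"]

df = pd.DataFrame({
    "TEXT": [
        "you have   clogged   drain     ",
        "the dog   has a right wing   clogged drain     ",
        "the  bird flew  into collision light       ",
        "the horse is here to horse   around   ",
        "bird    bird bird",
    ],
})

# Collapse whitespace so multi-word keywords can match literally.
text = df["TEXT"].str.split().str.join(" ")

# One capture group containing all keywords as alternatives.
pattern = "({})".format("|".join(re.escape(kw) for kw in keywords))

# One entry per match; the first index level is the original row label.
matches = text.str.extractall(pattern)[0]

# Indicator columns per keyword, summed back to one line per original row.
per_row = matches.str.get_dummies().groupby(level=0).sum()

# A row counts once per keyword, however many times the keyword matched in it.
counts = per_row.gt(0).sum().reindex(keywords, fill_value=0)
print(counts)
```

The `reindex` call also makes keywords that never match show up with a count of 0 instead of being dropped from the result.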

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/51451493
