
Q: Pandas: find DataFrame string entries that contain a specific word more than once

Stack Overflow user

Asked 2021-09-17 09:45:15

3 answers · 307 views · 0 followers · 0 votes

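Situation:

I have a pandas DataFrame and want to find all entries whose string contains a specific word more than once, and then create a separate DataFrame from those results.

What have I done?

So far I have managed to collect all entries that contain the specified word at least once.

Code: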

    import pandas as pd

    selection_df = pd.DataFrame({'Year': ['2020', '2021', '2021'],
                                 'Title': ['Energy calculation', 'Energy calculation with energy', 'Other calculation']})
    terms = ['energy']
    # Keep rows whose Title contains any of the terms at least once (case-insensitive)
    list_df = selection_df[selection_df['Title'].str.contains('|'.join(terms), na=False, case=False)]
    print(list_df)

Output:

       Year                           Title
    0  2020              Energy calculation
    1  2021  Energy calculation with energy

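The question:

I only want to collect the second entry:

    1  2021  Energy calculation with energy

because it contains the word "energy" more than once. How can I do this?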

python

pandas

dataframe
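To see why the str.contains filter cannot separate the two rows, compare it with Series.str.count on the question's sample data: contains only reports whether the term occurs at all, while count reports how many times it occurs. A minimal sketch (the variable names present and occurrences are illustrative):

```python
import re

import pandas as pd

# Sample data from the question
selection_df = pd.DataFrame({'Year': ['2020', '2021', '2021'],
                             'Title': ['Energy calculation', 'Energy calculation with energy', 'Other calculation']})
titles = selection_df['Title']

# str.contains only answers "does the term occur at all?"
present = titles.str.contains('energy', case=False)
# str.count returns the number of non-overlapping regex matches per row
occurrences = titles.str.count('energy', flags=re.I)

print(present.tolist())      # [True, True, False]
print(occurrences.tolist())  # [1, 2, 0]

# Rows where the term occurs more than once
print(selection_df[occurrences > 1])
```

Note that both calls match substrings, so a title such as "bioenergy" would also count as an occurrence of "energy".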

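3 Answers

Stack Overflow user

Accepted answer

Answered 2021-09-17 09:52:08

You need to test each value of the list separately with Series.str.count to get a list of masks, and then combine them with np.logical_or.reduce: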

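    import re

    import numpy as np

    # selection_df is the DataFrame from the question
    terms = ['energy']
    # One boolean mask per term: True where that term occurs more than once (case-insensitive)
    masks = [selection_df['Title'].str.count(re.escape(x), flags=re.I).gt(1) for x in terms]
    # OR the masks together element-wise and keep the matching rows
    list_df = selection_df[np.logical_or.reduce(masks)]
    print(list_df)

       Year                           Title
    1  2021  Energy calculation with energy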
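With a single term the list holds only one mask, so the combining step is easier to see with two. A self-contained sketch on the question's data, where the second term "calculation" is an illustrative assumption: each mask is True where its term occurs more than once, and np.logical_or.reduce ORs the masks element-wise into one NumPy boolean array.

```python
import re

import numpy as np
import pandas as pd

selection_df = pd.DataFrame({'Year': ['2020', '2021', '2021'],
                             'Title': ['Energy calculation', 'Energy calculation with energy', 'Other calculation']})
terms = ['energy', 'calculation']  # 'calculation' added only for illustration

# One boolean Series per term: True where the term occurs more than once
masks = [selection_df['Title'].str.count(re.escape(t), flags=re.I).gt(1) for t in terms]

# Element-wise OR across all masks -> one bool per row
combined = np.logical_or.reduce(masks)
print(combined.tolist())  # [False, True, False]

list_df = selection_df[combined]
print(list_df)
```

Row 1 is kept because of "energy"; "calculation" never occurs twice, so its mask contributes nothing.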

Alternative solution:

    terms = ['energy']
    masks = [selection_df['Title'].str.count(re.escape(x), flags=re.I).gt(1) for x in terms]
    # Place the masks side by side as columns and keep rows where any of them is True
    list_df = selection_df[pd.concat(masks, axis=1).any(axis=1)]
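The alternative stays entirely in pandas: pd.concat(masks, axis=1) builds one boolean column per term, and .any(axis=1) marks a row if any of those columns is True. A self-contained sketch, again with "calculation" as an illustrative second term; passing keys=terms to label the columns is an addition for readability, not part of the answer.

```python
import re

import pandas as pd

selection_df = pd.DataFrame({'Year': ['2020', '2021', '2021'],
                             'Title': ['Energy calculation', 'Energy calculation with energy', 'Other calculation']})
terms = ['energy', 'calculation']  # 'calculation' added only for illustration

masks = [selection_df['Title'].str.count(re.escape(t), flags=re.I).gt(1) for t in terms]

# One column per term; keys= names each column after its term
mask_df = pd.concat(masks, axis=1, keys=terms)
print(mask_df)

# A row qualifies if at least one term occurs more than once in it
row_mask = mask_df.any(axis=1)
list_df = selection_df[row_mask]
print(list_df)
```

Printing mask_df shows which term triggered each match, which the NumPy version does not expose.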

Votes: 2

Stack Overflow user

Answered 2021-09-17 11:20:51

You can use a regex with a capturing group and a backreference:

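    import re

    # Group 1 captures one of the terms; \1 then requires that same term to appear again later
    reg = r'.*(%s).*\1' % '|'.join(map(re.escape, terms))
    # e.g. with terms = ['energy', 'other', 'terms'] this builds reg = '.*(energy|other|terms).*\\1'

    selection_df[selection_df['Title'].str.match(reg, flags=re.I)]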

Output:

       Year                           Title
    1  2021  Energy calculation with energy
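The pattern works because \1 must match the same text that group 1 captured, and with re.I that comparison is case-insensitive, so "Energy" and "energy" count as a repeat. A small sketch using Python's re module on individual strings; the term list and the third sample string are made up for illustration. Like str.count, it also matches a term inside a longer word.

```python
import re

terms = ['energy', 'calculation']  # illustrative terms
# Group 1 captures one term; \1 requires that same term to appear again later
reg = r'.*(%s).*\1' % '|'.join(map(re.escape, terms))

samples = ['Energy calculation', 'Energy calculation with energy', 'calculation of calculations']
results = {}
for s in samples:
    m = re.match(reg, s, flags=re.I)
    # Record which term repeated, or None if no term occurs twice
    results[s] = m.group(1) if m else None
    print(repr(s), '->', results[s])
```

The last sample matches only because "calculations" contains "calculation"; adding \b word boundaries around the group would be one way to restrict it to whole words.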

Votes: 2

Stack Overflow user

Answered 2021-09-17 09:50:32

You can combine .str.extractall with collections.Counter:

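    import re
    from collections import Counter

    import pandas as pd

    df = pd.DataFrame({'Year': ['2020', '2021', '2021'],
                       'Title': ['Energy calculation', 'Energy calculation with energy', 'Other calculation']})
    terms = ["energy", "calculation"]

    # Extract every case-insensitive occurrence of any term, then, per original row,
    # record how often the most frequent term occurs
    x = (
        df["Title"]
        .str.extractall("(" + "|".join(map(re.escape, terms)) + ")", flags=re.I)
        .groupby(level=0)
        .agg(lambda x: Counter(map(str.lower, x)).most_common(1)[0][1])
    )
    # Select by index so rows with no match at all (absent from x) are simply skipped
    print(df.loc[x.index[x[0] > 1]])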

Prints:

       Year                           Title
    1  2021  Energy calculation with energy
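The agg step reduces each row's matches to the count of its most frequent term. Here is the same logic applied to a single title in plain Python, as an illustrative sketch in which re.findall stands in for str.extractall:

```python
import re
from collections import Counter

terms = ['energy', 'calculation']
pattern = '(' + '|'.join(map(re.escape, terms)) + ')'

title = 'Energy calculation with energy'
# All case-insensitive matches of any term, in order of appearance
matches = re.findall(pattern, title, flags=re.I)
# Lower-case before counting so 'Energy' and 'energy' count as the same term
counts = Counter(m.lower() for m in matches)
most_common_term, n = counts.most_common(1)[0]

print(matches)              # ['Energy', 'calculation', 'energy']
print(most_common_term, n)  # energy 2
```

A row is kept when n > 1, which is what x[0] > 1 expresses in the answer.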

Votes: 1

Page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/69221230
