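I am trying to iterate over the lists stored in a pandas DataFrame column and return, in a new DataFrame, the keywords that are also found in three or more other rows.
Here is what the data looks like: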

Expected output:

(These keywords are returned because each of them appears in the lists of at least three other rows.)
Minimal reproducible example:
import pandas as pd
# initialize data of lists.
data = {'url': ["www.bbc.co.uk", "www.cabinzero.com", "www.cntraveller.com", "www.forbes.com", "www.gov.scot", "www.gov.uk", "www.ons.gov.uk"],
'keyword': ["['amber travel list', 'travel amber list', 'amber list countries uk travel', 'travel amber list countries', 'amber list countries travel']", "['amber list countries uk travel', 'travel amber list countries', 'amber travel list', 'travel amber list', 'amber list countries travel']", "['travel amber list', 'amber list countries uk travel', 'amber travel list', 'amber list countries travel', 'travel amber list countries']", "['amber travel list', 'travel amber list countries', 'travel amber list', 'amber list countries travel', 'amber list countries uk travel']", "['amber list countries travel', 'travel amber list countries', 'amber list countries uk travel', 'travel amber list', 'amber travel list']", "['amber list countries travel', 'amber list countries uk travel', 'amber travel list']", "['amber list countries uk travel', 'amber travel list', 'travel amber list countries', 'amber list countries travel']"]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
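print(df)

What I've tried: I dumped the list column into a single list and then iterated over it to count occurrences, but I couldn't get it to work, and I'm not sure this is the best approach.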
Posted on 2022-06-01 12:15:23
If every keyword is unique within its own list, you can do the following:
import ast
from itertools import chain
import numpy as np

# The keyword column holds strings that look like lists; ast.literal_eval parses them without the risks of eval
listed_keywords = df.keyword.apply(ast.literal_eval).values  # array of lists
all_keywords = list(chain.from_iterable(listed_keywords))  # concatenate all the lists into one global list of keywords
unique_keyword, nunique_keyword = np.unique(all_keywords, return_counts=True)  # unique keywords and their frequency among all the keywords
df_keywords = pd.DataFrame(dict(keyword=unique_keyword, frequency=nunique_keyword))  # a DataFrame you can easily filter by keyword frequency

Hope this answers your question!
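For the filtering step itself, here is a minimal sketch using the data from the question. It is a variation on the approach above, not the same code: it uses `explode` and counts distinct URLs per keyword, so it does not depend on keywords being unique within a list. The threshold `MIN_ROWS = 4` is an assumption, reading "found in at least three other rows" as "appears in at least four rows in total"; adjust it if the intended rule is different.

```python
import ast
import pandas as pd

data = {'url': ["www.bbc.co.uk", "www.cabinzero.com", "www.cntraveller.com", "www.forbes.com", "www.gov.scot", "www.gov.uk", "www.ons.gov.uk"],
'keyword': ["['amber travel list', 'travel amber list', 'amber list countries uk travel', 'travel amber list countries', 'amber list countries travel']", "['amber list countries uk travel', 'travel amber list countries', 'amber travel list', 'travel amber list', 'amber list countries travel']", "['travel amber list', 'amber list countries uk travel', 'amber travel list', 'amber list countries travel', 'travel amber list countries']", "['amber travel list', 'travel amber list countries', 'travel amber list', 'amber list countries travel', 'amber list countries uk travel']", "['amber list countries travel', 'travel amber list countries', 'amber list countries uk travel', 'travel amber list', 'amber travel list']", "['amber list countries travel', 'amber list countries uk travel', 'amber travel list']", "['amber list countries uk travel', 'amber travel list', 'travel amber list countries', 'amber list countries travel']"]}
df = pd.DataFrame(data)

# Assumption: "at least three other rows" means at least four rows in total.
MIN_ROWS = 4

# Parse the list-like strings into real Python lists.
parsed = df["keyword"].apply(ast.literal_eval)

# One row per (url, keyword) pair.
exploded = df.assign(keyword=parsed).explode("keyword")

# Count the distinct URLs (rows) in which each keyword appears.
counts = (
    exploded.groupby("keyword")["url"]
    .nunique()
    .rename("frequency")
    .reset_index()
)

# Keep only the keywords that meet the threshold.
result = counts[counts["frequency"] >= MIN_ROWS]
print(result)
```

With this sample data every keyword appears in at least five rows, so all five keywords pass the filter; a stricter threshold would be needed to see rows drop out.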
https://stackoverflow.com/questions/72461406