I have a df with the columns Product id and Product name. The Product name column has been tokenized and is stored as lists. For example:
Product id Product name
1 [land, cruiser]
1 [land, cruiser]
1 [land, cruiser, toyota]
1 [land, cruiser]
1 [land, toyota]

search_word = [land, cruiser], and I want to select all rows in which every element of search_word is present. So the result should be:
Product id Product name
1 [land, cruiser]
1 [land, cruiser]
1 [land, cruiser]

Currently, I have written the following code:
has_all = data[
    data['Product name'].apply(lambda x: np.all([*map(lambda l: l in x, search_word)]))]

How can I speed up this line (about 4.2 rows)? In this case, is it faster to work with the data as lists, or as strings using the re library?
Posted on 2022-04-14 13:36:59

You can use set operations and a list comprehension.

Assuming this input:
import pandas as pd

df = pd.DataFrame({'Product id': [1, 1, 1, 1, 1],
                   'Product name': [['land', 'cruiser'],
                                    ['land', 'cruiser'],
                                    ['land', 'cruiser', 'toyota'],
                                    ['land', 'cruiser'],
                                    ['land', 'toyota']]})
# use a set instead of a list!
search_word = set(['land', 'cruiser'])

You can use:

df2 = df[[search_word == set(x) for x in df['Product name']]]

Output:
Product id Product name
0 1 [land, cruiser]
1 1 [land, cruiser]
3 1 [land, cruiser]

https://stackoverflow.com/questions/71872431
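
The comparison above uses set equality, so it keeps only rows whose tokens are exactly the search words, which matches the expected output in the question. If the goal is instead to keep every row that merely contains all of the search words (which would also keep [land, cruiser, toyota]), a minimal sketch using set.issubset might look like the following. The helper name rows_containing_all is hypothetical and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'Product id': [1, 1, 1, 1, 1],
                   'Product name': [['land', 'cruiser'],
                                    ['land', 'cruiser'],
                                    ['land', 'cruiser', 'toyota'],
                                    ['land', 'cruiser'],
                                    ['land', 'toyota']]})


def rows_containing_all(frame, column, words):
    """Hypothetical helper: keep rows whose token list contains every word."""
    wanted = set(words)  # build the set once, outside the loop
    # set.issubset accepts any iterable, so each token list is checked directly
    mask = [wanted.issubset(tokens) for tokens in frame[column]]
    return frame[mask]


df3 = rows_containing_all(df, 'Product name', ['land', 'cruiser'])
print(df3.index.tolist())
```

Whether this or the equality check is appropriate depends on which of the two behaviours is actually wanted; both avoid the per-row np.all and map calls of the original code.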