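I have two lists, each a collection of strings, and I want to check whether an item of list(A) occurs in an item of the other list, list(B). list(A) holds the criteria: words and phrases that should be found in list(B). I filled list(A) with them (e.g. "innovation", "innovative", "new ways to go") and lemmatized it (['innovation'], ['innovative'], ['new', 'way', 'go']).
list(B) holds the tokenized and lemmatized sentences of a text (e.g. ['time', 'new', 'way', 'go']).
With this setup I am trying to find out whether, and how often, the given words and phrases occur in the text.
To match the patterns, from what I have read, each element of list(A) has to be converted into a string so that I can check whether it is a substring of the strings in list(B).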
list_a = [['innovation'], ['innovative'], ['new', 'way', 'go'], ['set', 'trend']]
list_b = [['time', 'innovation'], ['time', 'go', 'new', 'way'], ['look', 'innovative', 'creative', 'people']]
for x in range(len(list_a)):
    for j in range(len(list_b)):
        a = " ".join(list_a[x])
        if any(a in s for s in list_b[j]):
            print("word of list a: ", a, " appears in list b: ", list_b[j])

The actual output is:
word of list a: innovation appears in list b: ['time', 'innovation']
word of list a: innovative appears in list b: ['look', 'innovative', 'creative', 'people']

My target output is:
word of list a: innovation appears in list b: ['time', 'innovation']
word of list a: innovative appears in list b: ['look', 'innovative', 'creative', 'people']
word of list a: new way go appears in list b: ['time', 'go', 'new', 'way']

Converting the items of list(B) to strings (as I tried with list(A)) did not help either.
Thanks for your help!
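To see why the phrase is never found: the code joins ['new', 'way', 'go'] into "new way go" and then tests that string against each single token of a sentence, and no single token can contain a three-word phrase. The short sketch below reuses the question's data (the variable names are only for illustration) and contrasts the substring tests with a word-level test.

```python
# The question's phrase and the sentence it should match.
phrase = ["new", "way", "go"]
sentence = ["time", "go", "new", "way"]

joined = " ".join(phrase)  # "new way go"

# The question's test: is the whole phrase a substring of some single token?
# No token contains "new way go", so this is False.
token_match = any(joined in token for token in sentence)

# Joining the sentence as well only helps if the words appear in the same order.
# "time go new way" does not contain "new way go", so this is also False.
string_match = joined in " ".join(sentence)

# Substring tests can also give false positives: "go" is inside "good".
false_positive = "go" in "good"

# Word-level test: every word of the phrase is one of the sentence's tokens.
word_match = all(word in sentence for word in phrase)

print(token_match, string_match, false_positive, word_match)
# False False True True
```

The last test compares whole words and ignores their order, which is the idea the answers build on.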
Posted on 2019-07-31 08:56:01
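First mistake: don't build strings from lists of words. Use sets of words and set methods (here: issubset) to test whether a set from list_a is contained in one of the sets of list_b. Don't use any here, otherwise you can't tell which set contains the current one; a simple loop will do: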
list_a = [['innovation'], ['innovative'], ['new', 'way', 'go'], ['set', 'trend']]
list_b = [['time', 'innovation'], ['time', 'go', 'new', 'way'], ['look', 'innovative', 'creative', 'people']]
list_a = [set(x) for x in list_a]
list_b = [set(x) for x in list_b]
for subset in list_a:
    for other_subset in list_b:
        if subset.issubset(other_subset):
            print("{} appears in list b: {}".format(subset, other_subset))

prints:
{'innovation'} appears in list b: {'time', 'innovation'}
{'innovative'} appears in list b: {'look', 'creative', 'innovative', 'people'}
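{'new', 'go', 'way'} appears in list b: {'time', 'new', 'go', 'way'}

Sets are unordered, so the order of the words inside the braces may differ between runs. If you want to keep the original word order but still benefit from fast set membership tests, just build a list of (set, original list) tuples for list_b, since list_b is iterated several times. There is no need to do the same for list_a, since it is iterated only once: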
# start again from the original list_a and list_b (lists of words, not sets)
list_b = [(set(x), x) for x in list_b]
for sublist in list_a:
    subset = set(sublist)
    for other_subset, other_sublist in list_b:
        if subset.issubset(other_subset):
            print("{} appears in list b: {}".format(sublist, other_sublist))

Result:
['innovation'] appears in list b: ['time', 'innovation']
['innovative'] appears in list b: ['look', 'innovative', 'creative', 'people']
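['new', 'way', 'go'] appears in list b: ['time', 'go', 'new', 'way']

The algorithm is still expensive, roughly O(n**3) for n lists of about n words each, but not O(n**4): testing whether one list of words is contained in another costs O(n) with sets, because each set membership lookup is O(1) on average, instead of O(n**2) with lists, where each lookup is O(n).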
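The question also asks how often each criterion occurs. Building on the set approach above, here is a minimal sketch that counts, for each criterion, the number of sentences in list_b that contain all of its words. It assumes "how often" means "in how many sentences", not repeated occurrences within one sentence. `a <= b` on sets is the operator form of `a.issubset(b)`, and `b_sets` and `counts` are illustrative names.

```python
list_a = [['innovation'], ['innovative'], ['new', 'way', 'go'], ['set', 'trend']]
list_b = [['time', 'innovation'], ['time', 'go', 'new', 'way'],
          ['look', 'innovative', 'creative', 'people']]

# Build each sentence's word set once, since list_b is scanned for every criterion.
b_sets = [set(sentence) for sentence in list_b]

# For each criterion, count the sentences whose word set contains all of its words.
# sum() over booleans counts the True values.
counts = {
    " ".join(criterion): sum(set(criterion) <= sentence_set for sentence_set in b_sets)
    for criterion in list_a
}

for phrase, n in counts.items():
    print(f"{phrase}: {n}")
# innovation: 1
# innovative: 1
# new way go: 1
# set trend: 0
```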
Posted on 2019-07-31 09:11:59
Assuming you only want a match when all the words of one list in A are contained in one list in B, you can use a for/else loop:
list_a = [['innovation'], ['innovative'], ['new', 'way', 'go'], ['set', 'trend']]
list_b = [['time', 'innovation'], ['time', 'go', 'new', 'way'], ['look', 'innovative', 'creative', 'people'], ['way', 'go', 'time']]
for a_element in list_a:
    for b_element in list_b:
        for a_element_item in a_element:
            if a_element_item not in b_element:
                break
        else:
            # runs only if the inner loop finished without break, i.e. every word was found
            print(a_element, "is in", b_element)

Output:
['innovation'] is in ['time', 'innovation']
['innovative'] is in ['look', 'innovative', 'creative', 'people']
['new', 'way', 'go'] is in ['time', 'go', 'new', 'way']

https://stackoverflow.com/questions/57286611
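The inner for/else loop in the second answer prints only when the loop over the words finishes without hitting break, i.e. when every word was found. The built-in all() expresses the same check in one expression and likewise stops at the first missing word. A sketch with the same data; the `matches` list is an illustrative addition used only to collect the results:

```python
list_a = [['innovation'], ['innovative'], ['new', 'way', 'go'], ['set', 'trend']]
list_b = [['time', 'innovation'], ['time', 'go', 'new', 'way'],
          ['look', 'innovative', 'creative', 'people'], ['way', 'go', 'time']]

matches = []
for a_element in list_a:
    for b_element in list_b:
        # True only if no word of a_element is missing from b_element,
        # the same condition under which the for/else reaches its else branch.
        if all(word in b_element for word in a_element):
            matches.append((a_element, b_element))
            print(a_element, "is in", b_element)
```

['way', 'go', 'time'] is not reported for ['new', 'way', 'go'] because 'new' is missing from it.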