I have a dictionary and a list that look like this:

key_labels = {'countries': ['usa','france','japan','china','germany'],
              'fruits': ['mango', 'apple', 'passion-fruit', 'durion', 'bananna']}
docs = ["mango is a fruit that is very different from apple",
        "i like to travel, last year i was in germany but i like france.it was lovely",
        "mango bananna and apple are my favourite",
        "apples are grown in USA",
        "fruits have the best nutrients, particularly apple and mango",
        "usa and germany were both in the race last year"]

What I want to do is check whether each string in docs contains any of the keywords (values) from key_labels and, if it does, assign that sentence a label, which is the corresponding key from key_labels. I can do this as follows:

temp = []
for s in docs:
    for k, l in key_labels.items():
        for w in l:
            # case-insensitive substring match of keyword w in sentence s
            if w in s.lower():
                temp.append({s: k})

The output looks like this:

#temp
[{'mango is a fruit that is very different from apple': 'fruits'},
{'mango is a fruit that is very different from apple': 'fruits'},
{'i like to travel, last year i was in germany but i like france.it was lovely': 'countries'},
{'i like to travel, last year i was in germany but i like france.it was lovely': 'countries'},
{'mango bananna and apple are my favourite': 'fruits'},
{'mango bananna and apple are my favourite': 'fruits'},
{'mango bananna and apple are my favourite': 'fruits'},
{'apples are grown in USA': 'countries'},
{'apples are grown in USA': 'fruits'},
{'fruits have the best nutrients, particularly apple and mango': 'fruits'},
{'fruits have the best nutrients, particularly apple and mango': 'fruits'},
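{'usa and germany were both in the race last year': 'countries'},
{'usa and germany were both in the race last year': 'countries'}]

As you can see from the output, the same sentence is assigned a label once for every keyword detected in it.
But the output I want looks like this:
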
{"mango is a fruit that is very different from apple": {"fruits": 2},
"i like to travel, last year i was in germany but i like france.it was lovely": {"countries": 2},
"mango bananna and apple are my favourite": {"fruits": 3},
"apples are grown in USA": {"fruits": 1, "countries": 1},
"fruits have the best nutrients, particularly apple and mango": {"fruits": 2},
"usa and germany were both in the race last year": {"countries": 2}}

How can I modify my code to achieve this?

Answered 2018-08-02 18:57:05

You can make temp a dict, and use the dict.setdefault and dict.get methods to supply default values for the outer dict and the inner dict respectively:

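temp = {}
for s in docs:
    for k, l in key_labels.items():
        for w in l:
            if w in s.lower():
                # the right-hand side runs first: setdefault creates temp[s] if missing,
                # get returns the current count for label k (0 if absent)
                temp[s][k] = temp.setdefault(s, {}).get(k, 0) + 1
print(temp)

This outputs the following:
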
{'mango is a fruit that is very different from apple': {'fruits': 2}, 'i like to travel, last year i was in germany but i like france.it was lovely': {'countries': 2}, 'mango bananna and apple are my favourite': {'fruits': 3}, 'apples are grown in USA': {'countries': 1, 'fruits': 1}, 'fruits have the best nutrients, particularly apple and mango': {'fruits': 2}, 'usa and germany were both in the race last year': {'countries': 2}}

https://stackoverflow.com/questions/51651659
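
As a possible alternative to the setdefault/get combination, the same counting can be sketched with collections.defaultdict and collections.Counter from the standard library. This is not part of the original answer; the names counts, lowered, and result are made up for illustration. It keeps the same substring test as the question, so "apple" still matches inside "apples", and a sentence with no matching keyword never appears in the result.

```python
from collections import Counter, defaultdict

key_labels = {'countries': ['usa', 'france', 'japan', 'china', 'germany'],
              'fruits': ['mango', 'apple', 'passion-fruit', 'durion', 'bananna']}
docs = ["mango is a fruit that is very different from apple",
        "i like to travel, last year i was in germany but i like france.it was lovely",
        "mango bananna and apple are my favourite",
        "apples are grown in USA",
        "fruits have the best nutrients, particularly apple and mango",
        "usa and germany were both in the race last year"]

# The outer defaultdict creates an empty Counter the first time a sentence
# is indexed; each Counter tallies matched keywords per label.
counts = defaultdict(Counter)
for s in docs:
    lowered = s.lower()  # lower-case once per sentence instead of once per keyword
    for label, words in key_labels.items():
        for w in words:
            if w in lowered:  # same substring test as in the question
                counts[s][label] += 1

# Convert to plain nested dicts (hypothetical name: result) for printing.
result = {s: dict(c) for s, c in counts.items()}
print(result)
```

The trade-off is mostly readability: the increment `counts[s][label] += 1` needs no explicit defaults, at the cost of two imports and a final conversion if plain dicts are wanted.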