
Keyword matching and keyword counting

Stack Overflow user
Asked on 2018-08-02 18:45:19
1 answer, 50 views, 0 followers, 2 votes

I have a dictionary and a list that look like this:

key_labels = {'countries': ['usa','france','japan','china','germany'], 
              'fruits': ['mango', 'apple', 'passion-fruit', 'durion', 'bananna']}

docs = ["mango is a fruit that is very different from apple", 
        "i like to travel, last year i was in germany but i like france.it was lovely", 
        "mango bananna and apple are my favourite", 
        "apples are grown in USA", 
        "fruits have the best nutrients, particularly apple and mango", 
        "usa and germany were both in the race last year"]

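What I want to do is check whether any keyword (value) from key_labels appears in each string in docs. If it does, the sentence should be assigned a label, namely the corresponding key from key_labels. I can do this as follows: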

temp = []
for s in docs:
    for k, l in key_labels.items(): 
        for w in l:
            if w in s.lower():
                temp.append({s:k})

The output looks like this:

#temp
[{'mango is a fruit that is very different from apple': 'fruits'},
 {'mango is a fruit that is very different from apple': 'fruits'},
 {'i like to travel, last year i was in germany but i like france.it was lovely': 'countries'},
 {'i like to travel, last year i was in germany but i like france.it was lovely': 'countries'},
 {'mango bananna and apple are my favourite': 'fruits'},
 {'mango bananna and apple are my favourite': 'fruits'},
 {'mango bananna and apple are my favourite': 'fruits'},
 {'apples are grown in USA': 'countries'},
 {'apples are grown in USA': 'fruits'},
 {'fruits have the best nutrients, particularly apple and mango': 'fruits'},
 {'fruits have the best nutrients, particularly apple and mango': 'fruits'},
 {'usa and germany were both in the race last year': 'countries'},
 {'usa and germany were both in the race last year': 'countries'}]

As the output shows, the same sentence is assigned a label once for every keyword detected in it, so the labels repeat.
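
To make the duplication concrete, the entries in temp can be tallied per sentence. The following is a minimal sketch that was not part of the original post; it reuses the loop above, and the name entries_per_sentence is introduced here for illustration only.

```python
from collections import Counter

key_labels = {'countries': ['usa', 'france', 'japan', 'china', 'germany'],
              'fruits': ['mango', 'apple', 'passion-fruit', 'durion', 'bananna']}

docs = ["mango is a fruit that is very different from apple",
        "i like to travel, last year i was in germany but i like france.it was lovely",
        "mango bananna and apple are my favourite",
        "apples are grown in USA",
        "fruits have the best nutrients, particularly apple and mango",
        "usa and germany were both in the race last year"]

# Same loop as in the question: one {sentence: label} dict per matched keyword.
temp = []
for s in docs:
    for k, l in key_labels.items():
        for w in l:
            if w in s.lower():
                temp.append({s: k})

# Each dict in temp has exactly one key (the sentence), so counting those keys
# shows how many separate entries each sentence received.
entries_per_sentence = Counter(next(iter(d)) for d in temp)

for sentence, n in entries_per_sentence.items():
    print(n, sentence)
```

Every sentence that contains more than one keyword shows up more than once, which is the repetition described above.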

However, the output I would like to get is this:

{"mango is a fruit that is very different from apple": {"fruits": 2}, 
 "i like to travel, last year i was in germany but i like france.it was lovely":{"countries": 2}, 
 "mango bananna and apple are my favourite":{"fruits": 3}, 
 "apples are grown in USA": {"fruits":1, "countries":1}, 
 "fruits have the best nutrients, particularly apple and mango":{"fruits": 2}, 
 "usa and germany were both in the race last year":{"countries": 2}}

How can I modify my code to achieve this?


1 Answer

Stack Overflow user

Accepted answer

Posted on 2018-08-02 18:57:05

You can make temp a dict and use the dict.setdefault and dict.get methods to supply default values for the outer dict and the inner dict:

temp = {}
for s in docs:
    for k, l in key_labels.items():
        for w in l:
            if w in s.lower():
                temp[s][k] = temp.setdefault(s, {}).get(k, 0) + 1
print(temp)

This outputs:

{'mango is a fruit that is very different from apple': {'fruits': 2}, 'i like to travel, last year i was in germany but i like france.it was lovely': {'countries': 2}, 'mango bananna and apple are my favourite': {'fruits': 3}, 'apples are grown in USA': {'countries': 1, 'fruits': 1}, 'fruits have the best nutrients, particularly apple and mango': {'fruits': 2}, 'usa and germany were both in the race last year': {'countries': 2}}
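
The same counting can also be written with collections.defaultdict and collections.Counter, which removes the need for setdefault and get. The sketch below is an alternative, not the code from the answer; the function name count_labels and its whole_words flag are hypothetical and introduced only for illustration. The flag switches from substring matching to whole-word matching with a regular expression, on the assumption that a keyword such as 'apple' should perhaps not match 'apples'.

```python
import re
from collections import Counter, defaultdict

key_labels = {'countries': ['usa', 'france', 'japan', 'china', 'germany'],
              'fruits': ['mango', 'apple', 'passion-fruit', 'durion', 'bananna']}

docs = ["mango is a fruit that is very different from apple",
        "i like to travel, last year i was in germany but i like france.it was lovely",
        "mango bananna and apple are my favourite",
        "apples are grown in USA",
        "fruits have the best nutrients, particularly apple and mango",
        "usa and germany were both in the race last year"]


def count_labels(docs, key_labels, whole_words=False):
    """Map each sentence to a dict of label -> number of matched keywords."""
    counts = defaultdict(Counter)
    for sentence in docs:
        lowered = sentence.lower()
        for label, keywords in key_labels.items():
            for keyword in keywords:
                if whole_words:
                    # \b is a word boundary, so 'apple' does not match 'apples'.
                    pattern = r"\b" + re.escape(keyword) + r"\b"
                    found = re.search(pattern, lowered) is not None
                else:
                    # Plain substring test, as in the question and the answer.
                    found = keyword in lowered
                if found:
                    counts[sentence][label] += 1
    # Convert to plain dicts so the result prints like the answer's output.
    return {sentence: dict(c) for sentence, c in counts.items()}


result = count_labels(docs, key_labels)
strict = count_labels(docs, key_labels, whole_words=True)
print(result)
print(strict)
```

With the default substring test, result holds the same counts as the output above. With whole_words=True, "apples are grown in USA" keeps only the countries label, because 'apple' is no longer matched inside 'apples'. Which behavior is wanted depends on the data, so this is a design choice rather than a correction.
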
Votes: 3
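Original page content provided by Stack Overflow. Translation support for the source page was provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link: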

https://stackoverflow.com/questions/51651659
