I'm using Python 3.x.
I have two dictionaries (both are very large, so small stand-ins are shown here). Their values contain multiple words:
dict_a = {'key1': 'Large left panel', 'key2': 'Orange bear rug', 'key3': 'Luxo jr. lamp'}
dict_a
{'key1': 'Large left panel',
'key2': 'Orange bear rug',
'key3': 'Luxo jr. lamp'}
dict_b = {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain', 'keyZ': 'large bear musket'}
dict_b
{'keyX': 'titanium panel',
'keyY': 'orange Ball and chain',
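'keyZ': 'large bear musket'}

I'm looking for a way to compare the individual words in the dict_a values with the words in the dict_b values, and return a dictionary or DataFrame containing each matching word together with the dict_a and dict_b keys associated with it: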
My desired output (not formatted in any particular way):
I have code that finds a specific word in a single dictionary, but it isn't enough for what I need to do here:
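def search(myDict, lookup):
    aDict = {}
    for key, value in myDict.items():
        for v in value.split():  # split into words; iterating the string directly yields single characters
            if lookup in v:
                aDict[key] = value
    return aDict

for key, value in search(dict_a, 'panel').items():
    print(key, value)

Posted on 2017-01-18 19:02:48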
dicts = {'a': {'key1': 'Large left panel', 'key2': 'Orange bear rug',
               'key3': 'Luxo jr. lamp'},
         'b': {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain',
               'keyZ': 'large bear musket'}}
from collections import defaultdict

index = defaultdict(list)
for dname, d in dicts.items():
    for key, words in d.items():
        for word in words.lower().split():  # lower() so that Orange/orange match
            index[word].append((dname, key))

index now contains:
{'and' : [('b', 'keyY')],
'ball' : [('b', 'keyY')],
'bear' : [('a', 'key2'), ('b', 'keyZ')],
'chain' : [('b', 'keyY')],
'jr.' : [('a', 'key3')],
'lamp' : [('a', 'key3')],
'large' : [('a', 'key1'), ('b', 'keyZ')],
'left' : [('a', 'key1')],
'luxo' : [('a', 'key3')],
'musket' : [('b', 'keyZ')],
'orange' : [('a', 'key2'), ('b', 'keyY')],
'panel' : [('a', 'key1'), ('b', 'keyX')],
'rug' : [('a', 'key2')],
'titanium': [('b', 'keyX')] }

Update in response to comments
Because your actual dictionaries map strings to lists (rather than strings to strings), change the loops to
for dname, d in dicts.items():
    for key, wordlist in d.items():  # changed "words" to "wordlist"
        for words in wordlist:  # added an extra loop to iterate over wordlist
            for word in words.split():  # removed .lower() since the text is always uppercase
                index[word].append((dname, key))

Since each of your lists holds only one item, you can simply do
for dname, d in dicts.items():
    for key, wordlist in d.items():
        for word in wordlist[0].split():  # assumes a single-item list
            index[word].append((dname, key))

If there are words you don't want in the index, you can skip adding them to index. Define
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}

and then use
if word in words_to_skip:
    continue

I noticed that some words are wrapped in parentheses (such as (342) and (221)). If you want to strip the parentheses:
if word[0] == '(' and word[-1] == ')':
    word = word[1:-1]

Putting it all together:
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}

for dname, d in dicts.items():
    for key, wordlist in d.items():
        for word in wordlist[0].split():  # assumes a single-item list
            if word[0] == '(' and word[-1] == ')':
                word = word[1:-1]  # remove the outer parentheses
            if word in words_to_skip:  # skip unwanted words ('' catches a bare "()")
                continue
            index[word].append((dname, key))

Posted on 2017-01-18 19:04:33
I think you can do what you want fairly easily. This code produces output in the format {word: {key: name_of_dict_the_key_is_in}}.
def search(**dicts):
    result = {}
    for name, dct in dicts.items():
        for key, value in dct.items():
            for word in value.split():
                result.setdefault(word, {})[key] = name
    return result

Call it with the input dictionaries as keyword arguments. The keyword you give each dictionary becomes the string that identifies it in the output, so use something like search(dict_a=dict_a, dict_b=dict_b).
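If your dictionaries can share keys, this code may not work correctly, because identical keys collide whenever their values contain the same word. Instead of inner dictionaries, you could have the outer dict hold lists of (key, name) tuples: just change the assignment line to result.setdefault(word, []).append((key, name)). That makes the result less convenient to search, though.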
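A minimal sketch of that list-of-tuples variant, with one extra step that keeps only the words found in every dictionary, which is what the question ultimately asks for. Two assumptions are not part of the answer: words are lowercased (borrowed from the first answer) so that 'Orange' and 'orange' match, and shared_words is a hypothetical helper name introduced here for illustration.

```python
def search(**dicts):
    """Map each word to a list of (key, dict_name) tuples.

    Variant of the answer's search() using the list-of-tuples assignment,
    so identical keys in different dicts cannot overwrite each other.
    Lowercasing is an assumption so that 'Orange' and 'orange' match.
    """
    result = {}
    for name, dct in dicts.items():
        for key, value in dct.items():
            for word in value.lower().split():
                result.setdefault(word, []).append((key, name))
    return result


def shared_words(index, names):
    """Hypothetical helper: keep only words that occur in every dict in names."""
    wanted = set(names)
    return {
        word: hits
        for word, hits in index.items()
        if wanted <= {name for _, name in hits}
    }


dict_a = {'key1': 'Large left panel', 'key2': 'Orange bear rug',
          'key3': 'Luxo jr. lamp'}
dict_b = {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain',
          'keyZ': 'large bear musket'}

index = search(dict_a=dict_a, dict_b=dict_b)
common = shared_words(index, ['dict_a', 'dict_b'])
for word, hits in sorted(common.items()):
    print(word, hits)
# bear [('key2', 'dict_a'), ('keyZ', 'dict_b')]
# large [('key1', 'dict_a'), ('keyZ', 'dict_b')]
# orange [('key2', 'dict_a'), ('keyY', 'dict_b')]
# panel [('key1', 'dict_a'), ('keyX', 'dict_b')]
```

Because each hit records both the key and the dictionary name, the filter only needs to check which dictionary names appear for a word; the same check works unchanged with three or more dictionaries.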
https://stackoverflow.com/questions/41726674