I'm new to Python, but I'm helping with a project to determine the number of biased words in some data.
I currently have a list of coded words:
male_coded_words = ['active','adventurous','aggress','ambitio','analy','assert']
I also have a dictionary of job titles and skills:
jobsdict = {'fork lift truck driver': ['fork lift truck driv', 'assert'], 'assistant fraud and payment risk manager': ['fraud', 'online fraud', 'fraud detect', 'payment system', 'risk manag'], 'paralegal vacancy corporate immigration (london office)': ['legal', 'microsoft offic', 'communication skil'], 'transport operator': ['transport', 'active'], 'year 5 primary teacher': ['newham'], 'multi agency safeguarding administrator': ['admin', 'social work', 'safeguard', 'social work admin', 'children administr', 'social work administr', 'safeguarding administr']}
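I want to iterate through the dictionary and find, for each key, how many of its values appear in the male_coded_words list.
The output should be a dictionary of the form {'fork lift truck driver': {"count": "1", "coded_words": ["assert"]}, ...}
My code so far: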
final_count = 0
final_output = {}
for k, v in jobsdict:
    final_output[k] = []
    if 'analy' in str(v):
        n = final_count + 1
    else:
        n = 0
    final_output[k].append(n)
    final_output[k].append(v)
Posted on 2021-03-14 01:06:42
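A good idea here is to make use of Python's set objects, which act as an unordered alternative to lists. Membership tests and intersections on sets tend to be much faster than the equivalent operations on lists. For brevity, I've also used a dictionary comprehension and a Counter object to automatically count the instances of coded words. The script below should give the output you specified: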
from collections import Counter
# General form of the data provided above, for reference
# male_coded_words = ['active', ...]
# jobsdict = {'fork lift truck driver': ['fork lift truck driv','assert'], ...}
result = {k: Counter(set(v) & set(male_coded_words)) for k,v in jobsdict.items()}
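# result will look like {'fork lift truck driver': Counter({'assert': 1}), ...}.
# Because sets drop duplicates and only exact matches intersect, each count is 1.
# If no coded words exist for a specific job, then its value in the
# result dict will just be an empty Counter.
Posted on 2021-03-14 20:00:15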
I'm a beginner too, but I would look into regular expressions and count the matches (that's the simplest approach I can think of).
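A minimal sketch of what the regular-expression idea could look like. It assumes the coded words are stems that should match at the start of a word (so 'aggress' would match 'aggressive'); that word-boundary rule and the function name count_coded_words are my own assumptions, not something stated in the question.

```python
import re

male_coded_words = ['active', 'adventurous', 'aggress', 'ambitio', 'analy', 'assert']

# One pattern covering every stem. \b anchors each stem to the start of a word,
# and re.escape protects against stems that contain regex metacharacters.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, male_coded_words)) + r')')

def count_coded_words(skills):
    """Hypothetical helper: count coded stems across a list of skill strings."""
    found = []
    for skill in skills:
        # findall returns every stem matched in this skill, duplicates included
        found.extend(pattern.findall(skill))
    return {"count": len(found), "coded_words": found}

print(count_coded_words(['fork lift truck driv', 'assert']))
# {'count': 1, 'coded_words': ['assert']}
```

Calling it once per value of jobsdict (for example in a dictionary comprehension over jobsdict.items()) would give one such entry per job title.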
Posted on 2021-03-14 20:25:08
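I can't think of a way to do this with fewer than two for loops: one to iterate over jobsdict and another over the coded words. Also, use jobsdict.items() to iterate over both keys and values: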
final_count = 0
final_output = {}
for k, v in jobsdict.items():
    count, words = 0, []
    # merge all the strings into one to avoid a third nested loop iterating over them;
    # joining with a space keeps a match from spanning two adjacent skills
    s = ' '.join(v)
    for w in male_coded_words:
        # can be replaced with `w in s` if you don't want to count multiple occurrences of a word each time
        c = s.count(w)
        if c:
            count += c
            words.append(w)
    final_count += count
    final_output[k] = [count, words]
print(final_output, final_count)
This gives me the following output:
{'fork lift truck driver': [1, ['assert']], 'assistant fraud and payment risk manager': [0, []], 'paralegal vacancy corporate immigration (london office)': [0, []], 'transport operator': [1, ['active']], 'year 5 primary teacher': [0, []], 'multi agency safeguarding administrator': [0, []]} 2
Edit: if you want final_output to contain dictionaries, replace the second-to-last line with final_output[k] = {"count": count, "words": words}
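For reference, here is a rough sketch of the `w in s` variant wrapped in a function that returns the dictionary form directly. It counts each coded word at most once per job and checks each skill separately instead of joining them. The function name coded_word_report and the "coded_words" key are assumptions taken from the output format in the question, and the sample data is only a subset of jobsdict.

```python
male_coded_words = ['active', 'adventurous', 'aggress', 'ambitio', 'analy', 'assert']

# Subset of the jobsdict from the question, for illustration only
jobsdict = {
    'fork lift truck driver': ['fork lift truck driv', 'assert'],
    'transport operator': ['transport', 'active'],
    'year 5 primary teacher': ['newham'],
}

def coded_word_report(jobs, coded_words):
    """Hypothetical helper: map each job title to its coded-word count and matches."""
    report = {}
    for title, skills in jobs.items():
        # A coded word counts once per job if it is a substring of any skill
        hits = [w for w in coded_words if any(w in skill for skill in skills)]
        report[title] = {"count": len(hits), "coded_words": hits}
    return report

report = coded_word_report(jobsdict, male_coded_words)
total = sum(entry["count"] for entry in report.values())
print(report, total)
```

Checking each skill on its own costs a third (implicit) loop, but it means a match can never straddle two different skill strings.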
https://stackoverflow.com/questions/66615453