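I have a list of keywords, and I need to know whether each one is within 4 words of the word "access" in a list of sentences. Ultimately, I want to add up how many keywords match "access" in each particular sentence.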
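Current output:
['minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare'] 0
['rural', 'patients', 'often', 'cite', 'distance', 'as', 'a', 'barrier', 'to', 'access', 'health', 'services.']
['minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare'] 0
['minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare']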
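Desired output:
['minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare']
['I', 'am', 'an', 'avid', 'user', 'of', 'Microsoft', 'Access', 'databases'] 0
['rural', 'patients', 'often', 'cite', 'distance', 'as', 'a', 'barrier', 'to', 'access', 'healthcare', 'services']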
accessdesc = ["care", "services", "healthcare", "barriers"]
sentences = ["Minority patients often have barriers with their access to healthcare.",
             "I am an avid user of Microsoft Access databases",
             "Rural patients often cite distance as one of the barriers to access healthcare services."]
for sentence in sentences:
    nummatches = 0
    for desc in accessdesc:
        sentence = sentence.replace(".","") if "." in sentence else ''
        sentence = sentence.replace(",","") if "," in sentence else ''
        if 'access' in sentence.lower() and desc in sentence.lower():
            sentence = sentence.lower().split()
            access_position = sentence.index('access') if "access" in sentence else 0
            desc_position = sentence.index(desc) if desc in sentence else 0
            if abs(access_position - desc_position) < 5 :
                nummatches = nummatches + 1
            else:
                nummatches = nummatches + 0
            print(sentence, nummatches)
Posted on 2019-06-18 15:15:14
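I think you need to switch the order of your loops from:
for desc in accessdesc:
    for sentence in sentences:
to:
for sentence in sentences:
    nummatches = 0  # Resets the count to 0 for each sentence
    for desc in accessdesc:
This means every keyword is checked against a sentence before moving on to the next sentence. Then move the print(sentence, nummatches) statement outside the second loop, so that the result is printed once after each sentence.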
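To see the loop order on its own, here is a minimal sketch that reuses the question's accessdesc and sentences. The helper name keyword_counts is hypothetical, and the proximity check is deliberately replaced by a plain whole-word membership test, so the only thing it illustrates is where the counter is reset and where the per-sentence result is recorded.

```python
accessdesc = ["care", "services", "healthcare", "barriers"]
sentences = [
    "Minority patients often have barriers with their access to healthcare.",
    "I am an avid user of Microsoft Access databases",
    "Rural patients often cite distance as one of the barriers to access healthcare services.",
]


def keyword_counts(sentences, keywords):
    """Hypothetical helper: count keywords present in each sentence (no proximity check)."""
    counts = []
    for sentence in sentences:               # outer loop: one pass per sentence
        nummatches = 0                       # reset once per sentence, before the keyword loop
        words = sentence.lower().replace(".", "").split()
        for desc in keywords:                # inner loop: every keyword for this sentence
            if desc in words:                # stand-in for the real "within 4 words" test
                nummatches += 1
        counts.append(nummatches)            # recorded once, after all keywords are checked
    return counts


for sentence, count in zip(sentences, keyword_counts(sentences, accessdesc)):
    print(sentence, count)
```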
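The other thing to look at is the line if 'access' and desc in sentence :. Because in binds more tightly than and, this is evaluated as 'access' and (desc in sentence), and and checks whether the expressions on its left and its right are both truthy. 'access' is a non-empty string, so it is always truthy, and the condition reduces to desc in sentence alone. What you want here is to check that both 'access' and desc are in sentence. I'd also suggest making this check case-insensitive, because 'access' does not equal 'Access'. So you could rewrite it as:
    if 'access' in sentence.lower() and desc in sentence.lower():
        sentence = sentence.lower().split()
Now, because the if condition already checks that desc is in the sentence, you don't need to check again (the if desc in sentence else 0 part), as you mentioned in the comments.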
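A quick way to confirm the precedence issue is to evaluate both conditions on a sentence that contains the keyword but not "access". The sentence and the keyword "user" below are made up for illustration.

```python
sentence = "I am an avid user of Microsoft databases"  # made-up example: no 'access'
desc = "user"  # made-up keyword that does occur in the sentence

# 'in' binds more tightly than 'and', so this parses as
# 'access' and (desc in sentence). A non-empty string is truthy,
# so the whole test reduces to (desc in sentence).
buggy = 'access' and desc in sentence

# Intended check: both substrings occur, ignoring case.
fixed = 'access' in sentence.lower() and desc in sentence.lower()

print(buggy, fixed)  # True False
```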
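Note that your code will most likely only work as expected when 'access' and each keyword appear at most once in a sentence, because sentence.index() only finds the first occurrence. Handling multiple occurrences needs extra logic.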
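One possible way to add that logic, sketched under the assumption that "within 4 words" means an index difference of at most 4 (the same as the < 5 test in the question), is to collect every position of each word with enumerate and compare all pairs. The helpers positions and within_distance, the max_gap parameter, and the sample sentence are hypothetical.

```python
def positions(words, target):
    """Return every index at which target occurs in words, not just the first."""
    return [i for i, w in enumerate(words) if w == target]


def within_distance(words, a, b, max_gap=4):
    """True if any occurrence of a is at most max_gap words away from any occurrence of b."""
    return any(abs(i - j) <= max_gap
               for i in positions(words, a)
               for j in positions(words, b))


# Hypothetical sentence: the first 'access' is far from 'care', the second is close.
words = "access to records is slow but patients with poor access to care suffer".split()
print(words.index("access"), positions(words, "access"))  # 0 [0, 9]
print(within_distance(words, "access", "care"))           # True
```

Here words.index("access") alone would return 0, which is 11 words from "care", so a first-occurrence check would miss the match that within_distance finds.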
Edit
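The lines that strip punctuation (e.g. sentence = sentence.replace(".","") if "." in sentence else '') set the sentence to '' whenever that punctuation character is not in the sentence. You can do all the replacements in one line, split the result into a list of words, and then check membership in that list rather than in the sentence string, so that only whole words match:
>>> 'it' in 'bit'
True
>>> 'it' in ['bit']
False
So you could rewrite your code as: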
for sentence in sentences:
    nummatches = 0
    words = sentence.replace(".","").replace(",","").lower().split()
    # Moved this outside of the second loop, as the sentence doesn't change between iterations
    # Not reassigning the sentence variable, so it can be printed in its original form
    if 'access' not in words:
        print(sentence, nummatches)  # Still report 0 for sentences without 'access'
        continue  # No need to check the keywords if 'access' is not in the sentence
    for desc in accessdesc:
        if desc not in words:
            continue  # continue skips to the next iteration of the loop
        access_position = words.index('access')
        desc_position = words.index(desc)
        if abs(access_position - desc_position) < 5:
            nummatches += 1
        # else branch not required
    print(sentence, nummatches)  # Moved outside of the second loop so it prints after checking all the keywords
As mentioned earlier, this only works if 'access' and each keyword appear at most once in the sentence. If they appear more than once, index() will only find the first occurrence. Have a look at this answer to see whether you can use its solution in your code. Also have a look at this answer on how to strip punctuation from a string.
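Putting both suggestions together, one possible sketch strips punctuation with str.translate and string.punctuation (a common approach; it may or may not be the one in the linked answer) and compares every occurrence of "access" with every occurrence of each keyword. The function name count_near_access and its max_gap parameter are hypothetical; like the code above, it counts each keyword at most once per sentence.

```python
import string

accessdesc = ["care", "services", "healthcare", "barriers"]
sentences = [
    "Minority patients often have barriers with their access to healthcare.",
    "I am an avid user of Microsoft Access databases",
    "Rural patients often cite distance as one of the barriers to access healthcare services.",
]

# Translation table that deletes every character in string.punctuation.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def count_near_access(sentence, keywords, max_gap=4):
    """Count keywords with at least one occurrence within max_gap words of some 'access'."""
    words = sentence.translate(STRIP_PUNCT).lower().split()
    access_positions = [i for i, w in enumerate(words) if w == "access"]
    if not access_positions:
        return 0  # no 'access' at all, so nothing can match
    nummatches = 0
    for desc in keywords:
        desc_positions = [i for i, w in enumerate(words) if w == desc]
        # Compare every occurrence pair instead of only the first ones found by index().
        if any(abs(a - d) <= max_gap for a in access_positions for d in desc_positions):
            nummatches += 1
    return nummatches


for sentence in sentences:
    print(sentence, count_near_access(sentence, accessdesc))
```

If "the number of times" should instead count every matching (access, keyword) pair, the any(...) test could be replaced by a sum over the same pairs.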
https://stackoverflow.com/questions/56651798