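So I need a simple way to extract the 10 words before and after a search term in a paragraph, and return them all as a single sentence.
Example:
The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes.
Input
wolf
Output
most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to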
Posted 2017-08-10 14:04:14
paragraph = 'The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes.'
word = "wolf"
# Split on single spaces; punctuation stays attached to its word
wordlist = paragraph.split(" ")
# Position of the first token that is exactly the search word (raises ValueError if absent)
index = wordlist.index(word)
# Up to 10 words before the match; max() keeps a negative start from wrapping around
first_part = wordlist[max(0, index - 10):index]
# The match itself plus up to 10 words after it
second_part = wordlist[index:index + 11]
print("%s %s" % (" ".join(first_part), " ".join(second_part)))
Output:
most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to
Posted 2017-08-10 14:23:55
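The split-and-slice approach above can be wrapped in a reusable function. This is a minimal sketch, not part of the original answer: `context_window` is a hypothetical helper name, and it assumes you want a case-insensitive match on tokens with surrounding punctuation stripped, returning `None` when the word is absent instead of raising.

```python
import string


def context_window(text, word, n=10):
    """Return up to n words on each side of the first token equal to `word`.

    Hypothetical helper for illustration. Tokens are compared
    case-insensitively after stripping leading/trailing punctuation.
    Returns None when no token matches.
    """
    tokens = text.split()
    target = word.lower()
    for i, token in enumerate(tokens):
        if token.strip(string.punctuation).lower() == target:
            start = max(0, i - n)  # clamp so the slice never starts below 0
            return " ".join(tokens[start:i + n + 1])
    return None


sample = "The dog and the extant gray wolf are sister taxa"
print(context_window(sample, "wolf", n=2))  # extant gray wolf are sister
```

Because only the ends of each token are stripped, a hyphenated token such as "wolf-like" is not treated as a match for "wolf".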
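Here is a regular expression that can help you extract the text you need:
(?:[^ ]+ ){0,10}wolf(?: [^ ]+){0,10}
A Python example should look something like the following, although I can't test it right now: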
import re
t = "The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes"
# Raw string so that \s reaches the regex engine unchanged
m = re.search(r"(?:[^ ]+ ){0,10}wolf\s(?:[^ ]+ ){0,10}", t)
if m:
    print(m.group(0))
Posted 2017-08-10 14:00:44
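The regex idea above can be parameterized over the search word and the window size. The following is a rough sketch under stated assumptions: `regex_context` is a hypothetical name, the search word is escaped with `re.escape`, and a match only counts when the word is delimited by whitespace (or the start or end of the text), so "wolf-like" and "wolf," are deliberately not matched.

```python
import re


def regex_context(text, word, n=10):
    """Return up to n whitespace-separated words around `word`, or None.

    Hypothetical helper for illustration. The lookarounds (?<!\\S) and
    (?!\\S) require the word to stand alone between whitespace.
    """
    pattern = re.compile(
        rf"(?:\S+\s+){{0,{n}}}(?<!\S){re.escape(word)}(?!\S)(?:\s+\S+){{0,{n}}}"
    )
    m = pattern.search(text)
    return m.group(0) if m else None


sample = "The dog and the extant gray wolf are sister taxa"
print(regex_context(sample, "wolf", n=2))  # extant gray wolf are sister
```

Escaping the word matters when the search term contains regex metacharacters such as `.` or `(`.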
Once you have found the position of the target word, you could try using substrings. Have you tried coding anything so far?
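The substring suggestion could look roughly like this. It is only a sketch of one way to read that hint: `substring_context` is a hypothetical name, and it assumes the target word is separated from its neighbours by single spaces, since a plain `str.find("wolf")` would otherwise hit the "wolf" inside "wolf-like" first.

```python
def substring_context(text, word, n=10):
    """Locate `word` with str.find, then slice the text around it.

    Hypothetical helper for illustration. Returns None when the
    space-delimited word does not occur in the text.
    """
    padded = f" {text} "  # pad so a word at either end still has spaces around it
    pos = padded.find(f" {word} ")
    if pos == -1:
        return None
    before = padded[:pos].split()                  # words left of the match
    after = padded[pos + len(word) + 2:].split()   # words right of the match
    start = max(0, len(before) - n)                # keep only the last n words
    return " ".join(before[start:] + [word] + after[:n])


sample = "The dog and the extant gray wolf are sister taxa"
print(substring_context(sample, "wolf", n=2))  # extant gray wolf are sister
```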
https://stackoverflow.com/questions/45615840