
In Python, how can I pull out some words around a specific word?

Stack Overflow user
Asked 2017-08-10 13:57:34
3 answers, 365 views, 0 followers, 0 votes

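I need a simple way to extract the 10 words before and the 10 words after a search term in a paragraph, and return them all together as one sentence.

Example:

The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes.

Input

wolf

Output

most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to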


3 Answers

Stack Overflow user

Answered 2017-08-10 14:04:14

paragraph = 'The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes.'
word = "wolf"
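wordlist = paragraph.split(" ")

# Position of the first token that is exactly equal to the search word
index = wordlist.index(word)
# Clamp the start at 0 so a match near the beginning does not yield a negative slice index
first_part = wordlist[max(0, index - 10):index]
# The search word itself plus the 10 words that follow it
second_part = wordlist[index:index + 11]
print(" ".join(first_part + second_part))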

Output:

most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to
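
The split-and-slice idea above stops at the first token that is exactly equal to the search word. A minimal sketch of a reusable variant follows, assuming you also want to match tokens with attached punctuation (such as "wolf,") and collect every occurrence. The function name `words_around` and its parameters are hypothetical, not part of the original answer.

```python
import string


def words_around(text, word, n=10):
    """Return one snippet per match of `word`, with up to n words on each side."""
    tokens = text.split()
    snippets = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively, ignoring punctuation attached to the token
        if token.strip(string.punctuation).lower() == word.lower():
            start = max(0, i - n)  # never slice with a negative start
            snippets.append(" ".join(tokens[start:i + n + 1]))
    return snippets


sample = "The dog and the extant gray wolf are sister taxa. A wolf, unlike a dog, is wild."
for snippet in words_around(sample, "wolf", n=3):
    print(snippet)
```

Because only leading and trailing punctuation is stripped, a hyphenated token such as "wolf-like" still does not count as a match.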
Votes: 4

Stack Overflow user

Answered 2017-08-10 14:23:55

Here is a regular expression that can help you extract the text you need:

(?:[^ ]+ ){0,10}wolf(?: [^ ]+){0,10}

Also, a Python example should look something like this, although I can't test it right now:

import re

t = "The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes"

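# Raw string so that \s reaches the regex engine unchanged;
# the \s after "wolf" keeps the pattern from matching "wolf-like"
m = re.search(r"(?:[^ ]+ ){0,10}wolf\s(?:[^ ]+ ){0,10}", t)

if m:
    print(m.group(0))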
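
The pattern above hard-codes both the search word and the window size. As a rough sketch, the same idea can be wrapped in a function that builds the pattern from its arguments. The name `regex_context` and the choice of lookarounds are assumptions made for illustration, not something the answer specifies. Note that `re.finditer` returns non-overlapping matches, so a second occurrence that falls inside the trailing context of the first is not reported separately.

```python
import re


def regex_context(text, word, n=10):
    """Return snippets with up to n words on each side of each whole-word match."""
    pattern = (
        rf"(?:\S+\s+){{0,{n}}}"   # up to n preceding words
        rf"(?<![\w-]){re.escape(word)}(?![\w-])"  # the word, not inside a longer or hyphenated word
        rf"\S*"                   # trailing punctuation attached to the word, if any
        rf"(?:\s+\S+){{0,{n}}}"   # up to n following words
    )
    return [m.group(0) for m in re.finditer(pattern, text)]


text = "the wolf-like canids differ; the extant gray wolf are sister taxa, with modern wolves"
print(regex_context(text, "wolf", n=2))
```

`re.escape` is used so that a search word containing regex metacharacters is treated literally.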
Votes: 2

Stack Overflow user

Answered 2017-08-10 14:00:44

Once you have found the position of the target word, you can try using substrings. Have you tried coding anything so far?
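
One possible reading of this suggestion, as a minimal sketch: locate the word with `str.find`, then cut the text into the part before and the part after it and keep the nearest words from each. The function name `substring_context` is hypothetical. Be aware that `str.find` matches substrings, so on the paragraph from the question it would stop at "wolf-like" first.

```python
def substring_context(text, word, n=10):
    """Return up to n words on each side of the first substring match, or None."""
    pos = text.find(word)  # character offset of the first occurrence, or -1
    if pos == -1:
        return None
    # Words before the match; guard n == 0 because [-0:] would return the whole list
    before = text[:pos].split()[-n:] if n > 0 else []
    # Words after the match
    after = text[pos + len(word):].split()[:n]
    return " ".join(before + [word] + after)


text = "The dog and the extant gray wolf are sister taxa, with modern wolves"
print(substring_context(text, "wolf", n=3))
```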

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/45615840
