
Q: I want to extract a given number of words surrounding a given word from a long string (paragraph) in Python 2.7

Stack Overflow user

Asked on 2017-04-17 10:33:19

5 answers, 1.9K views, 0 followers, 2 votes

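I'm trying to extract a chosen number of words around a specific word. I'll illustrate with an example:

"Education shall be directed to the full development of the human personality and to the strengthening of respect for human rights and fundamental freedoms."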

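1) The selected word is development, and I need to get the 6 words surrounding it: to, the, full, of, the, human.

2) But if the selected word is at the beginning or in second position, I still need to get 6 words, for example:

The selected word is shall, and I should get: Education, be, directed, to, the, full.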

I'm supposed to use the "re" module. So far, this is what I've found:

import re

def search(text, n):
    '''Searches text for 'place' and retrieves the n words on either side of it, which are returned separately'''
    word = r"\W*([\w]+)"
    groups = re.search(r'{}\W*{}{}'.format(word*n, 'place', word*n), text).groups()
    return groups[:n], groups[n:]

But it only handles the first case. I'd be very grateful if someone could help me solve this. Thanks in advance!
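The attempt above fails near the start of the text because `word*n` demands exactly `n` words before the target. A minimal sketch that keeps the `re` approach, assuming only the first whole-word, case-sensitive occurrence matters: allow between 0 and `2*n` words on each side with a `{0,k}` quantifier, then rebalance so a short side is made up from the other. The name `surrounding_words` is hypothetical, and the target is left out of the result, as in the question.

```python
import re

TEXT = ("Education shall be directed to the full development of the human personality "
        "and to the strengthening of respect for human rights and fundamental freedoms.")


def surrounding_words(text, target, n=3):
    """Return 2*n words around the first whole-word, case-sensitive occurrence
    of target (the target itself excluded), or [] if target does not occur."""
    k = 2 * n
    # Capture up to k words on each side instead of exactly n, so a short side
    # near the start or end can be made up from the other side.
    # The leading \b stops a match from starting in the middle of a word; the
    # lazy {0,k}? makes the first occurrence win over a later nearby one.
    pattern = r'\b((?:\w+\W+){0,%d}?)\b%s\b((?:\W+\w+){0,%d})' % (
        k, re.escape(target), k)
    m = re.search(pattern, text)
    if m is None:
        return []
    before = re.findall(r'\w+', m.group(1))
    after = re.findall(r'\w+', m.group(2))
    # Prefer n words per side; borrow from the other side if one is short.
    n_before = min(len(before), max(n, k - len(after)))
    n_after = min(len(after), k - n_before)
    return before[len(before) - n_before:] + after[:n_after]


print(surrounding_words(TEXT, 'shall'))
# ['Education', 'be', 'directed', 'to', 'the', 'full']
```

For shall this gives the second case from the question; for development it gives ['to', 'the', 'full', 'of', 'the', 'human'].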

Tags: extract, words, python, python-2.7, numbers

Answers: 5

Stack Overflow user

Accepted answer

Posted on 2017-04-17 11:07:02

This extracts every occurrence of the target word in the text, along with its context:

import re

text = ("Education shall be directed to the full development of the human personality "
        "and to the strengthening of respect for human rights and fundamental freedoms.")

def search(target, text, context=6):
    # It's easier to use re.findall to split the string, 
    # as we get rid of the punctuation
    words = re.findall(r'\w+', text)

    # Words are lower-cased before comparing, so pass target in lower case
    matches = (i for (i,w) in enumerate(words) if w.lower() == target)
    for index in matches:
        if index < context //2:
            yield words[0:context+1]
        elif index > len(words) - context//2 - 1:
            yield words[-(context+1):]
        else:
            yield words[index - context//2:index + context//2 + 1]

print(list(search('the', text)))
# [['be', 'directed', 'to', 'the', 'full', 'development', 'of'], 
#  ['full', 'development', 'of', 'the', 'human', 'personality', 'and'], 
#  ['personality', 'and', 'to', 'the', 'strengthening', 'of', 'respect']]

print(list(search('shall', text)))
# [['Education', 'shall', 'be', 'directed', 'to', 'the', 'full']]

print(list(search('freedoms', text)))
# [['respect', 'for', 'human', 'rights', 'and', 'fundamental', 'freedoms']]
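The three branches in `search` can be collapsed into a single clamped start index, start = max(0, min(index - n, len(words) - (2n + 1))), where `n` is the number of words per side (the answer's `context // 2`). A minimal sketch, assuming the target word itself should be left out of the result as the question asks; `context_window` and `search_all` are hypothetical names.

```python
import re

TEXT = ("Education shall be directed to the full development of the human personality "
        "and to the strengthening of respect for human rights and fundamental freedoms.")


def context_window(words, index, n=3):
    """Return up to 2*n words around words[index], excluding words[index].

    The window has 2*n + 1 slots; its start is clamped so it never runs past
    either end of the list, which shifts it inward near the edges.
    """
    size = 2 * n + 1
    start = max(0, min(index - n, len(words) - size))
    window = words[start:start + size]
    offset = index - start  # position of the target inside the window
    return window[:offset] + window[offset + 1:]


def search_all(target, text, n=3):
    """Return one context window per case-insensitive occurrence of target."""
    words = re.findall(r'\w+', text)
    return [context_window(words, i, n)
            for i, w in enumerate(words) if w.lower() == target.lower()]


print(search_all('development', TEXT))
# [['to', 'the', 'full', 'of', 'the', 'human']]
```

The window positions match the answer's output; only the target word is dropped from each list.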

Votes: 2

Stack Overflow user

Posted on 2017-04-17 10:57:36

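This is tricky, with potential off-by-one errors, but I think it meets your spec. I've left the punctuation in; it's probably best to remove it before passing the string in for analysis. I've assumed case doesn't matter.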

test_str = "Education shall be directed to the full development of the human personality and to the strengthening of respect for human rights and fundamental freedoms."

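def get_surrounding_words(search_word, s, n_words):
    # Lower-case and split on whitespace; punctuation stays attached
    # to words, e.g. the last token is "freedoms."
    words = s.lower().split()
    try:
        i = words.index(search_word)
    except ValueError:
        return []
    half = n_words // 2  # floor division behaves the same in Python 2 and 3
    # Word is near start: drop it and take the first n_words
    if i < half:
        words.pop(i)
        return words[:n_words]
    # Word is near end: drop it and take the last n_words
    elif i >= len(words) - half:
        words.pop(i)
        return words[-n_words:]
    # Word is in middle: after pop(i), index i holds the word that
    # followed the target, so [i - half, i + half) covers both sides
    else:
        words.pop(i)
        return words[i - half:i + half]

def test(word):
    print('{}: {}'.format(word, get_surrounding_words(word, test_str, 6)))

test('notfound')
test('development')
test('shall')
test('education')
test('fundamental')
test('for')
test('freedoms')  # prints "freedoms: []" because the token is "freedoms."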
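The answer suggests removing punctuation before the string is analysed. A minimal sketch of that step, assuming every character that is neither a word character nor whitespace can simply be deleted; the name `strip_punctuation` is hypothetical. One side effect: contractions are merged, so "don't" becomes "dont".

```python
import re


def strip_punctuation(s):
    """Return s lower-cased, with every character that is neither a word
    character nor whitespace removed."""
    return re.sub(r'[^\w\s]', '', s).lower()


print(strip_punctuation("Human rights and fundamental freedoms.").split())
# ['human', 'rights', 'and', 'fundamental', 'freedoms']
```

With it, get_surrounding_words('freedoms', strip_punctuation(test_str), 6) returns ['respect', 'for', 'human', 'rights', 'and', 'fundamental'] instead of [].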

Votes: 1

Stack Overflow user

Posted on 2017-04-17 11:02:13

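import sys

args = sys.argv[1:]
if len(args) != 2:
    sys.exit("Use with <string> <query>")
text = args[0]
query = args[1]
words = text.split()
op = []
left = 3   # words wanted before the query
right = 3  # words wanted after the query
try:
    index = words.index(query)
    # Start the window `left` words before the query, but not before 0
    if index <= left:
        start = 0
    else:
        start = index - left

    # If the window would run past the end, shift it back
    if start + left + right + 1 > len(words):
        start = len(words) - left - right - 1
        if start < 0:
            start = 0

    # Collect left + right words, skipping the query itself
    while len(op) < left + right and start < len(words):
        if start != index:
            op.append(words[start])
        start += 1
except ValueError:
    pass  # query not found: op stays empty
print(op)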

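How does it work?
1. Find the word in the string.
2. Check whether left + right words can be taken around that index; if the window would run past the end of the text, shift it back.
3. Take left + right words (skipping the query itself) and save them in op.
4. Print op.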

Votes: 0

Original page content from Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/43449773
