Finding the positions of different words in a string without overlaps

Stack Overflow user

Asked 2017-03-05 20:07:24

1 answer · 180 views · 0 followers · 1 vote

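I am trying to search a text string for specific words/phrases and store their positions in a dictionary.

An example explains better what I want to accomplish and what my problem is.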

```python
import re

content = """Learning python is something I always wanted to do. The fact that python is a simple and intuitive language made me feel bad for learning other programming languages in the first place. I think the main reason why I didn't choose the python language was the fact that I didn't do a proper research about the pros and cons of the available programming options. I gues that writing this paragraph about learning the python language it's harder than the python script I'm trying to accomplish. No, I'm just kidding, if this was the case then I would have completed writing the python languaguage and didn't bother you guys anymore."""
contentLower = content.lower()  # lowercase copy so the search is case-insensitive

mylist = ['python', 'dummy keyword', 'python language', 'learning the python language', 'another keyword']

dictKw = {}
for x in mylist:
    x = x.lower()
    listKw = []
    # re.escape makes the keyword match literally, even if it contains regex metacharacters
    for m in re.finditer(re.escape(x), contentLower):
        # print(x, " found ", m.start(), m.end())
        listKw.append([m.start(), m.end()])  # [start, end] of this occurrence
    if listKw:  # only keywords that occur at least once get a key
        dictKw[x] = listKw

print(dictKw)
```

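So I search the content string for each keyword in mylist and store the start and end position of every occurrence in a dictionary whose keys are the keywords and whose values are lists of [start, end] position lists.

Printing dictKw, I get: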

{'python': [[9, 15], [66, 72], [234, 240], [414, 420], [451, 457], [574, 580]], 'learning the python language': [[401, 429]], 'python language': [[234, 249], [414, 429]]}

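First, I think the key order in the dictionary is wrong: it is python, learning the python language, python language instead of python, python language, learning the python language. I can see that when listKw is appended, the 'learning the python language' key is placed between 'python' and 'python language' instead of at the end.

I think the correct result should be: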

{'python': [[9, 15], [66, 72], [234, 240], [414, 420], [451, 457], [574, 580]], 'python language': [[234, 249], [414, 429]], 'learning the python language': [[401, 429]]}

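Now I want to remove the list elements where keywords overlap each other, keeping the initial priority given by the order of the keywords in mylist (the first keyword wins).

In our example, python overlaps python language, so at the first such overlap python language should lose its first position list, giving: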

{'python': [[9, 15], [66, 72], [234, 240], [414, 420], [451, 457], [574, 580]], 'python language': [[414, 429]],'learning the python language': [[401, 429]]}
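To make "overlap" concrete: the [start, end] pairs produced by m.start() and m.end() are half-open (end is exclusive), so two spans overlap exactly when each one starts before the other ends. The sketch below uses hypothetical helper names (`overlaps`, `overlapping_pairs`) and the spans from the printed dictKw in the question to list the overlapping span pairs between python and python language.

```python
def overlaps(a, b):
    """True if half-open spans a=[start, end) and b=[start, end) share a character."""
    return a[0] < b[1] and b[0] < a[1]


def overlapping_pairs(spans_a, spans_b):
    """Every (span_a, span_b) pair that overlaps, in the order of spans_a."""
    return [(a, b) for a in spans_a for b in spans_b if overlaps(a, b)]


# Spans copied from the dictKw output in the question.
python_spans = [[9, 15], [66, 72], [234, 240], [414, 420], [451, 457], [574, 580]]
python_language_spans = [[234, 249], [414, 429]]

print(overlapping_pairs(python_spans, python_language_spans))
# [([234, 240], [234, 249]), ([414, 420], [414, 429])]
```

These are the two overlaps the question resolves one after the other.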

When checking the remaining overlap, the priority should switch so that python loses the overlapping list element, giving:

{'python': [[9, 15], [66, 72], [234, 240], [451, 457], [574, 580]], 'python language': [[414, 429]],'learning the python language': [[401, 429]]}

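And so on. If we hit a third overlap, the priority should switch back to python, so that python language loses its start/end element list.

Once this check is done, the same overlap check should follow between python, python language and learning the python language, which removes the list value of the 'learning the python language' key.

The final result should be: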

{'python': [[9, 15], [66, 72], [234, 240], [451, 457], [574, 580]], 'python language': [[414, 429]],'learning the python language': [[]]}
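One possible reading of the alternating rule described above, as a minimal sketch: visit every pair of keywords in mylist order (higher priority first); within a pair, the first overlap removes the lower-priority keyword's span, the second removes the higher-priority keyword's span, and so on. The function name `resolve_alternating` is hypothetical, and the assumption that the alternation restarts for every keyword pair is my interpretation, not something the question states. On the question's positions it produces the expected result, except that 'learning the python language' ends up as an empty list rather than [[]].

```python
from collections import OrderedDict


def overlaps(a, b):
    """True if half-open spans a=[start, end) and b=[start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def resolve_alternating(positions, keywords):
    """Drop overlapping spans, alternating which keyword of a pair loses.

    positions: mapping keyword -> list of [start, end] spans.
    keywords: priority order, index 0 = highest (e.g. mylist).
    Assumption: the alternation restarts for every keyword pair.
    """
    # Keep only keywords that were found, in priority order; copy the span lists.
    result = OrderedDict((kw, [list(s) for s in positions[kw]])
                         for kw in keywords if kw in positions)
    found = list(result)
    for i, high in enumerate(found):
        for low in found[i + 1:]:
            low_loses = True  # the first overlap costs the lower-priority keyword
            pairs = [(h, l) for h in result[high] for l in result[low] if overlaps(h, l)]
            for h, l in pairs:
                if h not in result[high] or l not in result[low]:
                    continue  # one of the spans was already removed by an earlier overlap
                if low_loses:
                    result[low].remove(l)
                else:
                    result[high].remove(h)
                low_loses = not low_loses  # switch priority for the next overlap
    return result


mylist = ['python', 'dummy keyword', 'python language',
          'learning the python language', 'another keyword']
dictKw = {
    'python': [[9, 15], [66, 72], [234, 240], [414, 420], [451, 457], [574, 580]],
    'python language': [[234, 249], [414, 429]],
    'learning the python language': [[401, 429]],
}
print(dict(resolve_alternating(dictKw, mylist)))
# {'python': [[9, 15], [66, 72], [234, 240], [451, 457], [574, 580]], 'python language': [[414, 429]], 'learning the python language': []}
```

The candidate pairs are collected before any removal, then each pair is skipped if an earlier decision already dropped one of its spans, so the input lists are never modified while they are being iterated.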

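I don't know where to start with this overlap problem, so I'm asking you to point me in the right direction, or to suggest another approach for what I'm trying to accomplish.

Keep in mind that the mylist elements can be in any other order, and the order of the elements determines keyword priority: the top element has the highest priority.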
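As a different and simpler approach than the alternating rule above, each keyword can be given a fixed priority: walk the keywords in mylist order and keep a span only if it does not overlap any span that has already been kept. This is a plain greedy selection by priority, swapped in here as an alternative; the function name `resolve_fixed_priority` is hypothetical. Because mylist order is the only input that sets priority, reordering mylist changes the result, as the last requirement asks.

```python
def overlaps(a, b):
    """True if half-open spans a=[start, end) and b=[start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def resolve_fixed_priority(positions, keywords):
    """Keep a span only if no span already kept (from a higher-priority keyword) overlaps it."""
    kept = []    # every span accepted so far, across all keywords
    result = {}  # insertion-ordered on Python 3.7+
    for kw in keywords:  # earlier in keywords = higher priority
        if kw not in positions:
            continue  # keyword never occurred in the text
        result[kw] = []
        for span in positions[kw]:
            if not any(overlaps(span, other) for other in kept):
                result[kw].append(span)
                kept.append(span)
    return result


mylist = ['python', 'dummy keyword', 'python language',
          'learning the python language', 'another keyword']
positions = {
    'python': [[9, 15], [66, 72], [234, 240], [414, 420], [451, 457], [574, 580]],
    'python language': [[234, 249], [414, 429]],
    'learning the python language': [[401, 429]],
}
print(resolve_fixed_priority(positions, mylist))
# {'python': [[9, 15], [66, 72], [234, 240], [414, 420], [451, 457], [574, 580]], 'python language': [], 'learning the python language': []}
```

With mylist reversed so that 'learning the python language' comes first, it keeps [401, 429], 'python language' keeps only [234, 249], and python loses the two spans that fall inside those phrases.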

Tags: dictionary, python, python-2.7, list

1 answer

Stack Overflow user

Answered 2017-03-05 20:23:50

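Note that in Python the dictionaries {"a": 1, "b": 2, "c": 3} and {"b": 2, "a": 1, "c": 3} are equivalent: by default, keys are not ordered at all. To fix this you can use an OrderedDict, which keeps the dictionary's elements in the order in which the key/value pairs were added. (This applies to Python 2.7, which the question targets; since Python 3.7 plain dicts also preserve insertion order, although the two dicts above still compare equal.)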

Votes: 0
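A minimal sketch of the answer's suggestion applied to the question's loop: the results are collected in a collections.OrderedDict, so the keys come out in mylist order. The function name `find_positions` and the short sample text are hypothetical, and re.escape is an addition not in the question, so keywords containing regex metacharacters are matched literally. On Python 3.7 and later, a plain dict would preserve the same order.

```python
import re
from collections import OrderedDict


def find_positions(content, keywords):
    """Map each keyword found in content to its [start, end] spans, in keywords order."""
    text = content.lower()
    result = OrderedDict()
    for kw in keywords:
        kw = kw.lower()
        spans = [[m.start(), m.end()] for m in re.finditer(re.escape(kw), text)]
        if spans:  # like the question's loop, skip keywords that never occur
            result[kw] = spans
    return result


text = "Learning python is fun. Learning the python language is more fun."
print(list(find_positions(text, ['python', 'dummy keyword', 'python language']).items()))
# [('python', [[9, 15], [37, 43]]), ('python language', [[37, 52]])]
```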

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link: https://stackoverflow.com/questions/42613349
