Question: Nested for loop to compare list elements

Stack Overflow user

Asked 2016-02-20 05:35:13

3 answers · 506 views · 0 followers · 0 votes

As a new approach to the challenge I described here, I came up with the following:

from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

diffs =[
"""- It contains a Title II provision that changes the age at which workers
compensation/public disability offset ends for disability beneficiaries from age 65 to full retirement age (FRA).""",
"""+ It contains a Title II provision that changes the age at which workers 
compensation/public disability offset ends for disability beneficiaries from age 68 to full retirement age (FRA).""",
"""+ Here's a new paragraph I added for testing."""]

for s in diffs:
    others = [i for i in diffs if i != s]
    for j in others:
        if similar(s, j) > 0.7:
            print '"{}" and "{}" refer to the same sentence'.format(s, j)
            print
            diffs.remove(j)
        else:
            print '"{}" is a new sentence'.format(s)

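The idea is to iterate over the strings and compare each one with all the others. If a given string is deemed similar to another string, the other string is removed; otherwise the given string is treated as unique in the list.

Here is the output: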

"- It contains a Title II provision that changes the age at which workers
compensation/public disability offset ends for disability beneficiaries from age 65 to full retirement age (FRA)." and "+ It contains a Title II provision that changes the age at which workers 
compensation/public disability offset ends for disability beneficiaries from age 68 to full retirement age (FRA)." refer to the same sentence


"- It contains a Title II provision that changes the age at which workers
compensation/public disability offset ends for disability beneficiaries from age 65 to full retirement age (FRA)." is a new sentence
"+ Here's a new paragraph I added for testing." is a new sentence

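So it correctly detects that the first two sentences are similar and that the last one is unique. The problem is that it goes back and decides the first sentence is unique (which it isn't, and it shouldn't be returning to that sentence anyway).

Where is the flaw in my loop logic? Can this be done without nested for loops and without removing elements?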

Tags: python, list, for-loop, iteration, nested-loops

3 Answers

Stack Overflow user

Answered 2016-02-20 05:44:44

from difflib import SequenceMatcher
from collections import defaultdict

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

diffs =[
"""- It contains a Title II provision that changes the age at which workers
compensation/public disability offset ends for disability beneficiaries from age 65 to full retirement age (FRA).""",
"""+ It contains a Title II provision that changes the age at which workers 
compensation/public disability offset ends for disability beneficiaries from age 68 to full retirement age (FRA).""",
"""+ Here's a new paragraph I added for testing."""]


sims = set()
simdict = defaultdict(list)
for i in range(len(diffs)):
    if i in sims:
        continue
    s = diffs[i]

    for j in range(i+1, len(diffs)):
        r = diffs[j]
        if similar(s, r) > 0.7:
            sims.add(j)
            simdict[i].append(j)


for k, v in simdict.iteritems():
    print diffs[k] + " is similar to:"
    print '\n'.join(diffs[e] for e in v)
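
For readers on Python 3, here is a rough sketch of the same index-based idea. It is not the answerer's code: the name `group_similar` and the `threshold` parameter are my own, and unlike the snippet above it also keeps an entry (with an empty list) for sentences that matched nothing, so unique sentences can be reported too. The input list is never modified.

```python
from difflib import SequenceMatcher


def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()


def group_similar(strings, threshold=0.7):
    """Map each unmatched index to the later indices similar to it.

    `group_similar` and `threshold` are illustrative names, not part
    of the original answer.
    """
    matched = set()   # indices already claimed by an earlier string
    groups = {}
    for i, s in enumerate(strings):
        if i in matched:
            continue  # already reported as similar to an earlier string
        groups[i] = []
        for j in range(i + 1, len(strings)):
            if j not in matched and similar(s, strings[j]) > threshold:
                matched.add(j)
                groups[i].append(j)
    return groups


diffs = [
    """- It contains a Title II provision that changes the age at which workers
compensation/public disability offset ends for disability beneficiaries from age 65 to full retirement age (FRA).""",
    """+ It contains a Title II provision that changes the age at which workers 
compensation/public disability offset ends for disability beneficiaries from age 68 to full retirement age (FRA).""",
    """+ Here's a new paragraph I added for testing.""",
]

for i, js in group_similar(diffs).items():
    if js:
        for j in js:
            print('"{}" and "{}" refer to the same sentence'.format(diffs[i], diffs[j]))
    else:
        print('"{}" is a new sentence'.format(diffs[i]))
```

Because the decision "is a new sentence" is made only after the inner loop has finished, each sentence is reported once.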

Votes: 1

Stack Overflow user

Answered 2016-02-20 05:59:22

You can see at which point it decides that the first sentence is unique by changing

print '"{}" is a new sentence'.format(s)

to

print '"{}" and "{}" are different sentences'.format(s,j)

This should help you see exactly where the loop goes wrong.
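
If it helps, the same diagnosis can be done without reading printed output. The sketch below (Python 3 syntax; `trace_original_loop` is a name I made up) replays the question's loop on a copy of the list, bug included, and records the verdict of every single comparison. The sample strings are shortened stand-ins for the ones in the question, not the real data.

```python
from difflib import SequenceMatcher


def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()


def trace_original_loop(strings, threshold=0.7):
    """Replay the question's loop and log (s, j, verdict) per comparison.

    This deliberately keeps the original behaviour (removing items while
    iterating) so the log shows where it goes wrong. Hypothetical helper.
    """
    items = list(strings)  # work on a copy; the caller's list is untouched
    log = []
    for s in items:
        others = [i for i in items if i != s]
        for j in others:
            if similar(s, j) > threshold:
                log.append((s, j, "same"))
                items.remove(j)
            else:
                # Runs once per non-matching pair, not once per sentence.
                log.append((s, j, "new"))
    return log


# Shortened stand-ins for the question's three strings.
sample = [
    "- offset ends at age 65",
    "+ offset ends at age 68",
    "+ Here's a new paragraph",
]

for s, j, verdict in trace_original_loop(sample):
    print("{!r} vs {!r}: {}".format(s, j, verdict))
```

The log shows the first string being judged "new" right after it was matched, because the `else` branch fires for each non-matching pair rather than after all comparisons for that string are done. It also shows that the removed string never gets its own turn in the outer loop.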

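Votes: 0

Stack Overflow user

Answered 2016-02-20 12:22:13

Since modified strings always appear back to back (one prefixed with '-', the other with '+'), you can do the following (I believe it works in all cases).

When the list has an odd number of elements, the last element must be a new sentence.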

def extract_modified_and_new(diffs):
    for z1, z2 in zip(diffs[::2], diffs[1::2]):
        if similar(z1, z2) > 0.7:
            print z1, 'is similar to', z2
            print
        else:
            print z1, ' and ', z2, 'are new'
            print
    if len(diffs) % 2 != 0:
        print diffs[-1], ' is new'
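
A self-contained Python 3 variant of the pairing idea, as a sketch only: it defines `similar` itself and returns the results instead of printing them. The function name `split_modified_and_new`, the `threshold` parameter, and the shortened sample strings are my own assumptions, not part of the answer above.

```python
from difflib import SequenceMatcher


def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()


def split_modified_and_new(diffs, threshold=0.7):
    """Pair up neighbours (0,1), (2,3), ... and classify each pair.

    Returns (modified, new): a list of (old, new) tuples and a list of
    strings that matched nothing. Hypothetical helper for illustration.
    """
    modified, new = [], []
    for z1, z2 in zip(diffs[::2], diffs[1::2]):
        if similar(z1, z2) > threshold:
            modified.append((z1, z2))
        else:
            new.extend([z1, z2])
    if len(diffs) % 2 != 0:
        # zip() drops the unpaired last element, so handle it here.
        new.append(diffs[-1])
    return modified, new


# Shortened stand-ins for the question's three strings.
sample = [
    "- offset ends at age 65",
    "+ offset ends at age 68",
    "+ Here's a new paragraph",
]

modified, new = split_modified_and_new(sample)
for old, changed in modified:
    print(old, "is similar to", changed)
for s in new:
    print(s, "is new")
```

Note that this relies entirely on the back-to-back assumption: if a new sentence appears before a modified pair, the pairs shift by one and the '-'/'+' lines are no longer compared with each other.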

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/35515482
