
python3, difflib SequenceMatcher

Stack Overflow user
Asked 2018-02-19 03:03:23
1 answer, 843 views, 0 followers, 0 votes

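Below are two strings. I want to compare them and return both the parts they have in common and the parts that differ, separated by spaces, with every result padded to the length of the longest string.

The commented-out lines in the code show the 4 strings that should be returned.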

from difflib import SequenceMatcher




t1 = 'betty:  backstreetvboysareback"give.jpg"LAlarrygarryhannyhref="ang"_self'

t2 = 'bettyv:  backstreetvboysareback"lifeislike"LAlarrygarryhannyhref="in.php"_self'


#t1 = 'betty :  backstreetvboysareback" i e      "LAlarrygarryhannyhref=" n    "_self'
#t2 = 'betty :  backstreetvboysareback" i e      "LAlarrygarryhannyhref=" n    "_self'

#o1 = '                                g v .jpg                          g           '
#o2 = '     v                          l f islike                        i .php      '



matcher = SequenceMatcher(None, t1, t2)
blocks = matcher.get_matching_blocks()

bla1 = []
bla2 = []

for i in range(len(blocks)):
    if i != len(blocks)-1:
        bla1.append([t1[blocks[i].a + blocks[i].size:blocks[i+1].a], blocks[i].a + blocks[i].size, blocks[i+1].a])
        bla2.append([t2[blocks[i].b + blocks[i].size:blocks[i+1].b], blocks[i].b + blocks[i].size, blocks[i+1].b])



cnt = 0
for i in range(len(bla1)):


    if bla1[i][1] < bla2[i][1]:
        num = bla2[i][1] - bla1[i][1]
        t2 = t2[0:bla2[i][1]] + ' '*num + t2[bla2[i][1]:len(t2)]
        bla2[i][0] = ' '*num + bla2[i][0]
        bla2[i][1] = bla1[i][1]

    if bla2[i][1] < bla1[i][1]:
        num = bla1[i][1] - bla2[i][1]
        t1 = t1[0:bla1[i][1]] + ' '*num + t1[bla1[i][1]:len(t1)]
        bla1[i][0] = ' '*num + bla1[i][0]
        bla1[i][1] = bla2[i][1]

    if bla1[i][2] > bla2[i][2]:
        num = bla1[i][2] - bla2[i][2]
        t2 = t2[0:bla2[i][2]] + ' '*num + t2[bla2[i][2]:len(t2)]
        bla2[i][0] = bla2[i][0] + ' '*num
        bla2[i][2] = bla1[i][2]

    if bla2[i][2] > bla1[i][2]:
        num = bla2[i][2] - bla1[i][2]
        t1 = t1[0:bla1[i][2]] + ' '*num + t1[bla1[i][2]:len(t1)]
        bla1[i][0] = bla1[i][0] + ' '*num
        bla1[i][2] = bla2[i][2]




t11 = []
t11 = t1[0:bla1[0][1]]
t11 += t1[bla1[0][2]:bla1[1][1]]
t11 += t1[bla1[1][2]:bla1[2][1]]
t11 += t1[bla1[2][2]:bla1[3][1]]
t11 += t1[bla1[3][2]:bla1[4][1]]
t11 += t1[bla1[5][2]:bla1[6][1]]
t11 += t1[bla1[6][2]:len(t1)]

t12 = []
t12 = t2[0:bla1[0][1]]
t12 += t2[bla1[0][2]:bla1[1][1]]
t12 += t2[bla1[1][2]:bla1[2][1]]
t12 += t2[bla1[2][2]:bla1[3][1]]
t12 += t2[bla1[3][2]:bla1[4][1]]
t12 += t2[bla1[5][2]:bla1[6][1]]
t12 += t2[bla1[6][2]:len(t2)]

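I arrange the matching blocks into two organized lists, bla1 and bla2, in which each difference is stored as a string together with its start and end position (for example ['v', 33, 34]), one entry per differing substring. After that I try to insert spaces to reach the required length and separation, and that is where the code starts to break.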
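The gap-collection step described above can be written without indexing `blocks[i+1]` by tracking where the previous matching block ended. This is a minimal sketch, not the poster's code: the function name `diff_gaps` and the `pos_a`/`pos_b` cursors are hypothetical, and `autojunk=False` is an assumption made so that longer inputs are not affected by difflib's popularity heuristic.

```python
from difflib import SequenceMatcher


def diff_gaps(a, b):
    """Return two lists of [text, start, end] for the unmatched regions of a and b.

    Entry k of the first list and entry k of the second list describe the
    same gap, so the two lists always have the same length.
    """
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    gaps_a, gaps_b = [], []
    pos_a = pos_b = 0  # end of the previous matching block in a and in b
    for block in blocks:
        # Anything between the previous block and this one is a difference.
        if block.a > pos_a or block.b > pos_b:
            gaps_a.append([a[pos_a:block.a], pos_a, block.a])
            gaps_b.append([b[pos_b:block.b], pos_b, block.b])
        pos_a = block.a + block.size
        pos_b = block.b + block.size
    return gaps_a, gaps_b


if __name__ == "__main__":
    print(diff_gaps("abc", "abXc"))
```

Because `get_matching_blocks()` always ends with a zero-length block at `(len(a), len(b))`, the same loop also picks up a difference at the very start or the very end of either string.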

It would be great if someone could take a look!


1 Answer

Stack Overflow user

Accepted answer

Posted on 2018-02-25 18:38:43

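I have kept working on this problem, and since nobody has posted a response, I am posting my solution in its current state. The code below is a work in progress. It handles changes with small offsets well, but it starts to break on larger differences, especially when it has to maintain the spacing (offsets) that keeps the two strings aligned.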

from difflib import SequenceMatcher
import pdb


t1 = 'betty:  backstreetvboysareback"give.jpg"LAlarrygarryhannyhref="ang"_self'

t2 = 'betty:  backstreetvboysareback"lol.jpg"LAlarrygarryhannyhref="ang"_self'

#t2 = 'bettyv:  backstreetvboysareback"lifeislike"LAlarrygarryhannyhref="in.php"_selff'

#t2 = 'LA'
#t2 = 'c give.'
#t2 = 'give.'




#t1 = 'betty :  backstreetvboysareback" i e      "LAlarrygarryhannyhref=" n    "_self'
#t2 = 'betty :  backstreetvboysareback" i e      "LAlarrygarryhannyhref=" n    "_self'

#o1 = '                                g v .jpg                          g           '
#o2 = '     v                          l f islike                        i .php      '



matcher = SequenceMatcher(None, t1, t2)
blocks = matcher.get_matching_blocks()

#print(len(blocks))

bla1 = []
bla2 = []

# each bla entry = [text, start pos, end pos, length (end - start), running total of lengths so far]
dnt = False
for i in range(len(blocks)):

    if i == 0:
      if blocks[i].a != 0 and dnt == False:
        bla1.append([t1[blocks[i].a:blocks[i].b], 0, blocks[i].a, 0, 0])
        bla2.append([t2[blocks[i].a:blocks[i].b], 0, blocks[i].b, 0, 0])
        dnt = True

      if blocks[i].b != 0 and dnt == False:
        bla2.append([t2[blocks[i].a:blocks[i].b], 0, blocks[i].b, 0, 0])
        bla1.append([t1[blocks[i].a:blocks[i].b], 0, blocks[i].a, 0, 0])
        dnt = True

    if i != len(blocks)-1:
        print(blocks[i])

        bla1.append([t1[blocks[i].a + blocks[i].size:blocks[i+1].a], blocks[i].a + blocks[i].size, blocks[i+1].a, 0, 0])
        bla2.append([t2[blocks[i].b + blocks[i].size:blocks[i+1].b], blocks[i].b + blocks[i].size, blocks[i+1].b, 0, 0])

#pdb.set_trace()

ttl = 0
for i in range(len(bla1)):
  cnt = bla1[i][2] - bla1[i][1]
  if cnt != 0:
    bla1[i][3] = cnt
  ttl = ttl + cnt
  bla1[i][4] = ttl

ttl = 0
for i in range(len(bla2)):
  cnt = bla2[i][2] - bla2[i][1]
  if cnt != 0:
    bla2[i][3] = cnt
  ttl = ttl + cnt
  bla2[i][4] = ttl

print(bla1)
print(bla2)

tt1 = ''
dif = 0
i = 0
while True:

  if i == 0:
    if bla1[i][3] >= bla2[i][3]: dif = bla1[i][3]
    if bla1[i][3] < bla2[i][3]: dif = bla2[i][3]  
    tt1 += t1[:bla1[i][1]] + '_'*dif

  if i <= len(bla1) -1:

    if bla1[i][3] >= bla2[i][3]: dif = bla1[i][3]
    if bla1[i][3] < bla2[i][3]: dif = bla2[i][3]

    if len(bla1) != 1:
      if i == 0: tt1 += t1[bla1[i][1] + bla1[i][3]:bla1[i+1][1]]
      if i != 0 and i != len(bla1)-1: tt1 += '_'*dif + t1[bla1[i][1] + bla1[i][3]:bla1[i+1][1]]
      if i == len(bla1)-1: tt1 += '_'*dif + t1[bla1[i][1] + bla1[i][3]:len(t1)]

    i = i+1
    print('t1 = ' + tt1)

  else:
    break

tt2 = ''
i = 0
dif = 0
while True:

  if i == 0:

    if bla1[i][3] >= bla2[i][3]: dif = bla1[i][3]
    if bla1[i][3] < bla2[i][3]: dif = bla2[i][3]   
    tt2 += t2[:bla2[i][1]] + '_'*dif

  if i <= len(bla2) -1:

    if bla1[i][3] >= bla2[i][3]: dif = bla1[i][3]
    if bla1[i][3] < bla2[i][3]: dif = bla2[i][3]    

    if len(bla2) != 1:
      if i == 0: tt2 += t2[bla2[i][1] + bla2[i][3]:bla2[i+1][1]]
      if i != 0 and i != len(bla1)-1: tt2 += '_'*dif + t2[bla2[i][1] + bla2[i][3]:bla2[i+1][1]]
      if i == len(bla2)-1: tt2 += '_'*dif + t2[bla2[i][1] + bla2[i][3]:len(t2)]

    i = i+1
    print('t2 = ' + tt2)

  else:
    break

  print()
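As an alternative to maintaining the offsets by hand, the same alignment can be sketched on top of `SequenceMatcher.get_opcodes()`, which reports each region as `equal`, `replace`, `delete`, or `insert`. This is not the code above and makes some assumptions: the helper name `align_diff` is hypothetical, `fill` has to be a single character, and because the shared text is identical for both inputs it is returned once rather than twice.

```python
from difflib import SequenceMatcher


def align_diff(a, b, fill=" "):
    """Return (common, only_a, only_b), three strings of equal length.

    common holds the shared text with fill characters where the inputs
    differ; only_a and only_b hold each input's differing text in the
    same columns, padded to the width of the longer side.
    """
    common, only_a, only_b = [], [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            width = i2 - i1
            common.append(a[i1:i2])
            only_a.append(fill * width)
            only_b.append(fill * width)
        else:
            # replace, delete, or insert: reserve room for the longer side
            width = max(i2 - i1, j2 - j1)
            common.append(fill * width)
            only_a.append(a[i1:i2].ljust(width, fill))
            only_b.append(b[j1:j2].ljust(width, fill))
    return "".join(common), "".join(only_a), "".join(only_b)


if __name__ == "__main__":
    t1 = 'betty:  backstreetvboysareback"give.jpg"LAlarrygarryhannyhref="ang"_self'
    t2 = 'bettyv:  backstreetvboysareback"lifeislike"LAlarrygarryhannyhref="in.php"_self'
    for line in align_diff(t1, t2):
        print(repr(line))
```

Each opcode contributes the same number of columns to all three strings, so they stay aligned however large the differences are. Which characters difflib treats as matching inside a changed region may not coincide exactly with the hand-written expected output in the question.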

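Solution:

Unfortunately I have been too busy to keep working on this code, and I resorted to calling an external tool (巴氏) through a subprocess. It is a very good alternative to a lot of hard coding!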

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/48859026
