文章/答案/技术大牛

发布

社区首页 >问答首页 >Knuth-Morris-Pratt字符串搜索算法

问Knuth-Morris-Pratt字符串搜索算法
EN

Code Review用户

提问于 2018-03-30 03:50:30

回答 1查看 543关注 0票数 3

我终于认为我已经正确地实现了Knuth-Morris-Pratt字符串搜索算法，现在是时候对那些让我学习的可爱的批评了！代码看起来并不那么漂亮，不过，据我所测试，它确实起作用了。我在这个算法上收集到的所有信息都来自于维基百科页面。我希望得到一些关于改进/错误的反馈和建议。我击中目标了吗？还是我没抓住重点？我是否应该为我的研究使用其他来源？

''' 
KMP Algorithm
**Searches for all occurrences of a word w[] in main string s[]

@author: Anonymous3.1415
'''

def KMP_filltable(pattern):

    position = 2
    candidate = 0
    length_pat = len(pattern)

    table = [0]*len(pattern)
    table[0] = -1

    while position < length_pat:
        if pattern[position - 1] == pattern[candidate]:
            position += 1
            candidate += 1
            table[position - 1] = candidate
        elif candidate > 0:
            candidate = table[candidate]
        else:
            table[position] = 0
            position += 1

    return table


def KMP_search(pattern, lst):

    positions = []
    i, j = 0, 0
    length_lst, length_pat  = len(lst), len(pattern)

    table = KMP_filltable(pattern)
    while i < length_lst:
        if pattern[j] == lst[i + j]:
            j += 1
            if j == length_pat:
                positions.append(i)
                i += j
                j = 0
        elif table[j] > 0:
            j = table[j]
            i += j
        else:
            i += (j + 1)
            j = 0

    return positions

python

algorithm

strings

回答 1

Code Review用户

回答已采纳

发布于 2018-03-30 11:35:00

此代码不起作用。在某些情况下，它找不到模式：

>>> KMP_search('ab', 'aab')
[]

在其他情况下，它引发了一个例外：

>>> KMP_search('ab', 'ba')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cr190837.py", line 39, in KMP_search
    if pattern[j] == lst[i + j]:
IndexError: string index out of range

你怎么会发现这些问题？通过对代码进行彻底的测试。一种方法是使用随机测试用例生成。随机测试背后的理念是，它可以测试比手工输入更大范围的输入，还可以测试系统测试用例生成无法达到的部分输入空间。

因此，随机测试用例的一个想法是生成长度为k的随机字符串：

haystack = ''.join(random.choices('abc', k=k))

(这里我使用的是Python3.6中的新版本random.choices。如果您使用早期版本的Python，可以使用类似于random.choice('abc') for _ in range(k)的循环。)

然后选择一个随机子字符串：

start, stop = sorted(random.sample(range(k + 1), 2))
needle = haystack[start:stop]

运行KMP_search(needle, haystack)并检查结果是否正确，例如通过与Python的内置str.find进行比较。

所有这些都可以使用单元测试模块打包到一个unittest中：

import random
import unittest

class TestKMPSearch(unittest.TestCase):
    "Unit tests for KMP_search."

    def test_positive(self):
        for k in range(1, 100):
            # Generate a random test case with k characters.
            haystack = ''.join(random.choices('abc', k=k))
            start, stop = sorted(random.sample(range(k + 1), 2))
            needle = haystack[start:stop]

            # Use Python's built-in str.find to compute the expected result.
            expected = []
            index = -1
            while True:
                index = haystack.find(needle, index + 1)
                if index == -1:
                    break
                expected.append(index)

            # Compare with the actual result.
            found = KMP_search(needle, haystack)
            self.assertEqual(found, expected,
                             "KMP_search({!r}, {!r})".format(needle, haystack))

(我称之为test_positive方法，因为测试用例是正向，因为针头是在干草堆中找到的。对于真正彻底的测试，您也应该生成一些负测试用例，在这些测试用例中找不到针头。但这足以发现一些错误。)

现在，如果您运行单元测试，您将(很可能)得到这样的失败：

$ python -munittest cr190837.py
F
======================================================================
FAIL: test_positive (cr190837.TestKMPSearch)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "cr190837.py", line 79, in test_positive
    "KMP_search({!r}, {!r})".format(needle, haystack))
AssertionError: Lists differ: [] != [3]

Second list contains 1 additional elements.
First extra element 0:
3

- []
+ [3]
?  +
 : KMP_search('bac', 'ccbbac')

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (failures=1)

票数 4

页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://codereview.stackexchange.com/questions/190837

复制

相似问题

问Knuth-Morris-Pratt字符串搜索算法
EN

回答 1

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Knuth-Morris-Pratt字符串搜索算法EN

回答 1

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Knuth-Morris-Pratt字符串搜索算法
EN