Q: Spell-checking with binary search

Stack Overflow user

Asked 2014-03-31 07:49:04

Answers: 1 · Views: 721 · Followers: 0 · Votes: 3

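I am trying to use binary search to spell-check the words in a file and print the words that are not in the dictionary. So far, though, most correctly spelled words are reported as misspelled (that is, as not found in the dictionary). The dictionary is itself a text file that looks like this: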

abactinally
abaction
abactor
abaculi
abaculus
abacus
abacuses
Abad
abada
Abadan
Abaddon
abaddon
abadejo
abadengo
abadia

Code:

def binSearch(x, nums):
    low = 0
    high = len(nums)-1
    while low <= high:          
        mid = (low + high)//2   
        item = nums[mid]
        if x == item :
            print(nums[mid])
            return mid
        elif x < item:         
            high = mid - 1      
        else:                  
            low = mid + 1       
    return -1                  



def main():

    print("This program performs a spell-check in a file")
    print("and prints a report of the possibly misspelled words.\n")

    # get the sequence of words from the file
    fname = input("File to analyze: ")
    text = open(fname,'r').read()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.split()

    #import dictionary from file
    fname2 =input("File of dictionary: ")
    dic = open(fname2,'r').read()
    dic = dic.split()

    #perform binary search for misspelled words
    misw = []
    for w in words:
        m = binSearch(w,dic)
        if m == -1:
            misw.append(w)

python

Answers: 1

Stack Overflow user

Posted 2014-03-31 11:35:45

Your binary search works perfectly! However, it looks like you are not removing all of the special characters.

Testing your code (with a sentence of my own):

def main():

   print("This program performs a spell-check in a file")
   print("and prints a report of the possibly misspelled words.\n")

   text = 'An old mann gathreed his abacus, and ran a mile. His abacus\n ran two miles!'
   for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
       text = text.replace(ch, ' ')
   words = text.lower().split(' ')

   dic = ['a','abacus','an','and','arranged', 'gathered', 'his', 'man','mile','miles','old','ran','two']

   #perform binary search for misspelled words
   misw = []
   for w in words:
       m = binSearch(w,dic)
       if m == -1:
           misw.append(w)
   print(misw)

This prints ['mann', 'gathreed', '', '', 'abacus\n', '']

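The extra empty strings '' come from the punctuation you replaced with spaces: splitting on a single space turns each run of consecutive spaces into empty entries. The \n (newline) is more of a problem, because it will certainly show up in an external text file but is easy to overlook. Instead of for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':, just check each character with .isalpha(). Try something like this: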

def main():

   ...

   text = 'An old mann gathreed his abacus, and ran a mile. His abacus\n ran two miles!'
   for ch in text:
       if not ch.isalpha() and not ch == ' ': 
           #we want to keep spaces or else we'd only have one word in our entire text
           text = text.replace(ch, '') #replace with empty string (basically, remove)
   words = text.lower().split(' ')

   #import dictionary
   dic = ['a','abacus','an','and','arranged', 'gathered', 'his', 'man','mile','miles','old','ran','two']

   #perform binary search for misspelled words
   misw = []
   for w in words:
       m = binSearch(w,dic)
       if m == -1:
           misw.append(w)
   print(misw)

Output:

This program performs a spell-check in a file
and prints a report of the possibly misspelled words.

['mann', 'gathreed']
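
A related sketch, not part of the original answer: calling split() with no argument splits on any run of whitespace, so empty strings and stray newlines never reach the word list. The helper name tokenize is hypothetical. It replaces non-letters with a space rather than deleting them, on the assumption that words separated only by punctuation or a newline should stay separate instead of being merged.

```python
def tokenize(text):
    """Return the lowercase alphabetic words in text (hypothetical helper)."""
    # Replace every non-letter with a space so adjacent words stay separated.
    cleaned = ''.join(ch if ch.isalpha() else ' ' for ch in text)
    # split() with no argument collapses whitespace runs and drops empty strings.
    return cleaned.lower().split()


sample = 'An old mann gathreed his abacus, and ran a mile. His abacus\n ran two miles!'
print(tokenize(sample))
```

The resulting list can be passed to binSearch word by word, exactly as in the loop above.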

Hope this helps! If you need clarification or something doesn't work, feel free to leave a comment.
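
One further point that may be worth checking, based on the dictionary excerpt in the question: entries such as Abad and Abadan sit between lowercase words, which suggests the file is sorted ignoring case. Python compares strings by code point, so uppercase letters order before lowercase ones, and a binary search over a list in that order can miss words that are actually present. The sketch below assumes that lowercasing and re-sorting the dictionary is acceptable. The names load_sorted and contains are hypothetical, and the standard bisect module stands in for the hand-written binSearch.

```python
from bisect import bisect_left


def load_sorted(entries):
    """Lowercase, de-duplicate, and sort entries (hypothetical helper)."""
    return sorted({e.lower() for e in entries})


def contains(sorted_words, word):
    """Binary-search sorted_words for word using bisect_left."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


# A few entries from the dictionary excerpt in the question, in file order.
raw = ['abacus', 'abacuses', 'Abad', 'abada', 'Abadan', 'Abaddon', 'abaddon', 'abadejo']
dic = load_sorted(raw)
print(contains(dic, 'abacus'))
print(contains(dic, 'abad'))
print(contains(dic, 'abakus'))
```

Lowercasing discards the distinction between proper nouns and common words, so this only fits if the words from the text are lowercased before lookup as well.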

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/22751449
