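I'm trying to use binary search to spell-check the words in a file and print the words that aren't in the dictionary. So far, though, most correctly spelled words are being reported as misspelled (words not found in the dictionary). The dictionary file is also a text file and looks like this: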
abactinally
abaction
abactor
abaculi
abaculus
abacus
abacuses
Abad
abada
Abadan
Abaddon
abaddon
abadejo
abadengo
abadia

Code:
    def binSearch(x, nums):
        # nums must already be sorted; returns the index of x, or -1 if absent
        low = 0
        high = len(nums)-1
        while low <= high:
            mid = (low + high)//2
            item = nums[mid]
            if x == item:
                print(nums[mid])
                return mid
            elif x < item:
                high = mid - 1
            else:
                low = mid + 1
        return -1

    def main():
        print("This program performs a spell-check in a file")
        print("and prints a report of the possibly misspelled words.\n")
        # get the sequence of words from the file
        fname = input("File to analyze: ")
        text = open(fname,'r').read()
        for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
            text = text.replace(ch, ' ')
        words = text.split()
        #import dictionary from file
        fname2 =input("File of dictionary: ")
        dic = open(fname2,'r').read()
        dic = dic.split()
        #perform binary search for misspelled words
        misw = []
        for w in words:
            m = binSearch(w,dic)
            if m == -1:
                misw.append(w)

Posted on 2014-03-31 11:35:45
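Your binary search works perfectly! However, it looks like you aren't removing all of the special characters.
Testing your code (with a sentence of my own):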
    def main():
        # binSearch is the function from the question
        print("This program performs a spell-check in a file")
        print("and prints a report of the possibly misspelled words.\n")
        text = 'An old mann gathreed his abacus, and ran a mile. His abacus\n ran two miles!'
        for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
            text = text.replace(ch, ' ')
        words = text.lower().split(' ')
        dic = ['a','abacus','an','and','arranged', 'gathered', 'his', 'man','mile','miles','old','ran','two']
        #perform binary search for misspelled words
        misw = []
        for w in words:
            m = binSearch(w,dic)
            if m == -1:
                misw.append(w)
        print(misw)

This prints misw as:

    ['mann', 'gathreed', '', '', 'abacus\n', '']
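To see where the empty strings and the 'abacus\n' entry come from, here is a minimal sketch comparing split(' ') with split(). The sample string is made up for illustration (it stands in for text whose punctuation has already been replaced by spaces). Note that the code in the question calls split() with no argument, while the test above uses split(' ').

```python
# Made-up sample: punctuation already replaced by spaces, plus one newline
text = 'his abacus  and ran a mile  his abacus\n ran'

# split(' ') cuts at every single space, so two spaces in a row
# produce an empty string, and the newline stays attached to its word
by_space = text.split(' ')

# split() with no argument cuts at runs of any whitespace
# (spaces, tabs, newlines) and never returns empty strings
by_whitespace = text.split()

print(by_space)
print(by_whitespace)
```

If the real input is tokenized with split(), the empty strings and stray newlines should not appear in the first place.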
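Those extra empty strings '' come from the spaces you substituted for punctuation: wherever a replaced character sat next to an existing space, you end up with two spaces in a row. The \n (newline) is a bit trickier, because it's something you will definitely see in an external text file, but it isn't an obvious thing to account for. Instead of for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':, you should just check each character with .isalpha(). Try something like this: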
    def main():
        ...
        text = 'An old mann gathreed his abacus, and ran a mile. His abacus\n ran two miles!'
        for ch in text:
            if not ch.isalpha() and not ch.isspace():
                #we want to keep whitespace (spaces and newlines) or else words would run together
                text = text.replace(ch, '') #replace with empty string (basically, remove)
        words = text.lower().split() #split on any run of whitespace
        #import dictionary
        dic = ['a','abacus','an','and','arranged', 'gathered', 'his', 'man','mile','miles','old','ran','two']
        #perform binary search for misspelled words
        misw = []
        for w in words:
            m = binSearch(w,dic)
            if m == -1:
                misw.append(w)
        print(misw)

Output:
This program performs a spell-check in a file
and prints a report of the possibly misspelled words.
['mann', 'gathreed']

Hope this helps! If you need clarification or something doesn't work, feel free to leave a comment.
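One more thing that may be worth checking, offered as an assumption since only an excerpt of the dictionary file is shown: binary search needs the list to be sorted under the same comparison the search uses. Python compares strings by code point, so 'Abad' sorts before 'abacus', yet the excerpt lists Abad after abacuses. The sketch below uses the standard library's bisect_left in place of binSearch, and the helper name contains is hypothetical. Lowercasing and re-sorting the dictionary is one possible fix, not necessarily the one you want.

```python
from bisect import bisect_left

def contains(sorted_words, word):
    """Binary search; only reliable if sorted_words is sorted in Python's default order."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word

# Part of the dictionary excerpt from the question, in file order
excerpt = ['abacus', 'abacuses', 'Abad', 'abada', 'Abadan',
           'Abaddon', 'abaddon', 'abadejo']

# False here: uppercase letters sort before lowercase in Python
is_sorted = excerpt == sorted(excerpt)

# Searching the list as-is can miss a word that is actually present
found_before = contains(excerpt, 'abada')

# Assumed fix: lowercase everything, drop duplicates, sort again
normalized = sorted({w.lower() for w in excerpt})
found_after = contains(normalized, 'abada')

print(is_sorted, found_before, found_after)
```

With this approach the words from the text also have to be lowercased before the lookup, as the test code above already does with text.lower().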
https://stackoverflow.com/questions/22751449