I am trying to use FuzzyWuzzy to correct misspelled names in a text. However, I can't get process.extract and process.extractOne to behave the way I expect.
from fuzzywuzzy import process
the_text = 'VICTOR HUGO e MARIANA VEIGA'
search_term = 'VEYGA'
the_text = the_text.split()
found_word = process.extract(search_term, the_text)
print(found_word)
This results in:
[('e', 90), ('VEIGA', 80), ('HUGO', 22), ('VICTOR', 18), ('MARIANA', 17)]
How can I get FuzzyWuzzy to correctly identify 'VEIGA' as the right match?
Posted on 2018-05-22 13:13:31
You can try fuzz.token_set_ratio or fuzz.token_sort_ratio as the scorer. The answer to "When to use which fuzz function to compare 2 strings" gives a good explanation of the differences.
For completeness, here is some code:
from fuzzywuzzy import process
from fuzzywuzzy import fuzz
the_text = 'VICTOR HUGO e MARIANA VEIGA'
search_term = 'VEYGA'
the_text = the_text.split()
found_word = process.extract(search_term, the_text, scorer=fuzz.token_sort_ratio)
print(found_word)
Output:
[('VEIGA', 80), ('e', 33), ('HUGO', 22), ('VICTOR', 18), ('MARIANA', 17)]
https://stackoverflow.com/questions/50468250