Suppose I have the following source string:
Humpty dumpty <span id="1">sat</span> on a wall, humpty dumpty had a great fall. All of <span id="two">the kings</span> horses and all the kings men.
There are also several other strings in a list, each separated by a newline:
Humpty dumpty sat on a wall, humpty dumpty had a great fall. All of the kings horses and all the kings men.
Humpty dumpty sat on the wall, all of the kings horses and all the kings men.
There is a humpty dumpty who had sat on the wall, and all of the kings horses and all the kings men.
Humpty dumpty sat on some wall, humpty dumpty had a great fall. All of the kings horses and all the kings men couldn't put him together again.
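Humpty dumpty this is a completely related sentence.
Using Python, I want to find out which of the other strings in the list is closest to the source string. Is there a good way to compute some kind of "score" for each comparison between the source string and a candidate string, and to decide by some criterion which candidate is closest? (In this case the most similar string should be the first one, because it is the source string without any <span id="1"></span> markup.)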
Posted on 2013-09-10 05:05:45
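You can use the PyLevenshtein module to compute the Levenshtein distance between the source string and each candidate, and use that distance as the similarity score. The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other, so the candidate with the smallest distance is the closest match.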
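As a rough illustration of that idea, here is a minimal sketch that ranks the candidates by edit distance. To keep it runnable without installing anything, it uses a plain-Python Levenshtein implementation instead of the PyLevenshtein module; with the module installed, `Levenshtein.distance(a, b)` could be swapped in for the helper. The names `levenshtein`, `strip_tags`, and `closest` are made up for this example, and stripping the markup with a simple regular expression before comparing is an assumption, not something the answer prescribes.

```python
import re

SOURCE = ('Humpty dumpty <span id="1">sat</span> on a wall, humpty dumpty had a '
          'great fall. All of <span id="two">the kings</span> horses and all the kings men.')

CANDIDATES = [
    "Humpty dumpty sat on a wall, humpty dumpty had a great fall. "
    "All of the kings horses and all the kings men.",
    "Humpty dumpty sat on the wall, all of the kings horses and all the kings men.",
    "There is a humpty dumpty who had sat on the wall, and all of the kings horses "
    "and all the kings men.",
    "Humpty dumpty sat on some wall, humpty dumpty had a great fall. All of the kings "
    "horses and all the kings men couldn't put him together again.",
    "Humpty dumpty this is a completely related sentence.",
]


def levenshtein(a, b):
    """Edit distance via dynamic programming, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def strip_tags(text):
    """Assumption: a naive regex is enough to drop the simple <span> markup."""
    return re.sub(r"<[^>]+>", "", text)


def closest(source, candidates):
    """Return the candidate with the smallest edit distance to the tag-free source."""
    cleaned = strip_tags(source)
    return min(candidates, key=lambda c: levenshtein(cleaned, c))


if __name__ == "__main__":
    cleaned = strip_tags(SOURCE)
    for candidate in CANDIDATES:
        print(levenshtein(cleaned, candidate), candidate)
    print("closest:", closest(SOURCE, CANDIDATES))
```

A lower distance means a closer match. If the strings differ a lot in length, dividing the distance by the length of the longer string gives a score that is easier to compare across pairs.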
https://stackoverflow.com/questions/18710942