If I have two DataFrames (John, Alex, harry) and (ryan, kane, king), how can I use fuzzywuzzy in Python to get the following output?
fuzz.Ratio
John ryan 25
John kane 54
John king 44
alex ryan 23
alex kane 14
alex king 55
harry ryan 47
harry kane 47
harry king 50
Posted on 2020-12-15 15:54:06
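The ratios in your expected output are wrong. What you are looking for is the Cartesian product of the corresponding columns of the two DataFrames, with fuzz.ratio computed for each pair.
Sample code: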
import itertools
import pandas as pd
from fuzzywuzzy import fuzz
df1 = pd.DataFrame({'name': ['John','Alex','harry']})
df2 = pd.DataFrame({'name': ['ryan','kane','king']})
# Lowercase both columns, then score every (df1 name, df2 name) pair
for w1, w2 in itertools.product(
        df1['name'].str.lower(), df2['name'].str.lower()):
    print(f"{w1}, {w2}, {fuzz.ratio(w1, w2)}")
Output:
john, ryan, 25
john, kane, 25
john, king, 25
alex, ryan, 25
alex, kane, 50
alex, king, 0
harry, ryan, 44
harry, kane, 22
harry, king, 0
Posted on 2020-12-15 15:56:37
IIUC, you can do it like this:
from fuzzywuzzy import fuzz
from itertools import product
import pandas as pd
a = ('John','Alex','harry')
b = ('ryan', 'kane', 'king')
# compute the ratios for each pair
res = ((ai, bi, fuzz.ratio(ai, bi)) for ai, bi in product(a, b))
# build a DataFrame, dropping the pairs whose ratio is 0
out = pd.DataFrame([e for e in res if e[2] > 0], columns=['name_a', 'name_b', 'fuzz_ratio'])
print(out)
Output:
  name_a name_b  fuzz_ratio
0   John   ryan          25
1   John   kane          25
2   John   king          25
3   Alex   kane          25
4  harry   ryan          44
5  harry   kane          22
https://stackoverflow.com/questions/65301766
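The scores in the two answers differ for the Alex rows because one answer lowercases the names first and the other does not. The sketch below reproduces both sets of numbers with the standard library only. It assumes that fuzz.ratio behaves like 100 times difflib.SequenceMatcher's ratio (2 * matching characters / total characters), rounded to an integer, which is my understanding of fuzzywuzzy's pure-Python path; with python-Levenshtein installed, some inputs may score slightly differently. approx_ratio is a hypothetical helper name, not part of fuzzywuzzy.

```python
from difflib import SequenceMatcher
from itertools import product


def approx_ratio(s1: str, s2: str) -> int:
    """Hypothetical stand-in for fuzz.ratio (assumption: 100 * 2*M/T, rounded)."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


names_a = ["John", "Alex", "harry"]
names_b = ["ryan", "kane", "king"]

# Score every pair twice: as given, and after lowercasing both names
for a, b in product(names_a, names_b):
    raw = approx_ratio(a, b)
    lowered = approx_ratio(a.lower(), b.lower())
    print(f"{a:>5} {b}  case-sensitive={raw:>2}  lowercased={lowered:>2}")
```

For example, "john" and "ryan" share only the letter n, so the score is 2 * 1 / 8 = 25, while "Alex" and "ryan" share nothing until the A is lowercased. If case should not matter, lowercase both sides before scoring, as the first answer does.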