import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
a = np.array([[3, 4], [2, 5], [1, 2], [1, 2], [4, 5]])
ap = pd.DataFrame(a, index=['Sonata', 'Etudes', 'Waltzes', 'Nocturnes', 'Marches'],
                  columns=['search_history', 'view_count'])
ap

b = np.array([[4, 4], [3, 5], [2, 1], [4, 7], [1, 2]])
bp = pd.DataFrame(b, index=['Sonata', 'Etudes', 'Waltzes', 'Nocturnes', 'Marches'],
                  columns=['comment + wishlist', 'signup'])
bp

Then I applied the cosine_similarity function:
from sklearn.metrics.pairwise import cosine_similarity
pd.DataFrame(cosine_similarity(a, b), columns=['A', 'B'], index=['Sonata', 'Etudes', 'Waltzes', 'Nocturnes', 'Marches'])
This raises:
ValueError: Shape of passed values is (5, 5), indices imply (5, 2)
So I changed it like this:
from sklearn.metrics.pairwise import cosine_similarity
pd.DataFrame(cosine_similarity(a, b), columns=['A', 'B', 'c', 'd', 'e'], index=['Sonata', 'Etudes', 'Waltzes', 'Nocturnes', 'Marches'])

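This runs and returns a result, but it is not the result I want.
Like the DataFrames for a and b, I want the result displayed with 5 rows and 2 columns, but I always get 5 rows and 5 columns.
What should I do?
The expected result is: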
                  A         B
Sonata     0.989949  0.994692
Etudes     0.919145  0.987241
Waltzes    0.948683  0.997054
Nocturnes  0.948683  0.997054
Marches    0.993884  0.990992
Posted on 2022-05-02 10:20:31
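cosine_similarity() compares every row of the first array with every row of the second array, so with 5 rows in each you get 5 * 5 comparisons and a 5 x 5 result. You only need the first two columns, so you can slice the resulting DataFrame: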
df = pd.DataFrame(cosine_similarity(a, b), columns=['A', 'B', 'C', 'D', 'E'], index=['Sonata', 'Etudes', 'Waltzes', 'Nocturnes', 'Marches'])
print(df[['A', 'B']])  # select by column names
# or
print(df.iloc[:, 0:2])  # select by column positions
Output:
                  A         B
Sonata     0.989949  0.994692
Etudes     0.919145  0.987241
Waltzes    0.948683  0.997054
Nocturnes  0.948683  0.997054
Marches    0.993884  0.990992
https://stackoverflow.com/questions/72084283
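To see where the 5 x 5 shape comes from, here is a minimal NumPy-only sketch of a pairwise cosine similarity. The helper name pairwise_cosine is hypothetical and is not part of scikit-learn; it only illustrates the row-against-row comparison described in the answer, assuming no row is all zeros. Column j of the result holds the similarity of every row of a to row j of b, so keeping the first two columns is the same as comparing a against only the first two rows of b.

```python
import numpy as np

a = np.array([[3, 4], [2, 5], [1, 2], [1, 2], [4, 5]])
b = np.array([[4, 4], [3, 5], [2, 1], [4, 7], [1, 2]])


def pairwise_cosine(x, y):
    """Cosine similarity of every row of x with every row of y.

    Hypothetical helper for illustration; assumes no all-zero rows.
    """
    # Scale each row to unit length, then take all row-by-row dot products.
    x_unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_unit = y / np.linalg.norm(y, axis=1, keepdims=True)
    return x_unit @ y_unit.T


full = pairwise_cosine(a, b)            # one entry per (row of a, row of b) pair
first_two = pairwise_cosine(a, b[:2])   # compare only with the first two rows of b

print(full.shape)                            # (5, 5)
print(first_two.shape)                       # (5, 2)
print(np.allclose(full[:, :2], first_two))   # True
```

If only those two columns are needed, passing b[:2] avoids computing the other three columns at all. Whether columns A and B (similarity to the first two rows of b) are really the quantity wanted is a separate question that the slicing itself does not answer.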