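I am trying to determine the Euclidean distance of my documents from their centroids. The two arrays in question (points and centers) meet the XA and XB dimension requirements of scipy.spatial.distance.cdist, but I can't work out why I get the ValueError shown below.
My code: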
import pandas as pd, numpy as np
from scipy.spatial.distance import cdist
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
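corpus = pd.Series(["bye bye brutal good bye apple banana orange", "bye bye hello apple banana", "corn wheat apple banana goodbye cookie brutal", "fruit cake banana apple bye sweet sweet"])
# build the TF-IDF matrix; fit_transform returns a scipy sparse matrix
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
# cluster the documents into two groups
model = KMeans(n_clusters=2)
model.fit(X)
# cluster centers as a dense numpy array, one row per cluster
centers = model.cluster_centers_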
cdist(X, centers)
This is the error I get:
ValueError: setting an array element with a sequence.
From the documentation for scipy.spatial.distance.cdist:
Parameters: XA: ndarray
An Ma by n array of Ma original observations in an n-dimensional space
XB: ndarray
An Mb by n array of Mb original observations in an n-dimensional space
...
My X and centers numpy arrays surely satisfy these dimension requirements for cdist, don't they? What am I missing?
Posted on 2016-08-05 21:19:11
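You only need to make one small change:
cdist(X.toarray(), centers)
X is an object of type scipy.sparse.csr.csr_matrix, so the scipy function does not accept it directly as valid input. The toarray() method converts it into a valid numpy array.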
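For reference, here is a minimal end-to-end sketch of that fix, using the corpus from the question. The `n_init` and `random_state` arguments, the `issparse` guard, and the names `X_dense`, `dists` and `own_dist` are assumptions added for illustration; they are not part of the original post.

```python
import numpy as np
from scipy.sparse import issparse
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "bye bye brutal good bye apple banana orange",
    "bye bye hello apple banana",
    "corn wheat apple banana goodbye cookie brutal",
    "fruit cake banana apple bye sweet sweet",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # scipy sparse matrix, shape (4, n_terms)

# n_init and random_state are illustrative choices that keep the run reproducible
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers = model.cluster_centers_      # dense ndarray, shape (2, n_terms)

# cdist expects dense arrays, so convert the sparse matrix just before the call
X_dense = X.toarray() if issparse(X) else np.asarray(X)
dists = cdist(X_dense, centers)       # shape (4, 2); the default metric is Euclidean

# distance from each document to the centroid of the cluster it was assigned to
own_dist = dists[np.arange(X.shape[0]), model.labels_]
print(own_dist)
```

For a large corpus, toarray() can need a lot of memory. In that case model.transform(X) may be worth a look: it accepts the sparse matrix directly and returns the distance from each sample to each cluster center.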
https://stackoverflow.com/questions/38789757