I have a dataframe named neighbours_lookup. It contains an ID column and a column of normalised data ('vec'), stored as an array:
          id                                                vec
0  857827315  [-0.5345224838248487, -0.5345224838248487, 1.8...
1  857827311  [-0.3535533905932738, -0.3535533905932738, 2.8...
2  857827316  [-0.3535533905932738, -0.3535533905932738, -0....
3  857827312  [-0.5345224838248487, 1.8708286933869707, -0.5...
4  857827313  [-0.35355339059327373, -0.35355339059327373, -...

I want to write a function that takes an ID and returns its 10 nearest neighbours.
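I have looked at sklearn.neighbors, which I think is relevant, but I can't work out how to use it. I tried:

knn = NearestNeighbors(n_neighbors=10,
                       algorithm='auto')
for row in neighbours_lookup['vec']:
    knn.fit(row.reshape(1, -1))

The error I get is:

AttributeError: 'list' object has no attribute 'reshape'

Can someone explain where I should go from here? My dataframe will have more than 100,000 rows, so it needs to be fast.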
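For reference, the error can be reproduced without the dataframe. This is only a minimal sketch, assuming each 'vec' entry is a plain Python list; the values in row are made up and row_2d is a name chosen for illustration.

```python
import numpy as np

# Made-up sample row standing in for one entry of neighbours_lookup['vec'].
row = [-0.53, -0.53, 1.87]

# A plain list has no reshape method, which is what triggers the error.
try:
    row.reshape(1, -1)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'list' object has no attribute 'reshape'

# Converting to an ndarray first makes the reshape call valid.
row_2d = np.asarray(row).reshape(1, -1)
print(row_2d.shape)  # (1, 3)
```

Note that even with the conversion, calling fit inside the loop would refit the model on a single row each time, so the loop itself is not the right shape for this problem.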
--Edit--
Thanks to 达斯爸爸 and some tinkering of my own, I got it working! The function is below.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def get_k_neighbours(isbn, df, number_of_neighbours):
    def get_knn(df):
        # Fit on a list of vectors, i.e. shape (n_samples, n_features)
        vector_arrays = df['vec'].to_numpy().tolist()
        return NearestNeighbors().fit(vector_arrays)

    def get_vector(df, isbn):
        # np.asarray also covers the case where 'vec' holds plain lists
        return np.asarray(df.loc[df['isbn'] == isbn, 'vec'].iloc[0]).reshape(1, -1)

    def flatten_neighbour_list(nb_indexes):
        # kneighbors returns a 2-D array; flatten it into a plain list
        nb_list = nb_indexes.tolist()
        return [item for sublist in nb_list for item in sublist]

    knn = get_knn(df)
    vector = get_vector(df, isbn)
    nb_indexes = knn.kneighbors(vector, number_of_neighbours, return_distance=False)
    nb_indexes = flatten_neighbour_list(nb_indexes)
    return nb_indexes

Posted on 2021-07-08 18:46:16
A NumPy ndarray has a reshape method, but a list does not, hence the AttributeError. You can fit NearestNeighbors on a list of shape (n_samples, n_features).
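from sklearn.neighbors import NearestNeighbors

knn = NearestNeighbors(n_neighbors=10, algorithm='auto')
# tolist() gives a list of vectors, i.e. shape (n_samples, n_features)
knn.fit(neighbours_lookup['vec'].tolist())

def get_neighbors(id):
    # Select the 'vec' entry of the row whose 'id' column matches
    vector = neighbours_lookup.loc[neighbours_lookup['id'] == id, 'vec'].iloc[0]
    return knn.kneighbors([vector], 10, return_distance=False)

https://stackoverflow.com/questions/68307012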
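Since speed matters at 100,000+ rows, one option is to fit the model once and reuse it for every lookup, then map the returned row positions back to IDs (kneighbors returns positions, not IDs). The sketch below assumes that setup. The names toy_lookup, build_index and nearest_ids are hypothetical, the vectors are made up, and dropping the query's own row from the result is an assumption about what "neighbours" should mean here.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data: 6 ids with 3-dimensional vectors stored as lists.
toy_lookup = pd.DataFrame({
    "id": [101, 102, 103, 104, 105, 106],
    "vec": [
        [0.0, 0.0, 0.0],
        [0.1, 0.0, 0.0],
        [0.0, 0.2, 0.0],
        [5.0, 5.0, 5.0],
        [5.1, 5.0, 5.0],
        [9.0, 9.0, 9.0],
    ],
})


def build_index(df):
    """Fit the model once on a 2-D (n_samples, n_features) array."""
    matrix = np.array(df["vec"].tolist(), dtype=float)
    ids = df["id"].to_numpy()
    knn = NearestNeighbors(algorithm="auto").fit(matrix)
    return knn, matrix, ids


def nearest_ids(knn, matrix, ids, query_id, k):
    """Return the ids of the k nearest rows to query_id, excluding itself."""
    position = int(np.flatnonzero(ids == query_id)[0])
    # Ask for k + 1 results because the query row is part of the fitted data.
    positions = knn.kneighbors(
        matrix[position:position + 1], n_neighbors=k + 1, return_distance=False
    )
    neighbours = [int(ids[p]) for p in positions[0] if p != position]
    return neighbours[:k]


knn, matrix, ids = build_index(toy_lookup)
print(nearest_ids(knn, matrix, ids, 101, 2))  # [102, 103]
```

With the real dataframe, build_index would be called once and nearest_ids once per lookup with k=10, so the cost of fitting is not repeated on every query.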