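If we apply K-means and sequential K-means to the same dataset with the same initial settings, will we get the same result? Explain your reasoning.
Personally, I think the answer is no. The result of sequential K-means depends on the order in which the data points are presented, and the stopping conditions also differ.
The pseudocode for the two clustering algorithms is attached below.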
K-means
Make initial guesses for the means m1, m2, ..., mk
Until there is no change in any mean
    Assign each data point to the cluster whose mean is the nearest.
    Calculate the mean of each cluster.
    For i from 1 to k
        Replace mi with the mean of all examples for cluster i.
    end_for
end_until

Sequential K-means
Make initial guesses for the means m1, m2, ..., mk
Set the counts n1, n2, ..., nk to zero
Until interrupted
    Acquire the next example, x
    If mi is closest to x
        Increment ni
        Replace mi by mi + (1/ni)*(x - mi)
    end_if
end_until

Posted on 2011-12-02 11:33:10
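Correct, the results may differ.
Points: x1 = (0,0), x2 = (1,1), x3 = (0.75,0), x4 = (0.25,1); initial means: m1 = (0,0.5), m2 = (1,0.5). K-means assigns x1 and x4 to the m1 cluster and x2 and x3 to the m2 cluster. The new means are m1' = (0.125,0.5) and m2' = (0.875,0.5), and no reassignment occurs. With sequential K-means, after x1 is assigned, m1 moves to (0,0), and x2 then moves m2 to (1,1). Next, m1 is the mean closest to x3, so m1 moves to (0.375,0). Finally, m2 is closest to x4, so m2 moves to (0.625,1). This is also a stable configuration, but the clusters are now {x1, x3} and {x2, x4} rather than {x1, x4} and {x2, x3}.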
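The example above can be checked with a short script. This is only a minimal sketch of the two pseudocode procedures from the question, not a reference implementation. The function names (`nearest`, `batch_kmeans`, `sequential_kmeans`) are made up for illustration. Breaking ties toward the lower index and keeping the old mean for an empty cluster are assumptions that the pseudocode does not specify, and the "until interrupted" loop is assumed to stop after one pass over the data.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest(means, x):
    """Index of the mean closest to x (assumption: ties go to the lowest index)."""
    return min(range(len(means)), key=lambda i: dist(means[i], x))


def batch_kmeans(points, means):
    """Batch K-means: assign every point, recompute all means, repeat until no mean changes."""
    means = [tuple(m) for m in means]
    while True:
        clusters = [[] for _ in means]
        for x in points:
            clusters[nearest(means, x)].append(x)
        # Assumption: an empty cluster keeps its previous mean.
        new_means = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else m
            for members, m in zip(clusters, means)
        ]
        if new_means == means:
            return means
        means = new_means


def sequential_kmeans(points, means):
    """Sequential K-means: one pass, updating only the closest mean after each point."""
    means = [tuple(m) for m in means]
    counts = [0] * len(means)
    for x in points:
        i = nearest(means, x)
        counts[i] += 1
        # Running-mean update: mi <- mi + (1/ni) * (x - mi)
        means[i] = tuple(mj + (xj - mj) / counts[i] for mj, xj in zip(means[i], x))
    return means


points = [(0, 0), (1, 1), (0.75, 0), (0.25, 1)]  # x1, x2, x3, x4
initial = [(0, 0.5), (1, 0.5)]                    # m1, m2

print("batch:     ", batch_kmeans(points, initial))
print("sequential:", sequential_kmeans(points, initial))
```

With the points presented in the order x1, x2, x3, x4, the batch version should end at the means (0.125, 0.5) and (0.875, 0.5), while the sequential version should end at (0.375, 0) and (0.625, 1), matching the hand calculation.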
https://stackoverflow.com/questions/8351296