
MemoryError when computing silhouette_score

Stack Overflow user
Asked on 2018-07-03 08:08:47
1 answer, 897 views, 0 followers, 1 vote

I am running the KMeans clustering algorithm on a matrix of shape (190868, 35), using the following code:

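from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# matrix is the (190868, 35) feature array described above
for n_clusters in range(3, 10):
    kmeans = KMeans(init='k-means++', n_clusters=n_clusters, n_init=30)
    kmeans.fit(matrix)
    clusters = kmeans.predict(matrix)
    silhouette_avg = silhouette_score(matrix, clusters)
    print("For n_clusters =", n_clusters, "The avg silhouette_score is :", silhouette_avg)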

I get the following error:

Traceback (most recent call last):

  File "<ipython-input-6-be918e90030a>", line 5, in <module>
    silhouette_avg=silhouette_score(matrix,clusters)

  File "C:\Users\arindam\Anaconda3\lib\site-packages\sklearn\metrics\cluster\unsupervised.py", line 101, in silhouette_score
    return np.mean(silhouette_samples(X, labels, metric=metric, **kwds))

  File "C:\Users\arindam\Anaconda3\lib\site-packages\sklearn\metrics\cluster\unsupervised.py", line 169, in silhouette_samples
    distances = pairwise_distances(X, metric=metric, **kwds)

  File "C:\Users\arindam\Anaconda3\lib\site-packages\sklearn\metrics\pairwise.py", line 1247, in pairwise_distances
    return _parallel_pairwise(X, Y, func, n_jobs, **kwds)

  File "C:\Users\arindam\Anaconda3\lib\site-packages\sklearn\metrics\pairwise.py", line 1090, in _parallel_pairwise
    return func(X, Y, **kwds)

  File "C:\Users\arindam\Anaconda3\lib\site-packages\sklearn\metrics\pairwise.py", line 246, in euclidean_distances
    distances = safe_sparse_dot(X, Y.T, dense_output=True)

  File "C:\Users\arindam\Anaconda3\lib\site-packages\sklearn\utils\extmath.py", line 140, in safe_sparse_dot
    return np.dot(a, b)

MemoryError

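If anyone knows a solution, please suggest one. I have already tried specifying sample_size = 70000: the code runs, consumes all available memory, and the system freezes. I have a Lenovo Thinkpad with 16 GB of RAM and an i7 processor.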


1 Answer

Stack Overflow user

Answered on 2018-07-26 07:20:00

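A MemoryError means there is not enough memory to allocate the numpy array that silhouette_score needs while it runs. So the solution is either to use less memory or to make more memory available: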

Solution 1. Allocate less memory by passing sample_size to silhouette_score

Reference: https://stackoverflow.com/a/16425008/1229868
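As a minimal sketch of what passing sample_size looks like inside the loop from the question: the score is then computed on a random subset of rows rather than on all of them. The synthetic `make_blobs` data, the helper name `sampled_silhouette_scores`, the `sample_size=500` value, and the fixed seed are all assumptions made so the example runs quickly; they are not taken from the original post.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the (190868, 35) matrix in the question (assumption).
matrix, _ = make_blobs(n_samples=2000, n_features=35, centers=4, random_state=0)


def sampled_silhouette_scores(matrix, cluster_range, sample_size, seed=0):
    """Fit KMeans for each cluster count and score it on a random subsample."""
    scores = {}
    for n_clusters in cluster_range:
        kmeans = KMeans(init="k-means++", n_clusters=n_clusters,
                        n_init=10, random_state=seed)
        labels = kmeans.fit_predict(matrix)
        # Only sample_size rows enter the pairwise distance computation.
        scores[n_clusters] = silhouette_score(
            matrix, labels, sample_size=sample_size, random_state=seed
        )
    return scores


scores = sampled_silhouette_scores(matrix, range(3, 6), sample_size=500)
for n_clusters, score in scores.items():
    print("For n_clusters =", n_clusters, "the sampled silhouette_score is:", score)
```

Fixing random_state makes the subsample reproducible, so differences between cluster counts are not just sampling noise from one run to the next.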

How to find the most suitable sample_size:

from sklearn import metrics

def eval_silhouette_score(matrix, clusters, sample_size):
    """Return the sampled silhouette score, or None if memory runs out."""
    try:
        return metrics.silhouette_score(matrix, clusters, sample_size=sample_size)
    except MemoryError:
        return None

# Try all rows first, then 1/2, 1/3, ... of them until the score fits in memory.
div_factor = 1
silhouette_avg = None
while silhouette_avg is None:
    sample_size = len(clusters) // div_factor
    silhouette_avg = eval_silhouette_score(matrix, clusters, sample_size)
    div_factor += 1
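A sampled score depends on which rows happen to be drawn, so the smaller the sample_size that the search settles on, the noisier the estimate is likely to be. One way to gauge this, sketched here under assumptions, is to repeat the sampled score with several seeds and look at the spread. The helper name `repeated_sampled_silhouette`, the synthetic data, and `n_repeats=5` are illustrative choices, not part of the original answer.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data and labels (assumption, for a quick runnable example).
matrix, _ = make_blobs(n_samples=2000, n_features=35, centers=4, random_state=0)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(matrix)


def repeated_sampled_silhouette(matrix, clusters, sample_size, n_repeats=5):
    """Return (mean, std) of the sampled silhouette score over several seeds."""
    scores = [
        silhouette_score(matrix, clusters, sample_size=sample_size, random_state=seed)
        for seed in range(n_repeats)
    ]
    return float(np.mean(scores)), float(np.std(scores))


mean_score, std_score = repeated_sampled_silhouette(matrix, clusters, sample_size=500)
print("mean:", mean_score, "std:", std_score)
```

A small standard deviation relative to the gaps between cluster counts suggests the chosen sample_size is large enough for comparing them.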

Solution 2. Install more physical memory :)
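A back-of-the-envelope estimate shows how much memory either solution has to cover. The traceback in the question ends in pairwise_distances, which builds a dense n x n matrix; assuming float64 entries (8 bytes each), the arithmetic can be sketched as follows. The helper names and the 8 GB budget are hypothetical, and the real peak is probably higher than this estimate because of temporary arrays.

```python
import math

BYTES_PER_FLOAT64 = 8  # assumes the distance matrix is stored as float64


def distance_matrix_bytes(n_samples: int) -> int:
    """Bytes needed for a dense n_samples x n_samples float64 matrix."""
    return n_samples * n_samples * BYTES_PER_FLOAT64


def max_sample_size(budget_bytes: int) -> int:
    """Largest n whose dense float64 distance matrix fits in budget_bytes."""
    return math.isqrt(budget_bytes // BYTES_PER_FLOAT64)


for n in (190868, 70000):
    print(f"{n} samples -> {distance_matrix_bytes(n) / 1e9:.1f} GB")
# 190868 samples -> 291.4 GB
# 70000 samples -> 39.2 GB

# Hypothetical budget: leave about 8 GB of a 16 GB machine for the matrix.
print(max_sample_size(8 * 10**9))  # 31622
```

Under these assumptions, even sample_size = 70000 needs more than twice the 16 GB in the asker's laptop, which is consistent with the reported freeze, while a sample of roughly 30000 rows or fewer should fit.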

Votes: 1

Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/51149589
