Plotting a cluster matrix

Stack Overflow user
Asked on 2019-11-25 22:44:34
2 answers, 268 views, 0 followers, 2 votes

I want to plot a cluster matrix from scikit-learn's K-means, using the following pandas DataFrame:

import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()  # toy dataset
data = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df = data.iloc[:, 4:8]  # select a subset of four features
df.columns = ['smoothness', 'compactness', 'concavity', 'concave points']
df.head()  # first five rows, shown below

+----+--------------+---------------+-------------+------------------+
|    |   smoothness |   compactness |   concavity |   concave points |
|----+--------------+---------------+-------------+------------------|
|  0 |      0.1184  |       0.2776  |      0.3001 |          0.1471  |
|  1 |      0.08474 |       0.07864 |      0.0869 |          0.07017 |
|  2 |      0.1096  |       0.1599  |      0.1974 |          0.1279  |
|  3 |      0.1425  |       0.2839  |      0.2414 |          0.1052  |
|  4 |      0.1003  |       0.1328  |      0.198  |          0.1043  |
+----+--------------+---------------+-------------+------------------+

2 Answers

Stack Overflow user

Accepted answer

Posted on 2019-11-25 23:10:26

You can simply use seaborn.pairplot and pass the fitted KMeans model's labels_ attribute as the hue parameter. For example:

import seaborn as sns
from sklearn.cluster import KMeans

def kmeans_scatterplot(df, n_clusters):
    km = KMeans(init='k-means++', n_clusters=n_clusters)
    km_clustering = km.fit(df)
    sns.pairplot(df.assign(hue=km_clustering.labels_), hue='hue')

kmeans_scatterplot(df, 2)

Output: (figure not reproduced here)

Votes: 2

Stack Overflow user

Posted on 2019-11-25 22:46:21

You can do this with the following function:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def kmeans_scatterplot(df, n_clusters):
    axs_length = len(df.columns)
    fig, axs = plt.subplots(axs_length, axs_length, figsize=(20,20))

    for i, column_i in enumerate(df):
        for j, column_j in enumerate(df):

            # off-diagonal panels: cluster on this pair of columns and plot it
            if column_i != column_j:
                df_temp = df[[column_i, column_j]]
                km = KMeans(init='k-means++', n_clusters=n_clusters)
                km_clustering = km.fit(df_temp)
                # x is column_j and y is column_i, so the data match the axis labels set below
                axs[i][j].scatter(df_temp[column_j], df_temp[column_i], c=km_clustering.labels_, cmap='rainbow', alpha=0.7, edgecolors='b')

            # only show left and bottom labels
            if i == axs_length - 1:
                axs[i][j].set_xlabel(column_j)
            if j == 0:
                axs[i][j].set_ylabel(column_i)

kmeans_scatterplot(df, 2)

Result: (figure not reproduced here)
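
Note that the function above fits a separate K-means model for every pair of columns, so the same row can end up with a different cluster label in different panels, and the diagonal panels stay empty. If a single clustering over all features is what is wanted, one possible variant fits once and reuses the labels in every panel. This is only a sketch: the name `kmeans_scatter_matrix`, the histograms on the diagonal, the fixed `random_state`, and the synthetic `toy` data are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def kmeans_scatter_matrix(df, n_clusters, random_state=0):
    """Scatter matrix coloured by one K-means fit on all columns."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(df)
    n = len(df.columns)
    fig, axs = plt.subplots(n, n, figsize=(3 * n, 3 * n), squeeze=False)
    for i, col_i in enumerate(df.columns):
        for j, col_j in enumerate(df.columns):
            ax = axs[i][j]
            if i == j:
                # diagonal: distribution of a single feature
                ax.hist(df[col_i], bins=20, color="grey")
            else:
                # x follows the column's feature, y follows the row's feature
                ax.scatter(df[col_j], df[col_i], c=labels,
                           cmap="rainbow", alpha=0.7, s=10)
            # only label the bottom row and the left column
            if i == n - 1:
                ax.set_xlabel(col_j)
            if j == 0:
                ax.set_ylabel(col_i)
    return fig, axs, labels


# Hypothetical toy data: two well-separated groups in three features
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))]),
    columns=["a", "b", "c"],
)
fig, axs, labels = kmeans_scatter_matrix(toy, n_clusters=2)
```

Because the labels come from one model, a point keeps the same colour across all panels, which makes the panels directly comparable.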

Votes: 1
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/59034334
