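I am using the DBSCAN algorithm to cluster the "Pima Indians Diabetes" data. I would also like to apply a classification algorithm within each cluster, compare the accuracy obtained in each cluster, and make predictions for most of the clusters. Please help me with lowering the value of eps; the current value gives this plot.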

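import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

data = pd.read_csv('diabetes.csv')
# Cluster on the features only. 'Outcome' is the class label (column name as in the
# Kaggle version of the dataset) and is kept aside for the later classification step.
X = StandardScaler().fit_transform(data.drop(columns='Outcome'))
y = data['Outcome'].to_numpy()

clustering = DBSCAN(eps=0.01, min_samples=10).fit(X)
labels = clustering.labels_
# label -1 marks noise points, so it must not be counted as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("number of clusters:", n_clusters)
print("number of noise points:", int((labels == -1).sum()))

def show_clusters(data, labels):
    # scatter plot of the first two (scaled) features, coloured by cluster label
    df = pd.DataFrame(dict(x=data[:, 0], y=data[:, 1], label=labels))
    colors = {-1: 'black', 0: 'blue', 1: 'skyblue', 2: 'orange',
              3: 'yellow', 4: 'pink', 5: 'red'}
    fig, ax = plt.subplots(figsize=(6, 6))
    for key, group in df.groupby('label'):
        group.plot(ax=ax, kind='scatter', x='x', y='y', label=key,
                   color=colors.get(key, 'gray'))  # fallback for labels above 5
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

# pass the per-point labels, not the cluster count
show_clusters(X, labels)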
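For the second part of the question (a classifier inside each cluster, with accuracy compared per cluster), a minimal sketch could look like the following. It assumes logistic regression as the classifier and 3-fold cross-validation as the accuracy measure; both are illustrative choices, and the helper name `per_cluster_accuracy` is hypothetical. Because `diabetes.csv` is not bundled here, `make_blobs` produces a synthetic stand-in; with the real data you would pass the scaled feature matrix `X`, the `Outcome` values `y`, and `clustering.labels_`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def per_cluster_accuracy(X, y, labels, min_size=20, cv=3):
    """Mean cross-validated accuracy of a classifier trained separately per cluster.

    Noise points (label -1) are ignored. Clusters that are too small, contain a
    single class, or whose smaller class has fewer than `cv` members are skipped,
    because stratified cross-validation cannot be run on them.
    """
    scores = {}
    for label in np.unique(labels):
        if label == -1:
            continue  # DBSCAN noise, not a real cluster
        mask = labels == label
        X_c, y_c = X[mask], y[mask]
        classes, counts = np.unique(y_c, return_counts=True)
        if len(y_c) < min_size or len(classes) < 2 or counts.min() < cv:
            continue
        clf = LogisticRegression(max_iter=1000)  # assumed classifier choice
        scores[int(label)] = cross_val_score(clf, X_c, y_c, cv=cv).mean()
    return scores


# Synthetic stand-in for diabetes.csv (assumption): three well-separated blobs,
# with a binary label inside each blob (1 if the point lies right of its centre).
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X, blob = make_blobs(n_samples=600, centers=centers, cluster_std=1.0, random_state=0)
y = (X[:, 0] > centers[blob, 0]).astype(int)

labels = DBSCAN(eps=1.0, min_samples=10).fit(X).labels_
scores = per_cluster_accuracy(X, y, labels)
for label, acc in sorted(scores.items()):
    print(f"cluster {label}: mean CV accuracy = {acc:.3f}")
```

Skipping tiny or single-class clusters matters in practice: DBSCAN can produce small clusters in which every point has the same `Outcome`, and accuracy there says nothing about the classifier.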
Posted on 2022-11-16 06:19:14
Two changes are needed:

As shown in the figure above, eps must be neither too low nor too high.
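One common heuristic for finding a value between "too low" and "too high" is the sorted k-distance plot: for every point, take the distance to its `min_samples`-th nearest neighbour, sort these distances, and pick eps near the elbow of the curve. A complementary check is to sweep a few eps values and watch the number of clusters and the fraction of noise. The sketch below assumes `min_samples=10` (from the question), uses a synthetic 8-feature stand-in built with `make_blobs` instead of `diabetes.csv`, and the helper names `k_distances` and `sweep_eps` are hypothetical. The eps values in the sweep are illustrative only.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def k_distances(X, min_samples):
    """Sorted distance from each point to its `min_samples`-th nearest neighbour.

    kneighbors(X) on the fitted data returns each point itself as its first
    neighbour (distance 0), matching DBSCAN, whose min_samples count includes
    the point itself. A point is a core point when this distance is <= eps.
    """
    nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
    dist, _ = nn.kneighbors(X)
    return np.sort(dist[:, -1])


def sweep_eps(X, eps_values, min_samples):
    """Return (eps, number of clusters, fraction of noise points) per eps."""
    rows = []
    for eps in eps_values:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # ignore noise
        rows.append((eps, n_clusters, float(np.mean(labels == -1))))
    return rows


# Synthetic stand-in (assumption): 768 rows x 8 features, standardized like the question.
X, _ = make_blobs(n_samples=768, n_features=8, centers=3, random_state=0)
X = StandardScaler().fit_transform(X)

kd = k_distances(X, min_samples=10)
print("k-distance percentiles (50/90/99):", np.percentile(kd, [50, 90, 99]).round(2))

for eps, n_clusters, noise in sweep_eps(X, [0.01, 0.5, 1.0, 2.0, 5.0], min_samples=10):
    print(f"eps={eps:<5} clusters={n_clusters} noise={noise:.0%}")
```

On standardized data with several features, an eps such as 0.01 is usually far below the typical k-distance, so almost every point becomes noise; an eps well above the largest k-distances tends to merge everything into one cluster.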
https://datascience.stackexchange.com/questions/115951