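I have built a DBSCAN clustering model. Its output does not match the result I get after loading the model from a pickle file.
Based on the HD and MC columns, I clustered the WT column.
data = HD, MC
Target = WT
Below, the first record is assigned to cluster 0.
But after running the model from the 'pkl' file, the predicted result is -1.
Dataframe: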
HD    MC     WT    Cluster
200   Other  4.5   0
150   Pep    5.6   0
100   Pla    35    -1
50    Same   15    0

Code:
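from collections import Counter

import pandas as pd
from sklearn import preprocessing
from sklearn.cluster import DBSCAN

# Encode the categorical MC column as integers
le = preprocessing.LabelEncoder()
df['MC encoded'] = le.fit_transform(df['MC'])
col_1 = ['HD','MC encoded']
data = df[col_1]
col_2 = ['WT']
target = df[col_2]
data = data.fillna(value=0)
# Fit DBSCAN on the HD and encoded MC features
model = DBSCAN(eps=1, min_samples=20).fit(data)
outliers_df = pd.DataFrame(data)
print(Counter(model.labels_))
# fit_predict fits the same model object again, this time on WT alone
x = model.fit_predict(target)
print(Counter(x))

Result: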
Counter({-1: 604, 0: 142, 1: 83, 9: 36, 2: 27, 7: 26, 10: 26, 8: 24, 4: 23, 5: 23, 3: 22, 11: 21, 6: 20, 12: 20, 13: 20})
Counter({0: 1093, -1: 24})

Code:
import pickle

df["Cluster"] = x

# Save the fitted model and the label encoder
filename1 = '/model.pkl'
model_df = open(filename1, 'wb')
pickle.dump(model,model_df)
model_df.close()
output = open('/MC.pkl', 'wb')
pickle.dump(le, output)
output.close()

# Load both objects back
with open('model.pkl', 'rb') as file:
    pickle_model = pickle.load(file)
pkl_file = open('MC.pkl', 'rb')
le_mc = pickle.load(pkl_file)
pkl_file.close()

def testing(HD,MC,WT):
    # Build a one-row DataFrame from the inputs
    test = {'HD':[HD],'MC':[MC], 'WT':[WT]}
    test = pd.DataFrame(test)
    test['MC_encoded'] = le_mc.transform(test['MC'])
    pred_val = pickle_model.fit_predict(test[['HD','MC_encoded']])
    print(pred_val)
    return(pred_val)

pred_val = testing(200,'Other',4.5)

Result:
[-1]

Posted on 2019-09-10 13:37:28
It looks like your pickle file is not being loaded as pandas data. Why not just use df_pickle = pd.read_pickle('/MC.pkl')? Everything else should fall into place after that.
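Posted on 2019-10-10 15:12:03
Without looking at anything else:
pred_val = pickle_model.fit_predict(test[['HD','MC_encoded']])
Here you are training your pickle_model on the test data with the fit_predict() method. As a first step, replace it with .predict(), so that the model is used as-is instead of being trained on a single sample.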
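One caveat on the suggestion above: scikit-learn's DBSCAN exposes fit() and fit_predict() but no predict() method, so the replacement cannot be made literally. It also explains the [-1]: refitting on a single row with min_samples=20 can only label that row as noise. A common workaround, sketched here with the standard library only, is to give a new point the label of the nearest core sample if that sample lies within eps, and -1 otherwise. The helper name assign_to_cluster and the sample coordinates are hypothetical and not part of the original post.

```python
import math

def assign_to_cluster(point, core_points, core_labels, eps):
    """Return the label of the nearest core sample within eps, else -1 (noise)."""
    best_label, best_dist = -1, eps
    for core, label in zip(core_points, core_labels):
        dist = math.dist(point, core)  # Euclidean distance, DBSCAN's default metric
        if dist <= best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical core samples as (HD, encoded MC) pairs with their cluster labels
core_points = [(200.0, 0.0), (200.5, 0.0), (150.0, 1.0)]
core_labels = [0, 0, 1]

print(assign_to_cluster((200.2, 0.0), core_points, core_labels, eps=1.0))  # 0
print(assign_to_cluster((100.0, 2.0), core_points, core_labels, eps=1.0))  # -1
```

With a fitted model, the core samples should be available as pickle_model.components_.tolist(), their labels as pickle_model.labels_[pickle_model.core_sample_indices_].tolist(), and the radius as pickle_model.eps. This assumes the pickled model is still the one fitted on HD and the encoded MC column, not one that was later refitted on other data.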
https://datascience.stackexchange.com/questions/58960