Question: Best_params in GridSearch

Stack Overflow user

Asked on 2019-07-02 18:57:20

Answers: 1 · Views: 315 · Followers: 0 · Votes: 1

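To find the best combination of parameters, I used grid_search and plotted how the score changes as the parameters change. When I run gs_clf.best_params_, I get the best combination of parameters: {'learning_rate': 0.01, 'n_estimators': 200}. I don't understand why the validation plot does not show the best score for this combination of parameters.

My code is provided below.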

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.metrics import  accuracy_score, average_precision_score, recall_score, f1_score, precision_recall_curve, auc, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import numpy as np


clf = GradientBoostingClassifier(min_samples_split=300, max_depth=4, random_state=0)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0) 

number_of_estimators= [20,200]
LR=[0.01,1]

grid = GridSearchCV(clf, param_grid = dict(n_estimators=number_of_estimators,learning_rate=LR), cv=kfold, return_train_score=True, scoring = 'accuracy', pre_dispatch='1*n_jobs',n_jobs=1)

gs_clf = grid.fit(X_train, Y_train.values.ravel()) # Fit the Grid Search on Train dataset

scores = [x for x in gs_clf.cv_results_['mean_train_score']]
scores = np.array(scores).reshape(len(number_of_estimators), len(LR))

for ind, i in enumerate(number_of_estimators):
    plt.plot(LR, scores[ind], label='Number_of_estimators: ' + str(i))
plt.legend()
plt.xlabel('Learning rate')
plt.ylabel('Mean score')
plt.title('Train score')
plt.show()

scores = [x for x in gs_clf.cv_results_['mean_test_score']]
scores = np.array(scores).reshape(len(number_of_estimators), len(LR))

for ind, i in enumerate(number_of_estimators):
    plt.plot(LR, scores[ind], label='Number_of_estimators: ' + str(i))
plt.legend()
plt.xlabel('Learning rate')
plt.ylabel('Mean score')
plt.title('Validation score')
plt.show()

gs_clf.best_params_

The plots I get:

Train score plot

Validation score plot

Tags: python, validation, model, grid-search

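1 Answer

Stack Overflow user

Accepted answer

Posted on 2019-07-02 22:39:00

The problem was actually in how I displayed the numbers on the plot. cv_results_ lists the candidates with the parameter names sorted alphabetically, so learning_rate varies slowest and n_estimators varies fastest. The scores therefore have to be reshaped to (len(LR), len(number_of_estimators)), with one row per learning rate and number_of_estimators on the x-axis. This is the correct plotting code: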

#TRAIN DATA
scores=gs_clf.cv_results_['mean_train_score']
scores = np.array(scores).reshape(len(LR), len(number_of_estimators))

for ind, i in enumerate(LR):
    plt.plot(number_of_estimators, scores[ind], label='Learning rate: ' + str(i))
plt.legend()
plt.xlabel('Number_of_estimators')
plt.ylabel('Mean score')
plt.title('Train score')
plt.show()


#VALIDATION DATA
scores=gs_clf.cv_results_['mean_test_score']
scores = np.array(scores).reshape(len(LR), len(number_of_estimators))

for ind, i in enumerate(LR):
    plt.plot(number_of_estimators, scores[ind], label='Learning rate: ' + str(i))
plt.legend()
plt.xlabel('Number_of_estimators')
plt.ylabel('Mean score')
plt.title('Validation score')
plt.show()
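
The ordering that the reshape above relies on can be checked without fitting anything. The sketch below is only an illustration: it lists the candidates in the order `ParameterGrid` yields them (the same grid definition as in the question), and then uses a hypothetical helper, `scores_to_grid`, which is not part of scikit-learn, to place each score by its parameter values instead of by position. The scores are made-up numbers used purely to show the layout.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

number_of_estimators = [20, 200]
LR = [0.01, 1]

# Candidates come out with parameter names sorted alphabetically,
# and the last name (n_estimators) varies fastest.
candidates = list(ParameterGrid(dict(n_estimators=number_of_estimators,
                                     learning_rate=LR)))
for idx, params in enumerate(candidates):
    print(idx, params)


def scores_to_grid(params, scores, row_name, row_values, col_name, col_values):
    """Hypothetical helper: build a 2D score array keyed by parameter values."""
    grid = np.full((len(row_values), len(col_values)), np.nan)
    for p, s in zip(params, scores):
        row = row_values.index(p[row_name])
        col = col_values.index(p[col_name])
        grid[row, col] = s
    return grid


# Made-up scores, one per candidate, only to illustrate the layout.
fake_scores = [0.1, 0.2, 0.3, 0.4]
grid = scores_to_grid(candidates, fake_scores,
                      "learning_rate", LR,
                      "n_estimators", number_of_estimators)
print(grid)  # one row per learning rate, one column per n_estimators value
```

With a fitted search, the same helper could be called with `gs_clf.cv_results_['params']` and `gs_clf.cv_results_['mean_test_score']` in place of `candidates` and `fake_scores`. Placing scores by value avoids depending on the candidate order at all, which matters most when the two parameter lists happen to have the same length and a wrong reshape raises no error.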

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/56850920
