I am trying to train and tune a model with the code below on a dataset of about 300 records and 100 features. I am wondering whether the n_estimators values I search over in the code are too high. Since I only have 300 records, would it make more sense to try values like 10, 20, 30 for n_estimators? Is n_estimators related to the size of the training dataset? And what about learning_rate?
Code:
# sklearn.grid_search was removed; GridSearchCV now lives in model_selection
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
# TODO: Initialize the classifier
clf = AdaBoostClassifier(random_state=0)
# TODO: Create the parameters list you wish to tune
parameters = {'n_estimators':[100,200,300],'learning_rate':[1.0,2.0,4.0]}
# TODO: Make an fbeta_score scoring object
scorer = make_scorer(accuracy_score)
# TODO: Perform grid search on the classifier using 'scorer' as the scoring method
grid_obj = GridSearchCV(clf,parameters,scoring=scorer)
# TODO: Fit the grid search object to the training data and find the optimal parameters
grid_fit = grid_obj.fit(X_train,y_train)
# Get the estimator
best_clf = grid_fit.best_estimator_
# Make predictions using the unoptimized model
predictions = (clf.fit(X_train, y_train)).predict(X_test)
best_predictions = best_clf.predict(X_test)

Posted on 2017-11-10 06:04:47
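For scale, here is a runnable sketch of the smaller grid the question proposes (10, 20, 30 estimators), with learning rates at or below AdaBoost's default of 1.0. The synthetic dataset is an assumption standing in for the 300-record, 100-feature data described above; the parameter values are illustrative, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the ~300-record, 100-feature dataset in the question
X, y = make_classification(n_samples=300, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Smaller grid: fewer estimators, and learning rates at or below the
# AdaBoost default of 1.0 (values above 1.0 rarely help)
parameters = {'n_estimators': [10, 20, 30],
              'learning_rate': [0.1, 0.5, 1.0]}

grid = GridSearchCV(AdaBoostClassifier(random_state=0),
                    parameters, scoring='accuracy', cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
```

With 9 parameter combinations, 5 folds, and at most 30 estimators each, this grid fits quickly even on a small machine, which makes it cheap to widen later if the best value lands on the edge of the grid.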
Let's take these one at a time:
Hope this helps :)
https://stackoverflow.com/questions/47216224