
Question: How do I choose n_estimators in RandomForestClassifier?

Stack Overflow user

Asked on 2020-03-20 11:05:50

3 answers, 3.2K views, 0 followers, 1 vote

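I built a random forest binary classifier in Python on a preprocessed dataset with 4898 instances, using a 60-40 stratified train/test split. 78% of the data belongs to one target label and the rest to the other. What value of n_estimators should I choose to get the most practical (best) random forest model? I plotted testing accuracy against n_estimators with the code snippet below. x_train and y_train are the features and target labels of the training set; x_test and y_test are the features and target labels of the test set.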
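The split described here can be sketched as follows. This is an illustration, not the asker's code: the data is synthetic (make_classification with a roughly 78/22 class balance is an assumed stand-in), and the 11 features are an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 4898 rows, about 78% / 22% class balance (assumption).
X, y = make_classification(n_samples=4898, n_features=11, weights=[0.78],
                           random_state=0)

# 60-40 split; stratify=y keeps the class ratio the same in both parts.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

print(len(x_train), len(x_test))
print(np.bincount(y_train) / len(y_train), np.bincount(y_test) / len(y_test))
```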

import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
%matplotlib inline

# x_train, y_train, x_test, y_test come from the 60-40 stratified split
n_values = range(1, 200)
scores = []
for k in n_values:
    # a forest with k trees; no random_state is set, so every fit draws new bootstrap samples
    rfc = RandomForestClassifier(n_estimators=k)
    rfc.fit(x_train, y_train)
    y_pred = rfc.predict(x_test)
    scores.append(accuracy_score(y_test, y_pred))

# plot the relationship between n_estimators and testing accuracy
plt.plot(n_values, scores)
plt.xlabel('Value of n_estimators for Random Forest Classifier')
plt.ylabel('Testing Accuracy')
plt.show()

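The plot shows that high values of n_estimators give good accuracy scores, but even between neighboring values the curve fluctuates randomly, so I can't pin down the best value. I just want to know how to tune the n_estimators hyperparameter and how I should choose it. Should I use a ROC or CAP curve instead of accuracy_score? Thanks.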
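One way to reduce the jitter in such a curve, sketched under assumptions: score each candidate with stratified k-fold cross-validation on fixed folds instead of a single split, fix random_state, and use ROC AUC, which scores the predicted probabilities rather than 0.5-threshold labels and is less distorted by a 78/22 imbalance than accuracy. The synthetic data and the candidate list are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the asker's data, about 78% / 22% classes (assumption).
X, y = make_classification(n_samples=1000, n_features=11, weights=[0.78],
                           random_state=0)

# The same folds for every candidate, so differences come from n_estimators.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

candidates = [10, 25, 50, 100, 200]  # hypothetical grid
mean_auc = {}
for n in candidates:
    rfc = RandomForestClassifier(n_estimators=n, random_state=0)
    # "roc_auc" uses predict_proba, so it is independent of the decision threshold.
    scores = cross_val_score(rfc, X, y, cv=cv, scoring="roc_auc")
    mean_auc[n] = float(np.mean(scores))

for n, auc in mean_auc.items():
    print(f"n_estimators={n:4d}  mean ROC AUC={auc:.4f}")
```

A reasonable reading of the output is to take the smallest n_estimators after which the mean AUC stops improving noticeably, rather than the single highest value.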

python

classification

random-forest

hyperparameters

3 answers

Stack Overflow user

Answered on 2021-03-06 22:14:21

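See the random search example at https://github.com/dnishimoto/python-deep-learning/blob/master/Random%20Forest%20Tennis.ipynb

I used RandomizedSearchCV to find the best parameters for the random forest classifier.

n_estimators is the number of decision trees the forest builds.

Try XGBoost to get higher accuracy.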
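A small sketch of what n_estimators controls: the fitted forest stores that many trees in estimators_, and scikit-learn's RandomForestClassifier combines them by averaging their predicted class probabilities. The synthetic data and n_estimators=25 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rfc = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# One fitted decision tree per unit of n_estimators.
n_trees = len(rfc.estimators_)

# The forest's probability is the mean of the individual trees' probabilities.
tree_mean = np.mean([tree.predict_proba(X) for tree in rfc.estimators_], axis=0)
forest_proba = rfc.predict_proba(X)
print(n_trees, np.allclose(tree_mean, forest_proba))
```

Because the prediction is an average, adding trees mainly smooths it; past some point the average barely moves.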

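from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV

# candidate values; max_features must not exceed the number of columns in X_train
parameter_grid = {'n_estimators': [1, 2, 3, 4, 5],
                  'max_depth': [2, 4, 6, 8, 10],
                  'min_samples_leaf': [1, 2, 4],
                  'max_features': [1, 2, 3, 4, 5, 6, 7, 8]}

number_models = 4  # number of random parameter combinations to try
random_RandomForest_class = RandomizedSearchCV(
    estimator=RandomForestClassifier(),  # original snippet used pipeline['clf'], undefined here
    param_distributions=parameter_grid,
    n_iter=number_models,
    scoring='accuracy',
    n_jobs=2,
    cv=4,
    refit=True,
    return_train_score=True)

random_RandomForest_class.fit(X_train, y_train)
# evaluate the refitted best model on held-out data, not on data it was trained on
predictions = random_RandomForest_class.predict(X_test)

print("Accuracy Score", accuracy_score(y_test, predictions))
print("Best params", random_RandomForest_class.best_params_)
print("Best score", random_RandomForest_class.best_score_)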
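A variant of the search above, sketched under assumptions: instead of a short list of n_estimators values, sample it from an integer range with scipy.stats.randint, and score by ROC AUC rather than accuracy because the labels are imbalanced. The data, ranges and n_iter are illustrative choices, not taken from the linked notebook.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic, imbalanced stand-in data (assumption).
X, y = make_classification(n_samples=600, n_features=8, weights=[0.78],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

param_distributions = {
    "n_estimators": randint(10, 201),    # integers from 10 to 200
    "max_depth": [None, 4, 8, 12],
    "min_samples_leaf": randint(1, 5),   # integers from 1 to 4
    "max_features": ["sqrt", 0.5, 1.0],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=8,
    scoring="roc_auc",
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
# score() uses the search's scorer (ROC AUC) on the refitted best model.
print("Test ROC AUC:", search.score(X_test, y_test))
```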

Votes: 0

Stack Overflow user

Answered on 2021-03-06 23:47:59

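It is natural for a random forest's score to level off after a certain n_estimators, because, unlike boosting, there is no mechanism that "slows down" the fit: the forest simply averages independently grown trees. Since adding more trees beyond that point brings little benefit and mostly costs training time and memory, you can choose something around 50.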
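The leveling-off can be checked without a separate validation set by tracking the out-of-bag (OOB) error while growing a single forest, the same pattern as scikit-learn's "OOB Errors for Random Forests" example. This is a sketch: the synthetic data and the step of 25 trees are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=11, weights=[0.78],
                           random_state=0)

# warm_start=True keeps the trees already grown and only adds new ones on fit().
rfc = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)

oob_error = {}
for n in range(25, 301, 25):
    rfc.set_params(n_estimators=n)
    rfc.fit(X, y)
    # oob_score_ is accuracy on samples each tree did not see in its bootstrap sample.
    oob_error[n] = 1 - rfc.oob_score_

for n, err in oob_error.items():
    print(f"{n:4d} trees  OOB error={err:.4f}")
```

Where the OOB error curve flattens is a data-driven version of the "around 50" rule of thumb.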

Votes: 0

Stack Overflow user

Answered on 2021-10-20 20:34:32

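Don't use grid search in this case; it's overkill. Also, because you choose the candidate values arbitrarily, the search may not land on the optimal number.

scikit-learn's gradient boosting estimators (GradientBoostingRegressor and GradientBoostingClassifier) have a staged_predict method that lets you measure the validation error after each stage of training and so find the best number of trees. RandomForestClassifier itself has no staged_predict.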

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# X, y: your features and continuous target
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# fit with a generous upper bound for n_estimators
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=100, random_state=42)
gbrt.fit(X_train, y_train)

# validation error after each boosting stage (1, 2, ..., 100 trees)
errors = [mean_squared_error(y_val, y_pred)
          for y_pred in gbrt.staged_predict(X_val)]

# stage i (0-based) corresponds to i + 1 trees
bst_n_estimators = int(np.argmin(errors)) + 1
gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=bst_n_estimators,
                                      random_state=42)
gbrt_best.fit(X_train, y_train)
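For the random forest in the question, a similar effect can be emulated by hand. This is not a scikit-learn API: because RandomForestClassifier averages its trees' predicted probabilities, the forest made of the first k trees is the running mean of those trees' probabilities, so one large fit yields a validation score for every k from 1 to n_estimators. The synthetic data, the 200-tree upper bound, ROC AUC as the metric and the 0.001 tolerance are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=11, weights=[0.78],
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

# Fit once with the largest forest we are willing to pay for.
rfc = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Positive-class probability from each tree: shape (n_trees, n_val_samples).
per_tree = np.array([tree.predict_proba(X_val)[:, 1] for tree in rfc.estimators_])

# Row k-1 is the prediction of a forest made of the first k trees.
running_mean = np.cumsum(per_tree, axis=0) / np.arange(1, len(per_tree) + 1)[:, None]
val_auc = [roc_auc_score(y_val, p) for p in running_mean]

# One split is noisy, so take the smallest forest within a tolerance of the best.
best_auc = max(val_auc)
best_k = next(k for k, auc in enumerate(val_auc, start=1) if auc >= best_auc - 1e-3)
print("smallest near-best number of trees on this split:", best_k)
```

Since every k shares the same trees, the resulting curve is much smoother than refitting a new forest for each value of n_estimators.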

Votes: 0

Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/60768008
