
Using GridSearchCV with RandomForestRegressor

Stack Overflow user
Asked 2015-01-11 18:14:04
Answers: 2 · Views: 11K · Followers: 0 · Votes: 7

I'm trying to use GridSearchCV with RandomForestRegressor, but I always get ValueError: Found array with dim 100. Expected 500. Consider this toy example:

import numpy as np

from sklearn import ensemble
from sklearn.cross_validation import train_test_split
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import r2_score

if __name__ == '__main__':

    X = np.random.rand(1000, 2)
    y = np.random.rand(1000)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=1)

    # Set the parameters by cross-validation
    tuned_parameters = {'n_estimators': [500, 700, 1000], 'max_depth': [None, 1, 2, 3], 'min_samples_split': [1, 2, 3]}

    # clf = ensemble.RandomForestRegressor(n_estimators=500, n_jobs=1, verbose=1)
    clf = GridSearchCV(ensemble.RandomForestRegressor(), tuned_parameters, cv=5, scoring=r2_score, n_jobs=-1, verbose=1)
    clf.fit(X_train, y_train)
    print clf.best_estimator_

What I get is:

Fitting 5 folds for each of 36 candidates, totalling 180 fits
Traceback (most recent call last):
  File "C:\Users\abudis\Dropbox\machine_learning\toy_example.py", line 21, in <module>
    clf.fit(X_train, y_train)
  File "C:\Users\abudis\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\grid_search.py", line 596, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "C:\Users\abudis\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\grid_search.py", line 378, in _fit
    for parameters in parameter_iterable
  File "C:\Users\abudis\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\externals\joblib\parallel.py", line 653, in __call__
    self.dispatch(function, args, kwargs)
  File "C:\Users\abudis\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\externals\joblib\parallel.py", line 400, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "C:\Users\abudis\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\externals\joblib\parallel.py", line 138, in __init__
    self.results = func(*args, **kwargs)
  File "C:\Users\abudis\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\cross_validation.py", line 1240, in _fit_and_score
    test_score = _score(estimator, X_test, y_test, scorer)
  File "C:\Users\abudis\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\cross_validation.py", line 1296, in _score
    score = scorer(estimator, X_test, y_test)
  File "C:\Users\abudis\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\metrics\metrics.py", line 2324, in r2_score
    y_type, y_true, y_pred = _check_reg_targets(y_true, y_pred)
  File "C:\Users\abudis\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\metrics\metrics.py", line 65, in _check_reg_targets
    y_true, y_pred = check_arrays(y_true, y_pred)
  File "C:\Users\abudis\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\utils\validation.py", line 254, in check_arrays
    % (size, n_samples))
ValueError: Found array with dim 100. Expected 500

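For some reason, GridSearchCV seems to think that the n_estimators parameter should equal the size of each fold. If I change the first value of the n_estimators list in tuned_parameters, I get a ValueError with a different expected value.

Training a single model with clf = ensemble.RandomForestRegressor(n_estimators=500, n_jobs=1, verbose=1) works fine, though, so I'm not sure whether I'm doing something wrong or there is a bug somewhere in scikit-learn.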


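2 Answers

Stack Overflow user

Accepted answer

Answered 2015-01-12 01:45:10

It looks like a bug, but in your case it should work if you use RandomForestRegressor's own scorer (which happens to be the R^2 score) by not specifying any scoring function in GridSearchCV.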

clf = GridSearchCV(ensemble.RandomForestRegressor(), tuned_parameters, cv=5, 
                   n_jobs=-1, verbose=1)
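
To illustrate why leaving out scoring still optimizes R^2, here is a minimal sketch, assuming a recent scikit-learn release. The data and the variable names are made up for illustration. It compares the regressor's default score method with an explicit r2_score call on the same predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Hypothetical toy data, similar in spirit to the question's example.
rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = rng.rand(200)

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# score() is the default scorer GridSearchCV falls back to when
# no scoring argument is given.
default_score = forest.score(X, y)

# The same quantity computed explicitly from the predictions.
explicit_r2 = r2_score(y, forest.predict(X))

print(default_score, explicit_r2)
```

The two printed values should agree, which is why the workaround above gives the score the question was after.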

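EDIT: As @jnothman mentioned in #4081, this is the real problem:

scoring does not accept a metric function. It accepts a function of signature (estimator, X, y_true=None) -> float score. You can use scoring='r2' or scoring=make_scorer(r2_score).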
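
The two suggested spellings can be sketched as follows, assuming a recent scikit-learn release where GridSearchCV lives in sklearn.model_selection. The data, the small grid, and the run_search helper are illustrative assumptions, not part of the original post. The sketch also checks that a fitted forest has a length equal to its number of trees, which is presumably why the "Expected" size in the error followed n_estimators once the estimator landed in r2_score's y_true slot.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer, r2_score
from sklearn.model_selection import GridSearchCV

# Hypothetical toy data.
rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = rng.rand(200)

# len() of a fitted forest is its number of trees.
forest = RandomForestRegressor(n_estimators=7, random_state=0).fit(X, y)
n_trees = len(forest)

# A deliberately small grid so the sketch runs quickly.
param_grid = {'n_estimators': [5, 10], 'max_depth': [None, 2]}


def run_search(scoring):
    """Fit a grid search with the given scoring argument (illustrative helper)."""
    search = GridSearchCV(RandomForestRegressor(random_state=0),
                          param_grid, cv=3, scoring=scoring)
    search.fit(X, y)
    return search


# Option 1: name a built-in scorer.
by_name = run_search('r2')

# Option 2: wrap the metric so it gets the (estimator, X, y) signature.
by_scorer = run_search(make_scorer(r2_score))

same_score = np.isclose(by_name.best_score_, by_scorer.best_score_)
print(n_trees, by_name.best_params_, same_score)
```

With a fixed random_state both searches should select the same parameters, since make_scorer(r2_score) calls the estimator's predict and passes the result to r2_score, just as the 'r2' string does.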

Votes: 6

Stack Overflow user

Answered 2022-09-16 19:55:30

You can use any of the regression scorers listed in "evaluation.html".

Here is example code for MSE:

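from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.pipeline import Pipeline

cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=100)

# The pipeline step must be the estimator itself, not a list containing it.
pipeRF = Pipeline([('classifier', RandomForestRegressor())])

param_grid = [{
    'classifier': [RandomForestRegressor()],
    'classifier__n_estimators': [100, 200],
    'classifier__min_samples_split': [8, 10],
    'classifier__min_samples_leaf': [3, 4, 5],
    'classifier__max_depth': [80, 90],
}]

clf = GridSearchCV(pipeRF, param_grid=param_grid, cv=cv, n_jobs=-1,
                   scoring='neg_mean_squared_error')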

To use r2 instead:

clf = GridSearchCV(pipeRF, param_grid = param_grid, cv = cv, n_jobs=-1, scoring='r2')
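
One detail worth keeping in mind with 'neg_mean_squared_error' is that the reported scores are negated, because GridSearchCV always maximizes. The following is a minimal sketch, assuming a recent scikit-learn release. The data and the reduced grid are made up so that it runs quickly, and it is not the answer's exact setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical toy data with a weak linear signal.
rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = X[:, 0] + 0.1 * rng.rand(200)

# A deliberately small grid, not the one from the answer.
param_grid = {'n_estimators': [10, 20], 'max_depth': [2, 4]}

search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      cv=3, scoring='neg_mean_squared_error')
search.fit(X, y)

# best_score_ is the negated MSE, so flip the sign to report the error.
best_mse = -search.best_score_
print(search.best_params_, best_mse)
```

With scoring='r2' no sign flip is needed, since a higher R^2 is already better.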
Votes: 1

Page content originally provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/27890413
