Q: Invalid parameter for estimator Pipeline (SVR)

Stack Overflow user

Asked 2021-03-02 22:51:39

1 answer, 1.1K views, 0 followers, 0 votes

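I have a dataset with 100 columns of continuous features and a continuous label, and I want to run SVR: select the most relevant features, tune the hyperparameters, and then cross-validate the model fitted to my data.

I wrote this code: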

X_train, X_test, y_train, y_test = train_test_split(scaled_df, target, test_size=0.2)
    
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

# define the pipeline to evaluate
model = SVR()
fs = SelectKBest(score_func=mutual_info_regression)
pipeline = Pipeline(steps=[('sel',fs), ('svr', model)])

# define the grid
grid = dict()

#How many features to try
grid['estimator__sel__k'] = [i for i in range(1, X_train.shape[1]+1)]


# define the grid search
#search = GridSearchCV(pipeline, grid, scoring='neg_mean_squared_error', n_jobs=-1, cv=cv)
search = GridSearchCV(
        pipeline,
#        estimator=SVR(kernel='rbf'),
        param_grid={
            'estimator__svr__C': [0.1, 1, 10, 100, 1000],
            'estimator__svr__epsilon': [0.0001, 0.0005,  0.001, 0.005,  0.01, 0.05, 1, 5, 10],
            'estimator__svr__gamma': [0.0001, 0.0005,  0.001, 0.005,  0.01, 0.05, 1, 5, 10]
        },
        scoring='neg_mean_squared_error',
        verbose=1,
        n_jobs=-1)

for param in search.get_params().keys():
    print(param)

# perform the search
results = search.fit(X_train, y_train)

# summarize best
print('Best negative MSE: %.3f' % results.best_score_)
print('Best Config: %s' % results.best_params_)

# summarize all
means = results.cv_results_['mean_test_score']
params = results.cv_results_['params']
for mean, param in zip(means, params):
    print(">%.3f with: %r" % (mean, param))

I get this error:

ValueError: Invalid parameter estimator for estimator Pipeline(memory=None,
         steps=[('sel',
                 SelectKBest(k=10,
                             score_func=<function mutual_info_regression at 0x7fd2ff649cb0>)),
                ('svr',
                 SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
                     gamma='scale', kernel='rbf', max_iter=-1, shrinking=True,
                     tol=0.001, verbose=False))],
         verbose=False). Check the list of available parameters with `estimator.get_params().keys()`.

When I print the available parameters as the error message suggests (estimator.get_params().keys()), I get:

cv
error_score
estimator__memory
estimator__steps
estimator__verbose
estimator__sel
estimator__svr
estimator__sel__k
estimator__sel__score_func
estimator__svr__C
estimator__svr__cache_size
estimator__svr__coef0
estimator__svr__degree
estimator__svr__epsilon
estimator__svr__gamma
estimator__svr__kernel
estimator__svr__max_iter
estimator__svr__shrinking
estimator__svr__tol
estimator__svr__verbose
estimator
iid
n_jobs
param_grid
pre_dispatch
refit
return_train_score
scoring
verbose
Fitting 5 folds for each of 405 candidates, totalling 2025 fits

But when I change the line:

pipeline = Pipeline(steps=[('sel',fs), ('svr', model)])

to:

pipeline = Pipeline(steps=[('estimator__sel',fs), ('estimator__svr', model)])

I get this error:

ValueError: Estimator names must not contain __: got ['estimator__sel', 'estimator__svr']

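Can someone explain what I'm doing wrong, i.e. how do I incorporate the pipeline/feature-selection step into GridSearchCV?

As a side note, if I comment out pipeline in the GridSearchCV call and uncomment estimator=SVR(kernel='rbf'), the cell runs without a problem, but in that case I assume feature selection is not included, since it is not called anywhere. I have seen some earlier questions like this one, e.g. here, but they don't seem to answer this specific question.

Is there a cleaner way to write this?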

python

machine-learning

scikit-learn

svm

pipeline

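1 Answer

Stack Overflow user

Accepted answer

Answered 2021-03-03 14:39:48

The first error message is about the parameters of pipeline, not those of search, and it indicates that your param_grid is wrong, not the pipeline step names. Running pipeline.get_params().keys() should show you the right parameter names. Your grid should be: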

        param_grid={
            'svr__C': [0.1, 1, 10, 100, 1000],
            'svr__epsilon': [0.0001, 0.0005,  0.001, 0.005,  0.01, 0.05, 1, 5, 10],
            'svr__gamma': [0.0001, 0.0005,  0.001, 0.005,  0.01, 0.05, 1, 5, 10]
        },
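The `estimator__` prefix in the question's printout comes from calling get_params() on the search object, which nests the pipeline under its `estimator` argument. The keys in param_grid, however, are applied to the pipeline itself, so they must be the names the pipeline reports. A minimal sketch of the difference, assuming the same two step names as in the question (`sel` and `svr`):

```python
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Same step names as in the question.
pipeline = Pipeline(steps=[
    ("sel", SelectKBest(score_func=mutual_info_regression)),
    ("svr", SVR()),
])
search = GridSearchCV(pipeline, param_grid={"svr__C": [1, 10]})

# Names as seen by the pipeline: these are the valid param_grid keys.
pipeline_params = set(pipeline.get_params().keys())
# Names as seen by the search object: the pipeline is nested under "estimator".
search_params = set(search.get_params().keys())

print("svr__C" in pipeline_params)             # valid grid key
print("estimator__svr__C" in search_params)    # only exists on the search object
print("estimator__svr__C" in pipeline_params)  # not a pipeline parameter
```

The last check is the case that raises "Invalid parameter estimator for estimator Pipeline" when such a key is used in param_grid.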

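I'm not sure how it runs with the plain SVR in place of the pipeline; the parameter grid doesn't specify the right names in that case either.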
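The same naming applies to the feature-selection step: the number of features to keep is `sel__k`. Note that in the question's code the `grid` dict and the `cv` object are defined but never passed to GridSearchCV, which is why the log reports 5 folds. The following is a rough end-to-end sketch, not a tuned setup. It assumes synthetic data from make_regression as a stand-in for `scaled_df` and `target`, and it uses a much smaller grid and fewer folds than the question so that it finishes quickly.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.model_selection import GridSearchCV, RepeatedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Assumption: synthetic data with 10 features instead of the asker's 100 columns.
X, y = make_regression(n_samples=120, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

pipeline = Pipeline(steps=[
    ("sel", SelectKBest(score_func=mutual_info_regression)),
    ("svr", SVR(kernel="rbf")),
])

# Keys are "<step name>__<parameter>", with no "estimator__" prefix.
param_grid = {
    "sel__k": [2, 5, 10],        # how many features to keep
    "svr__C": [1, 10],
    "svr__gamma": [0.01, 0.1],
}

# Assumption: fewer splits/repeats than the question, only to keep the run short.
cv = RepeatedKFold(n_splits=3, n_repeats=1, random_state=1)

search = GridSearchCV(pipeline, param_grid=param_grid,
                      scoring="neg_mean_squared_error", cv=cv)
search.fit(X_train, y_train)

# best_score_ is the negated MSE, so values closer to zero are better.
print("Best negative MSE: %.3f" % search.best_score_)
print("Best config: %s" % search.best_params_)

# Feature selection is refit inside each fold and again in best_estimator_.
test_predictions = search.best_estimator_.predict(X_test)
```

Because the selector lives inside the pipeline, it is fitted only on the training part of each fold, so the held-out fold does not influence which features are chosen.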

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/66448305
