
RFE ranking with GridSearch

Stack Overflow user
Asked on 2020-07-16 09:46:37
1 answer, 450 views, 0 followers, 0 votes

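I want to use RFE for feature selection inside a pipeline. Without GridSearch, getting it to work in the pipeline is no problem. However, as soon as I add GridSearch, I always get a ValueError (NB: the model without RFE works fine).

I tried using feature_selection, as suggested in this thread: "Grid Search with Recursive Feature Elimination in scikit-learn pipeline returns an error", but that leads to the same error.

What is wrong?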

My error: ValueError: Invalid parameter alpha for estimator RFE(estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None, normalize=True, random_state=None, solver='auto', tol=0.001), n_features_to_select=4, step=1, verbose=1). Check the list of available parameters with estimator.get_params().keys().
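Editorial sketch, not part of the original question: the error message suggests inspecting get_params().keys(), and doing so on the pipeline shows which parameter names GridSearchCV will accept. The snippet below rebuilds the failing pipeline with the question's step names and lists every key ending in "alpha". It assumes Ridge() with default arguments, because the normalize argument used in the question is not available in newer scikit-learn releases.

```python
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same step names as in the question; no data is needed to inspect parameters.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("imputation", SimpleImputer(strategy="most_frequent")),
    ("ridge", RFE(estimator=Ridge(), n_features_to_select=4)),
])

# Every name usable in GridSearchCV's param_grid is a key of get_params().
alpha_keys = sorted(k for k in pipeline.get_params() if k.endswith("alpha"))
print(alpha_keys)

# "ridge__alpha" targets the RFE step itself, which has no alpha parameter.
print("ridge__alpha" in pipeline.get_params())
```

The step called "ridge" is an RFE object, so alpha sits one level deeper, on the Ridge estimator wrapped by RFE.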

This works fine:

rfe=RFE(estimator=LinearRegression(), n_features_to_select=4, verbose=1)

#setup the pipeline steps
steps = [('scaler', StandardScaler()),
         ('imputation', SimpleImputer(missing_values=np.nan, strategy='most_frequent')),
         ('reg',  rfe)]
          
# Create the pipeline: pipeline
pipeline = Pipeline(steps)

# Create train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


# Fit the pipeline to the training set: 
pipeline.fit(X_train, y_train)

# Predict the labels of the test set
y_pred = pipeline.predict(X_test)

print()
# Print the features and their ranking (high = dropped early on)
print(dict(zip(X.columns, rfe.ranking_)))
# Print the features that are not eliminated
print(X.columns[rfe.support_])
print()

print("R^2: {}".format(pipeline.score(X_test, y_test)))
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("Root Mean Squared Error: {}".format(rmse))

This does not work:

rfe=RFE(estimator=Ridge(normalize=True), n_features_to_select=4, verbose=1)

#setup the pipeline steps
steps = [('scaler', StandardScaler()),
         ('imputation', SimpleImputer(missing_values=np.nan, strategy='most_frequent')),
         ('ridge', rfe)]
          
# Create the pipeline: pipeline
pipeline = Pipeline(steps)

#Define hyperparameters and range of Grid Search
parameters = {"ridge__alpha": np.linspace(0,1,100)}

# Create train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# run cross validation
cv = GridSearchCV(pipeline, param_grid = parameters, cv=3)

# Fit the pipeline to the training set: 
cv.fit(X_train, y_train)

# Predict the labels of the test set
y_pred = cv.predict(X_test)

# Compute and print R^2 and RMSE
print("R^2: {}".format(cv.score(X_test, y_test)))
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("Root Mean Squared Error: {}".format(rmse))
print("Tuned Model Parameters: {}".format(cv.best_params_))

Using feature_selection does not work either:

selector = feature_selection.RFE(Ridge(normalize=True))

#setup the pipeline steps
steps = [('scaler', StandardScaler()),
         ('imputation', SimpleImputer(missing_values=np.nan, strategy='most_frequent')),
         ('RFE', selector)]
          
# Create the pipeline: pipeline
pipeline = Pipeline(steps)

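1 Answer

Stack Overflow user

Answered on 2021-05-21 13:13:37

This question is old, but in case someone comes across it:

You can reach the hyperparameter alpha, or any other parameter of the estimator inside feature_selection.RFE(estimator=...), with a parameter name of the form <step name>__estimator__<parameter name>, for example rfe__estimator__alpha: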

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.model_selection import TimeSeriesSplit, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.feature_selection import RFE

model = RFE(estimator=Ridge())

pipe = Pipeline(
    steps = [
        ("scaler", StandardScaler()),
        ("rfe", model)
    ]
)

param = {
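    "rfe__step" : np.linspace(0.1, 0.9, 9),  # float step = fraction of features dropped per iteration; recent scikit-learn rejects a float step of 1.0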
    "rfe__estimator__alpha" : np.logspace(-3, 3, 7)
}

tscv = TimeSeriesSplit(n_splits=5).split(X_train)

gridsearch = GridSearchCV(estimator=pipe, cv=tscv, param_grid=param, refit=True, return_train_score=True, n_jobs=-1)
fit = gridsearch.fit(X_train, y_train)
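Editorial sketch, not part of the original answer: a self-contained version of the same idea that also reads the RFE ranking back after the search. The data comes from make_regression as a stand-in for the question's X and y, and n_features_to_select=4 and cv=3 are copied from the question; all of these are assumptions for illustration. GridSearchCV fits clones of the pipeline, so the rfe object created beforehand is never fitted itself; the ranking has to be read from best_estimator_.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's X and y (assumption for illustration).
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=5.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("rfe", RFE(estimator=Ridge(), n_features_to_select=4)),
])

# alpha belongs to the Ridge inside RFE, hence the extra "estimator__" level.
param_grid = {"rfe__estimator__alpha": np.logspace(-3, 3, 7)}

search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
search.fit(X_train, y_train)

# With refit=True (the default), best_estimator_ is the pipeline refitted on
# all of X_train; its "rfe" step holds the ranking and the support mask.
best_rfe = search.best_estimator_.named_steps["rfe"]
print("Best parameters:", search.best_params_)
print("Ranking (1 = kept):", best_rfe.ranking_)
print("Selected feature indices:", np.flatnonzero(best_rfe.support_))
print("Test R^2:", search.score(X_test, y_test))
```

With a pandas DataFrame, X.columns[best_rfe.support_] would give the names of the retained features, as in the question's first snippet.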
Votes: 1
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/62931906
