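I am using XGBRegressor with a pipeline. The pipeline contains the preprocessing steps and the model (XGBRegressor).
Below are the complete preprocessing steps (numeric_cols and cat_cols are already defined).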
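# Imports needed by the preprocessing steps
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

numerical_transfer = SimpleImputer()
cat_transfer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transfer, numeric_cols),
        ('cat', cat_transfer, cat_cols)
    ])
The final pipeline is: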
my_model = Pipeline(steps = [('preprocessor', preprocessor), ('model', model)])
When I fit the pipeline without the early_stopping_rounds arguments, it works fine:
my_model.fit(X_train, y_train)
However, when I pass early_stopping_rounds as shown below, I get an error.
my_model.fit(X_train, y_train, model__early_stopping_rounds=5, model__eval_metric = "mae", model__eval_set=[(X_valid, y_valid)])
The error is raised at model__eval_set=[(X_valid, y_valid)]), and the message is:
ValueError: DataFrame.dtypes for data must be int, float or bool.
Did not expect the data types in fields MSZoning, Street, Alley, LotShape, LandContour, Utilities, LotConfig, LandSlope, Condition1, Condition2, BldgType, HouseStyle, RoofStyle, RoofMatl, MasVnrType, ExterQual, ExterCond, Foundation, BsmtQual, BsmtCond, BsmtExposure, BsmtFinType1, BsmtFinType2, Heating, HeatingQC, CentralAir, Electrical, KitchenQual, Functional, FireplaceQu, GarageType, GarageFinish, GarageQual, GarageCond, PavedDrive, PoolQC, Fence, MiscFeature, SaleType, SaleCondition
Does this mean I should preprocess X_valid before calling my_model.fit(), or am I doing something wrong?
If the problem is that X_valid needs to be preprocessed before calling fit(), how can I do that with the preprocessor I defined above?
Edit: I tried preprocessing X_valid without the pipeline, but I got an error saying the features do not match.
Posted on 2020-09-04 10:46:31
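The problem is that the pipeline does not apply its preprocessing steps to eval_set: model__eval_set is passed straight to the model step, so X_valid reaches XGBRegressor untransformed. So, as you said, you need to preprocess X_valid yourself. The easiest way to do that is to use the pipeline without the 'model' step. Run the following code before fitting the pipeline: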
# Make a copy to avoid changing the original data
X_valid_eval = X_valid.copy()
# Build a pipeline that contains only the preprocessing step (no model)
eval_set_pipe = Pipeline(steps=[('preprocessor', preprocessor)])
# Fit the preprocessor on the training data, then transform the validation copy
X_valid_eval = eval_set_pipe.fit(X_train, y_train).transform(X_valid_eval)
Then fit the pipeline as follows, after changing model__eval_set:
my_model.fit(X_train, y_train, model__early_stopping_rounds=5, model__eval_metric="mae", model__eval_set=[(X_valid_eval, y_valid)])
https://stackoverflow.com/questions/58136107
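The feature-mismatch error mentioned in the edit to the question is what one would expect if the preprocessor is refit on X_valid alone, because the one-hot encoder then only learns the categories present in the validation data. The sketch below illustrates this with scikit-learn only. The toy DataFrames, the two column names used, and the helper make_preprocessor are hypothetical and chosen just for illustration; they are not the asker's actual data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: the validation set lacks two categories seen in training.
X_train = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, None, 11250.0],
    "Street": ["Pave", "Grvl", "Pave", "Dirt"],
})
X_valid = pd.DataFrame({
    "LotArea": [9550.0, 14260.0],
    "Street": ["Pave", "Pave"],
})


def make_preprocessor():
    """Build a fresh, unfitted preprocessor shaped like the one in the question."""
    cat_transfer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer(transformers=[
        ("num", SimpleImputer(), ["LotArea"]),
        ("cat", cat_transfer, ["Street"]),
    ])


# Fit once on the training data and reuse it for the validation data.
shared = make_preprocessor().fit(X_train)
train_width = shared.transform(X_train).shape[1]
valid_width_shared = shared.transform(X_valid).shape[1]

# Refit on the validation data alone: the encoder sees fewer categories.
valid_width_separate = make_preprocessor().fit_transform(X_valid).shape[1]

print("train columns:", train_width)
print("valid columns (preprocessor fitted on train):", valid_width_shared)
print("valid columns (preprocessor refitted on valid):", valid_width_separate)
```

With the preprocessor fitted on the training data, both sets come out with the same number of columns, which is why the answer calls fit on X_train and only transform on the validation copy.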