我正在为{tidymodel}中的lasso回归模型创建并拟合一个工作流。模型很适合,但是当我去预测测试集时,我会发现一个错误,上面写着“new\_data中缺少了以下所需的列”。Tha列(“价格”)既列在火车上,也在测试装置上。这是个虫子吗?我遗漏了什么?
任何帮助都将不胜感激。
# split the data (target variable in house_sales_df is "price")
split <- initial_split(house_sales_df, prop = 0.8)
train <- split %>% training()
test <- split %>% testing()
# create and fit workflow
lasso_prep_recipe <-
recipe(price ~ ., data = train) %>%
step_zv(all_predictors()) %>%
step_normalize(all_numeric())
lasso_model <-
linear_reg(penalty = 0.1, mixture = 1) %>%
set_engine("glmnet")
lasso_workflow <- workflow() %>%
add_recipe(lasso_prep_recipe) %>%
add_model(lasso_model)
lasso_fit <- lasso_workflow %>%
fit(data = train)
# predict test set
predict(lasso_fit, new_data = test)predict()导致此错误:
Error in `step_normalize()`:
! The following required column is missing from `new_data` in step 'normalize_MXQEf': price.
Backtrace:
1. stats::predict(lasso_fit, new_data = test, type = "numeric")
2. workflows:::predict.workflow(lasso_fit, new_data = test, type = "numeric")
3. workflows:::forge_predictors(new_data, workflow)
5. hardhat:::forge.data.frame(new_data, blueprint = mold$blueprint)
7. hardhat:::run_forge.default_recipe_blueprint(...)
8. hardhat:::forge_recipe_default_process(...)
10. recipes:::bake.recipe(object = rec, new_data = new_data)
12. recipes:::bake.step_normalize(step, new_data = new_data)
13. recipes::check_new_data(names(object$means), object, new_data)
14. cli::cli_abort(...)发布于 2022-10-27 02:23:48
您将得到错误,因为all_numeric()在step_normalize()中选择了在预测时不可维护的结果price。使用all_numeric_predictors(),你应该很好
# split the data (target variable in house_sales_df is "price")
split <- initial_split(house_sales_df, prop = 0.8)
train <- split %>% training()
test <- split %>% testing()
# create and fit workflow
lasso_prep_recipe <-
recipe(price ~ ., data = train) %>%
step_zv(all_predictors()) %>%
step_normalize(all_numeric_predictors())
lasso_model <-
linear_reg(penalty = 0.1, mixture = 1) %>%
set_engine("glmnet")
lasso_workflow <- workflow() %>%
add_recipe(lasso_prep_recipe) %>%
add_model(lasso_model)
lasso_fit <- lasso_workflow %>%
fit(data = train)
# predict test set
predict(lasso_fit, new_data = test)https://stackoverflow.com/questions/74215751
复制相似问题