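I ran the following models successfully, and now I need to select the best two (according to one or more metrics). The models differ only in the recipe, which handles the class imbalance with a different step (none, smote, rose, upsample, step_adasyn). I'd like to select several models, ideally the best two, and also select by imbalance method.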
```r
library(tidymodels)  # recipes, parsnip, rsample, tune, yardstick, workflowsets
library(themis)      # step_smote(), step_rose(), step_upsample(), step_adasyn()
library(doFuture)    # foreach adapter built on the future framework

# Metric set passed to workflow_map() below
metrics_lasso <- metric_set(yardstick::sensitivity, yardstick::specificity,
                            yardstick::precision, yardstick::recall)

folds <- vfold_cv(data_train, v = 3, strata = class)

# No class-imbalance handling
rec_obj_all <- data_train %>%
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>%
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors())

# SMOTE
rec_obj_all_s <- data_train %>%
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>%
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_smote(class)

# ROSE
rec_obj_all_r <- data_train %>%
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>%
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_rose(class)

# Random upsampling of the minority class
rec_obj_all_up <- data_train %>%
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>%
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_upsample(class)

# ADASYN
rec_obj_all_ad <- data_train %>%
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>%
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_adasyn(class)

# Lasso logistic regression with a tuned penalty
lasso_mod1 <- logistic_reg(penalty = tune(),
                           mixture = 1) %>%
  set_engine("glmnet")

tictoc::tic()
# Physical cores, leaving 4 free but always keeping at least one worker
all_cores <- parallel::detectCores(logical = FALSE)
registerDoFuture()
cl <- parallel::makeCluster(max(1, all_cores - 4))
plan(cluster, workers = cl)

# One workflow per recipe: 5 recipes x 1 model
balances <-
  workflow_set(
    preproc = list(unba = rec_obj_all, b_sm = rec_obj_all_s, b_ro = rec_obj_all_r,
                   b_up = rec_obj_all_up, b_ad = rec_obj_all_ad),
    models = list(lasso_mod1),
    cross = TRUE
  )

grid_ctrl <-
  control_grid(
    save_pred = TRUE,
    parallel_over = "everything",
    save_workflow = FALSE
  )

# Tune every workflow on the same folds with a 25-point grid
grid_results <-
  balances %>%
  workflow_map(
    seed = 1503,
    resamples = folds,
    grid = 25,
    metrics = metrics_lasso,
    control = grid_ctrl,
    verbose = TRUE)

parallel::stopCluster(cl)
tictoc::toc()
```
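The five recipes above share the same four preprocessing steps and differ only in the final balancing step, so repeating the shared steps five times makes it easy for one variant to drift from the others. A minimal sketch, assuming a hypothetical toy data frame (`toy_train`) in place of `data_train`, that writes the shared steps once and builds the named list of recipes:

```r
library(recipes)
library(themis)  # step_smote(), step_rose(), step_upsample(), step_adasyn()

# Hypothetical toy data standing in for data_train (85 "no" vs 15 "yes")
set.seed(1)
toy_train <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  class = factor(rep(c("no", "yes"), times = c(85, 15)))
)

# Shared preprocessing, written once
base_rec <- toy_train %>%
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>%
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors())

# Named list: one entry per imbalance strategy, each appending one step
balance_recipes <- list(
  unba = base_rec,
  b_sm = base_rec %>% step_smote(class),
  b_ro = base_rec %>% step_rose(class),
  b_up = base_rec %>% step_upsample(class),
  b_ad = base_rec %>% step_adasyn(class)
)
```

The list names become the prefixes of the workflow IDs, so `balance_recipes` can be passed directly as `workflow_set(preproc = balance_recipes, models = list(lasso_mod1))`.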
I don't understand which function in the workflowsets package I should use to select the best two (or more) models.

Posted on 2021-10-15 18:50:08
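There are some convenience functions in workflowsets for ranking the results and extracting the best ones, but if you have a more specific use case (such as the best two, or filtering on more complex criteria), you can keep working with the results in `grid_results` using tidyr and dplyr verbs. Use `unnest()` and/or the output of `rank_results()` to get what you're interested in.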
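A minimal sketch of that approach, assuming hypothetical toy data (`toy_train`), only three of the five recipes, a 5-point grid, and sensitivity as the ranking metric; it needs the glmnet and themis packages installed. `rank_results()` returns one row per candidate and metric, with a `rank` column computed from `rank_metric`, so plain dplyr verbs can pick the best two per recipe or the best two overall:

```r
library(tidymodels)    # dplyr, tibble, recipes, parsnip, rsample, tune, yardstick
library(workflowsets)  # workflow_set(), workflow_map(), rank_results()
library(themis)        # step_upsample(), step_smote()

# Hypothetical imbalanced toy data standing in for data_train;
# "yes" (the minority class) is the first level, so it is the event
set.seed(1503)
n_obs <- 400
toy_train <- tibble(x1 = rnorm(n_obs), x2 = rnorm(n_obs)) %>%
  mutate(class = factor(if_else(x1 + rnorm(n_obs) > 1.5, "yes", "no"),
                        levels = c("yes", "no")))

base_rec <- recipe(class ~ ., data = toy_train) %>%
  step_normalize(all_numeric_predictors())

lasso_mod <- logistic_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

toy_results <- workflow_set(
  preproc = list(unba = base_rec,
                 b_up = base_rec %>% step_upsample(class),
                 b_sm = base_rec %>% step_smote(class)),
  models = list(lasso = lasso_mod)
) %>%
  workflow_map("tune_grid", seed = 1503,
               resamples = vfold_cv(toy_train, v = 3, strata = class),
               grid = 5,
               metrics = metric_set(sensitivity, specificity))

# Every candidate (workflow x penalty value), ranked by mean sensitivity
ranked <- rank_results(toy_results, rank_metric = "sensitivity")

# Best two candidates within each balancing recipe
best_two_per_recipe <- ranked %>%
  filter(.metric == "sensitivity") %>%
  group_by(wflow_id) %>%
  slice_min(rank, n = 2, with_ties = FALSE) %>%
  ungroup()

# Best two workflows overall: each workflow's best candidate, then the top 2
best_two_overall <- rank_results(toy_results, rank_metric = "sensitivity",
                                 select_best = TRUE) %>%
  filter(.metric == "sensitivity", rank <= 2)

# Tuning parameters of the top-ranked workflow
best_id <- best_two_overall$wflow_id[best_two_overall$rank == 1]
best_params <- toy_results %>%
  extract_workflow_set_result(best_id) %>%
  select_best(metric = "sensitivity")
```

With the question's objects, `toy_results` becomes `grid_results` and `"sensitivity"` can be any metric in `metrics_lasso`. Running the same filters once per metric (for example with `purrr::map()` over the metric names) gives the best models for several metrics.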
https://stackoverflow.com/questions/69408784