How to select multiple models with workflow_set (tidymodels) based on different metrics

Stack Overflow user
Asked on 2021-10-01 15:56:44
Answers: 1 · Views: 61 · Followers: 0 · Votes: 1

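The models below run correctly, and I need to select the best two (for one or more metrics). The models differ only in the recipe object, which applies a different step for handling imbalanced data (none, smote, rose, upsample, step_adasyn). I am interested in selecting more than one model, ideally the best two, and also in selecting by imbalance-handling method.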

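library(tidymodels)  # recipes, rsample, parsnip, tune, workflowsets, yardstick
library(themis)      # step_smote(), step_rose(), step_upsample(), step_adasyn()

# NOTE: the start of this call is cut off in the source. The assignment to
# `metrics_lasso` is inferred from its use in workflow_map() below; any
# metrics listed before sensitivity are unknown.
metrics_lasso <- metric_set(
                      yardstick::sensitivity, yardstick::specificity, 
                      yardstick::precision, yardstick::recall )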
folds <- vfold_cv(data_train, v = 3, strata = class)

rec_obj_all <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) 

rec_obj_all_s <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_smote(class)

rec_obj_all_r <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors())  %>%
  step_rose(class)

rec_obj_all_up <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_upsample(class)

rec_obj_all_ad <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_adasyn(class)

lasso_mod1 <- logistic_reg(penalty = tune(),
                          mixture = 1) %>%
  set_engine("glmnet")

tictoc::tic()

all_cores <- parallel::detectCores(logical = FALSE)
library(doFuture)
registerDoFuture()
cl <- parallel::makeCluster(max(1, all_cores - 4))  # never request fewer than one worker
plan(cluster, workers = cl)

balances <- 
  workflow_set(
    preproc = list(unba = rec_obj_all, b_sm = rec_obj_all_s, b_ro = rec_obj_all_r,
                   b_up = rec_obj_all_up, b_ad = rec_obj_all_ad), 
    models = list(lasso_mod1),
    cross = TRUE
  )

grid_ctrl <-
  control_grid(
    save_pred = TRUE,
    parallel_over = "everything",
    save_workflow = FALSE
  )

grid_results <-
  balances %>%
  workflow_map(
    seed = 1503,
    resamples = folds,
    grid = 25,
    metrics = metrics_lasso,
    control = grid_ctrl,
    verbose = TRUE)
    

parallel::stopCluster( cl )

tictoc::toc()

I don't understand which function in the workflowsets package selects the best two or more models.

1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-10-15 18:50:08

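There are some convenience functions in workflowsets for ranking results and extracting the best result, but if you have a more specific use case (the best two, or filtering on something more complex), then go ahead and work with the results in grid_results using tidyr + dplyr verbs. You can use unnest() and/or the output of rank_results() to get at what you are interested in.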
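As a rough sketch of the dplyr route, the snippet below picks the two best workflows per metric from a table shaped like the output of `rank_results(grid_results, select_best = TRUE)`. To stay self-contained it uses a small stand-in tibble. The numbers in it are invented for illustration, and the `wflow_id` values assume workflowsets names the unnamed model `logistic_reg`. In a real session the table would come from `rank_results()` instead.

```r
library(dplyr)

# In the real session this table would come from something like:
#   ranked <- workflowsets::rank_results(grid_results, select_best = TRUE)
# The tibble below only mimics the columns used here.
# ASSUMPTION: all values and workflow ids are made up for illustration.
ranked <- tibble::tribble(
  ~wflow_id,           ~.metric,      ~mean,
  "unba_logistic_reg", "sensitivity", 0.55,
  "b_sm_logistic_reg", "sensitivity", 0.81,
  "b_ro_logistic_reg", "sensitivity", 0.78,
  "b_up_logistic_reg", "sensitivity", 0.80,
  "b_ad_logistic_reg", "sensitivity", 0.79,
  "unba_logistic_reg", "specificity", 0.97,
  "b_sm_logistic_reg", "specificity", 0.85,
  "b_ro_logistic_reg", "specificity", 0.83,
  "b_up_logistic_reg", "specificity", 0.86,
  "b_ad_logistic_reg", "specificity", 0.84
)

# Best two workflows for each metric. Both metrics here are
# "higher is better", so slice_max() on the mean is appropriate.
top2_per_metric <- ranked %>%
  group_by(.metric) %>%
  slice_max(mean, n = 2, with_ties = FALSE) %>%
  ungroup()

# Best two workflows for one metric of interest.
top2_sens <- ranked %>%
  filter(.metric == "sensitivity") %>%
  slice_max(mean, n = 2, with_ties = FALSE)

print(top2_per_metric)
```

The `rank` column that `rank_results()` returns is computed from a single ranking metric, which is why this sketch re-ranks within each `.metric` group rather than filtering on `rank`. Once the ids are chosen, something like `extract_workflow_set_result(grid_results, id = "b_sm_logistic_reg")` should return the tuning results for that workflow, assuming a workflowsets version that provides that function.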

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/69408784
