How to select multiple models with workflow_set (tidymodels) based on different metrics

Stack Overflow user

Asked on 2021-10-01 15:56:44

1 answer, 61 views, 0 followers, 1 vote

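The following models run correctly, and I need to select the best two (for one or more metrics). The models differ only in their recipe object, which applies a different step for handling the imbalanced data: none, SMOTE, ROSE, upsampling, or ADASYN (step_smote, step_rose, step_upsample, step_adasyn). I want to select several models: the best two overall, and also the best per imbalance-handling method.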

```r
library(tidymodels)  # recipes, parsnip, rsample, tune, workflowsets, yardstick
library(themis)      # step_smote(), step_rose(), step_upsample(), step_adasyn()

# The beginning of this call was cut off in the source; any metrics listed
# before sensitivity are not recoverable. recall() is the same metric as sensitivity().
metrics_lasso <- metric_set(yardstick::sensitivity, yardstick::specificity, 
                            yardstick::precision, yardstick::recall)
folds <- vfold_cv(data_train, v = 3, strata = class)

# Baseline recipe: no class-imbalance handling
rec_obj_all <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) 

# Same preprocessing + SMOTE oversampling of the minority class
rec_obj_all_s <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_smote(class)

# Same preprocessing + ROSE
rec_obj_all_r <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_rose(class)

# Same preprocessing + random upsampling
rec_obj_all_up <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_upsample(class)

# Same preprocessing + ADASYN
rec_obj_all_ad <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_adasyn(class)

# Lasso logistic regression with a tuned penalty
lasso_mod1 <- logistic_reg(penalty = tune(),
                           mixture = 1) %>%
  set_engine("glmnet")

tictoc::tic()

all_cores <- parallel::detectCores(logical = FALSE)
library(doFuture)
registerDoFuture()
# Leave 4 physical cores free, but always start at least one worker
cl <- parallel::makeCluster(max(1, all_cores - 4))
plan(cluster, workers = cl)

# One workflow per recipe, all crossed with the same lasso model
balances <- 
  workflow_set(
    preproc = list(unba = rec_obj_all, b_sm = rec_obj_all_s, b_ro = rec_obj_all_r,
                   b_up = rec_obj_all_up, b_ad = rec_obj_all_ad), 
    models = list(lasso_mod1),
    cross = TRUE
  )

grid_ctrl <-
  control_grid(
    save_pred = TRUE,
    parallel_over = "everything",
    save_workflow = FALSE
  )

# Tune every workflow over the same folds and metrics
grid_results <-
  balances %>%
  workflow_map(
    seed = 1503,
    resamples = folds,
    grid = 25,
    metrics = metrics_lasso,
    control = grid_ctrl,
    verbose = TRUE)

parallel::stopCluster(cl)

tictoc::toc()
```

I don't understand which function in the workflowsets package I should use to select the best two (or more) models.

Tags: classification, workflow, glmnet, tidymodels

1 answer

Stack Overflow user

Accepted answer

Answered on 2021-10-15 18:50:08

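There are some convenience functions in workflowsets for ranking results and extracting the best ones, but if you have a more specific use case (the best two, or filtering based on more complex criteria), you can go ahead and use tidyr + dplyr verbs to work with the results in grid_results. You can use unnest() and/or the output of rank_results() to get what you are interested in.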
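A minimal sketch of that approach, assuming a recent tidymodels plus themis. To run on its own, it builds a small hypothetical version of the question's setup on simulated data (three recipes instead of five, a 5-point grid, roc_auc added to the metric set); the object names mirror the question, but the data, the model name "lasso", and the resulting ids are made up for illustration. It then shows (1) rank_results() with select_best = TRUE for the best two workflows on one metric, (2) plain dplyr on collect_metrics() for the best two per metric, and (3) filtering by imbalance method through the wflow_id prefix.

```r
library(tidymodels)  # workflowsets, tune, recipes, parsnip, rsample, yardstick, dplyr
library(themis)      # step_smote(), step_upsample()

# Hypothetical, simulated stand-in for the question's data_train (imbalanced two classes)
set.seed(2021)
n_obs <- 500
data_train <- tibble(x1 = rnorm(n_obs), x2 = rnorm(n_obs), x3 = rnorm(n_obs)) %>%
  mutate(class = factor(if_else(x1 + x2 + rnorm(n_obs) > 2, "yes", "no"),
                        levels = c("yes", "no")))

folds <- vfold_cv(data_train, v = 3, strata = class)
metrics_lasso <- metric_set(roc_auc, sensitivity, specificity)

base_rec <- recipe(class ~ ., data = data_train) %>%
  step_normalize(all_numeric_predictors())

lasso_mod1 <- logistic_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

# Naming the model gives predictable ids: "unba_lasso", "b_sm_lasso", "b_up_lasso"
grid_results <-
  workflow_set(
    preproc = list(unba = base_rec,
                   b_sm = base_rec %>% step_smote(class),
                   b_up = base_rec %>% step_upsample(class)),
    models = list(lasso = lasso_mod1)
  ) %>%
  workflow_map(seed = 1503, resamples = folds, grid = 5,
               metrics = metrics_lasso)

# (1) Convenience function: with select_best = TRUE each workflow keeps only its
#     best candidate (judged by rank_metric); then keep the two top-ranked workflows.
ranked <- rank_results(grid_results, rank_metric = "roc_auc", select_best = TRUE)
top_two <- ranked %>%
  filter(.metric == "roc_auc") %>%
  slice_min(rank, n = 2, with_ties = FALSE)

# (2) Plain dplyr: best candidate within each workflow, then the two best workflows
#     separately for every metric (assumes larger is better for all metrics used).
top_two_per_metric <- collect_metrics(grid_results) %>%
  group_by(.metric, wflow_id) %>%
  slice_max(mean, n = 1, with_ties = FALSE) %>%
  group_by(.metric) %>%
  slice_max(mean, n = 2, with_ties = FALSE) %>%
  ungroup()

# (3) Select by imbalance method: the recipe name is the prefix of wflow_id
smote_only <- ranked %>% filter(startsWith(wflow_id, "b_sm"))

# Tuned penalty of the top workflow, e.g. to pass on to finalize_workflow()
best_params <- grid_results %>%
  extract_workflow_set_result(top_two$wflow_id[1]) %>%
  select_best(metric = "roc_auc")
```

With the question's own grid_results, steps (1) to (3) should carry over, provided "roc_auc" is replaced by a metric actually present in metrics_lasso (for example "sensitivity") and the ids are read from grid_results$wflow_id, since the question's unnamed models list produces automatically generated ids. Step (2) is the more flexible route because it ranks each metric independently, whereas rank_results() orders everything by a single rank_metric.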

Votes: 0

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/69408784
