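Recently I have been learning how to use the mlr3 package with parallelization. As described in the mlr3 book (https://mlr3book.mlr-org.com/technical.html) and the tutorial (https://www.youtube.com/watch?v=T43hO2o_nZw&t=1s), mlr3 uses the future backend for parallelization. I ran a simple test with the following code: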
# load the packages
library(future)
library(future.apply)
library(mlr3)
library(mlr3learners)  # provides the classif.ranger learner
# set the task
task_train <- TaskClassif$new(id = "survey_train", backend = train, target = "r_yn", positive = "yes")
# set the learner
# predict_type = "prob" is needed so that classif.auc can be computed
learner_ranger <- lrn("classif.ranger", predict_type = "prob")
# set the cv
cv_5 <- rsmp("cv", folds = 5)
# run the resampling in parallelization
plan(multisession, workers = 5)
task_train_cv_5_par <- resample(task = task_train, learner = learner_ranger, resampling = cv_5)
plan(sequential)
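task_train_cv_5_par$aggregate(msr("classif.auc"))
The AUC changes every time, and I know this is because I did not set a random seed for the parallelization. However, in the many tutorials I have found on the future package, the way to get reproducible results is to use future_lapply from the future.apply package and set future.seed = TRUE. Another way is to register the future backend for a foreach loop and use %dorng% or registerDoRNG().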
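For reference, the future_lapply approach mentioned above can be sketched as follows. This is a minimal illustration, not part of the original question: the helper `draw()` and the choice of two workers are assumptions made for the example. It seeds the main session with set.seed() and lets future.seed = TRUE derive a separate random stream for each element.

```r
library(future)
library(future.apply)

# Hypothetical toy task: each element draws one random number on a worker
draw <- function(i) rnorm(1)

plan(multisession, workers = 2)

# First run: seed the main session, then let future.seed = TRUE
# derive one parallel-safe RNG stream per element
set.seed(1)
run_1 <- future_lapply(1:4, draw, future.seed = TRUE)

# Second run with the same starting seed
set.seed(1)
run_2 <- future_lapply(1:4, draw, future.seed = TRUE)

plan(sequential)

# Compare the two runs
print(identical(run_1, run_2))
```

Because the streams are assigned per element rather than per worker, the comparison should not depend on how many workers are used.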
My question is: how can I get reproducible resampling results in mlr3 without using future_lapply or foreach? I guess there may be a simple way to do it. Thanks a lot!
Posted on 2021-02-17 09:18:43
I have changed your example into a reproducible one to show that you only need to set a seed with set.seed():
library(future)  # needed for plan()
library(mlr3)
library(mlr3learners)
task_train <- tsk("sonar")
learner_ranger <- lrn("classif.ranger", predict_type = "prob")
cv_5 <- rsmp("cv", folds = 5)
plan(multisession, workers = 5)
# 1st resampling
set.seed(1)
task_train_cv_5_par <- resample(task = task_train, learner = learner_ranger, resampling = cv_5)
task_train_cv_5_par$aggregate(msr("classif.auc"))
# 2nd resampling
set.seed(1)
task_train_cv_5_par <- resample(task = task_train, learner = learner_ranger, resampling = cv_5)
task_train_cv_5_par$aggregate(msr("classif.auc"))
# 3rd resampling, now sequential
plan(sequential)
set.seed(1)
task_train_cv_5_par <- resample(task = task_train, learner = learner_ranger, resampling = cv_5)
task_train_cv_5_par$aggregate(msr("classif.auc"))
You should get the same score for all three resamplings.
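The comparison can also be done programmatically instead of reading the scores by eye. The following is a sketch under a few assumptions: the wrapper `cv_auc()` is a hypothetical helper introduced only for this example, two workers are used instead of five, and the mlr3, mlr3learners and ranger packages are installed.

```r
library(future)
library(mlr3)
library(mlr3learners)

# Hypothetical helper: seed the session, run 5-fold CV, return the aggregated AUC
cv_auc <- function(seed) {
  set.seed(seed)
  rr <- resample(
    task = tsk("sonar"),
    learner = lrn("classif.ranger", predict_type = "prob"),
    resampling = rsmp("cv", folds = 5)
  )
  rr$aggregate(msr("classif.auc"))
}

# Two parallel runs with the same seed
plan(multisession, workers = 2)
auc_par_1 <- cv_auc(1)
auc_par_2 <- cv_auc(1)

# One sequential run with the same seed
plan(sequential)
auc_seq <- cv_auc(1)

# Compare the scores
print(all.equal(auc_par_1, auc_par_2))
print(all.equal(auc_par_1, auc_seq))
```

Wrapping the seed and the resampling call in one function keeps the three runs identical apart from the active plan.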
Posted on 2021-02-17 06:56:54
You need to set the seed with an RNG kind that supports parallelization.
set.seed(42, "L'Ecuyer-CMRG")
See ?RNGkind for details.
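AFAIK there is no way to get deterministic parallel results in R other than using this RNG kind. When running sequentially, you can simply use the default RNG kind with set.seed(42).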
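To illustrate what this RNG kind provides, here is a small sketch using base R and the parallel package only. The helper `draw_from()` is hypothetical and exists just for this example; it is not how mlr3 or future manage seeds internally. The point is that "L'Ecuyer-CMRG" allows independent streams to be derived from one seed, which is what makes per-worker random numbers repeatable.

```r
library(parallel)

old_kind <- RNGkind()[1]  # remember the current generator

# Switch to the parallel-capable generator and seed it
set.seed(42, kind = "L'Ecuyer-CMRG")
stream_1 <- .Random.seed             # stream for a first worker
stream_2 <- nextRNGStream(stream_1)  # independent stream for a second worker

# Hypothetical helper: install a stream as the active seed, then draw from it
draw_from <- function(stream, n = 3) {
  assign(".Random.seed", stream, envir = globalenv())
  runif(n)
}

a1 <- draw_from(stream_1)
b1 <- draw_from(stream_2)
a2 <- draw_from(stream_1)  # replaying stream 1 repeats its draws
b2 <- draw_from(stream_2)  # replaying stream 2 repeats its draws

RNGkind(old_kind)  # restore the previous generator

print(identical(a1, a2))
print(identical(b1, b2))
```

Each stream yields the same numbers whenever it is replayed, regardless of the order in which the streams are used.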
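"My question is: how can I get reproducible resampling results in mlr3 without using future_lapply or foreach?"
{mlr3} uses {future} for all kinds of internal parallelization, so there is no way around {future}. So yes, set future.seed = TRUE and you should be fine.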
https://stackoverflow.com/questions/66235355