
Caret does not run in parallel

Stack Overflow user
Asked 2015-09-11 10:02:53
1 answer · 1.4K views · 0 followers · 3 votes

Whether caret actually runs in parallel depends on the versions of R, caret, and the doMC package, as described in "Parallelizing Caret code".

Is anyone working in an environment similar to mine? What is the highest R version on which caret parallelization works correctly?

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=C                  LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-52    ggplot2_1.0.1   lattice_0.20-31 doMC_1.3.3      iterators_1.0.7 foreach_1.4.2   RStudioAMI_0.2 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1         magrittr_1.5        splines_3.2.1       MASS_7.3-41         munsell_0.4.2       colorspace_1.2-6   
 [7] minqa_1.2.4         car_2.1-0           stringr_1.0.0       plyr_1.8.3          tools_3.2.1         pbkrtest_0.4-2     
[13] nnet_7.3-9          grid_3.2.1          gtable_0.1.2        nlme_3.1-120        mgcv_1.8-6          quantreg_5.19      
[19] MatrixModels_0.4-1  gtools_3.5.0        lme4_1.1-9          digest_0.6.8        Matrix_1.2-0        nloptr_1.0.4       
[25] reshape2_1.4.1      codetools_0.2-11    stringi_0.5-5       BradleyTerry2_1.0-6 scales_0.3.0        stats4_3.2.1       
[31] SparseM_1.7         brglm_0.5-9         proto_0.3-10

Update 1: my code is as follows:

library(doMC)
registerDoMC(cores = 4)
library(caret)

classification_formula <- as.formula(paste("target", "~",
                                           paste(names(m_input_data)[!names(m_input_data) == "target"], collapse = "+")))

CVfolds <- 2
CVreps  <- 5
ma_control <- trainControl(method = "repeatedcv",
                           number = CVfolds,
                           repeats = CVreps,
                           returnResamp = "final",
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary,
                           allowParallel = TRUE, verboseIter = TRUE)
rf_tuneGrid <- expand.grid(mtry = seq(2, 32, length.out = 6))
rf <- train(classification_formula, data = m_input_data, method = "rf",
            metric = "ROC", trControl = ma_control, tuneGrid = rf_tuneGrid, ntree = 101)

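Update 2: When I run these scripts from the command line, only one core is working. When I run them from RStudio, parallelization is working, since I see 4 processes in top. But about a second later, this error occurs: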

  Error in names(resamples) <- gsub("^\\.", "", names(resamples)) : 
   attempt to set an attribute on NULL 

Update 4:

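Hi, it seems the problem was a terminated R session. I used to run my R code every time I started the AWS instance; now I refresh the R engine first. Each time I reload RStudio in the browser, I run Session -> Restart R. It appears to be working. I am now checking whether the same holds when running the scripts from the Ubuntu command line.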
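A minimal sketch for checking, in a fresh session (for example a script started with Rscript), whether a doMC backend is registered and whether foreach tasks really run in forked worker processes. The core count of 2 and the variable names are illustrative assumptions, not part of the original scripts.

```r
library(doMC)  # also attaches foreach

# Register a forking backend; 2 cores is an arbitrary choice for this check.
registerDoMC(cores = 2)

backend_registered <- getDoParRegistered()  # TRUE once a backend is registered
backend_name       <- getDoParName()        # name reported by the backend
worker_count       <- getDoParWorkers()     # number of workers foreach will use

# Each task reports the process id it ran in. With a forking backend these
# should be child processes, not the main R session.
worker_pids <- foreach(i = 1:4, .combine = c) %dopar% Sys.getpid()

cat("registered:", backend_registered, "\n")
cat("backend:   ", backend_name, "\n")
cat("workers:   ", worker_count, "\n")
cat("ran outside the main process:", !(Sys.getpid() %in% worker_pids), "\n")
```

If the worker count is 1 or the tasks report the main session's pid, the code is running sequentially, regardless of allowParallel = TRUE in trainControl.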

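In general, it runs but never finishes. Caret parallelizes at the data level, meaning it can process each resample in a separate process. But if each resample is still large (100,000 / 2 rows, since the number of folds is 2, by 2,000 features), that may be hard for each processor to finish. Am I right?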
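The sizes mentioned above can be turned into a rough back-of-the-envelope estimate. This sketch assumes every feature is stored as a dense 8-byte numeric and that each resample and tuning value is fitted separately; it ignores the extra memory randomForest itself needs, and forked workers share memory copy-on-write, so the per-worker total is an upper bound on the data copies alone.

```r
# Figures taken from the question; dense numeric storage is an assumption.
n_rows     <- 100000
n_features <- 2000
cv_folds   <- 2
cv_repeats <- 5
n_mtry     <- 6   # length of the mtry grid in the question
n_workers  <- 4   # registerDoMC(cores = 4)

# With k-fold CV each model is trained on (k - 1) / k of the rows.
rows_per_fit <- n_rows * (cv_folds - 1) / cv_folds   # 50,000 rows

# Size of one training matrix at 8 bytes per value.
gib_per_fit     <- rows_per_fit * n_features * 8 / 1024^3
gib_all_workers <- gib_per_fit * n_workers

# Number of model fits during resampling (the final refit is extra).
n_resample_fits <- cv_folds * cv_repeats * n_mtry    # 60 fits

cat(sprintf("training rows per fit:      %d\n", as.integer(rows_per_fit)))
cat(sprintf("training data per fit:      %.2f GiB\n", gib_per_fit))
cat(sprintf("data across %d workers:      %.2f GiB\n", n_workers, gib_all_workers))
cat(sprintf("resampling fits in total:   %d\n", as.integer(n_resample_fits)))
```

Under these assumptions, each worker still has to grow a forest on 50,000 rows by 2,000 features, so spreading resamples over cores shortens the total wall time but does not make any single fit cheaper.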

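I think the parallelism should be at the algorithm level, meaning each algorithm could run on multiple cores. Are such algorithm implementations available in caret?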
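As an illustration of what algorithm-level parallelism could look like for a random forest, here is a hedged sketch that grows sub-forests in separate forked processes and merges them with randomForest::combine. This is not how caret's train() works internally; the iris data, the worker count, and the trees-per-worker value are placeholder assumptions, and mclapply forks only on Unix-like systems.

```r
library(parallel)
library(randomForest)

n_workers        <- 2    # illustrative; not taken from the question
trees_per_worker <- 50   # each worker grows this many trees

# Trees in a random forest are grown independently of each other, so each
# worker can grow its own sub-forest on the full training data.
sub_forests <- mclapply(
  seq_len(n_workers),
  function(i) randomForest(Species ~ ., data = iris, ntree = trees_per_worker),
  mc.cores = n_workers
)

# Merge the sub-forests into a single randomForest object.
forest <- do.call(randomForest::combine, sub_forests)

cat("trees in combined forest:", forest$ntree, "\n")
predictions <- predict(forest, newdata = iris)
```

The combined object predicts like an ordinary forest, but some out-of-bag summaries (for example the confusion matrix and error rates) are not carried over by combine, so this approach trades convenience for speed on a single large fit.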


1 Answer

Stack Overflow user

Answered 2015-09-14 16:25:43

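I have the latest version for the Linux platform, R version 3.2.2 (2015-08-14, "Fire Safety"), and parallelization works fine. Can you provide the code that fails to run in parallel?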

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] kernlab_0.9-22  doMC_1.3.3      iterators_1.0.7 foreach_1.4.2   caret_6.0-52    ggplot2_1.0.1   lattice_0.20-33

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0         compiler_3.2.2      nloptr_1.0.4        plyr_1.8.3          tools_3.2.2         digest_0.6.8       
 [7] lme4_1.1-9          nlme_3.1-122        gtable_0.1.2        mgcv_1.8-7          Matrix_1.2-2        brglm_0.5-9        
[13] SparseM_1.7         proto_0.3-10        BradleyTerry2_1.0-6 stringr_1.0.0       gtools_3.5.0        MatrixModels_0.4-1 
[19] stats4_3.2.2        grid_3.2.2          nnet_7.3-10         minqa_1.2.4         reshape2_1.4.1      car_2.0-26         
[25] magrittr_1.5        scales_0.3.0        codetools_0.2-11    MASS_7.3-43         splines_3.2.2       pbkrtest_0.4-2     
[31] colorspace_1.2-6    quantreg_5.18       stringi_0.5-5       munsell_0.4.2      

I ran your code on my local machine with the BreastCancer dataset, and it runs in parallel without any problems. I am using RStudio version 0.98.1103.

library(caret)
library(mlbench)
data(BreastCancer)

library(doMC)  
registerDoMC(cores=2)

classification_formula <- as.formula(paste("Class" ,"~",
                                         paste(names(BreastCancer)[!names(BreastCancer)=='Class'],collapse="+")))

CVfolds <- 2
CVreps  <- 5
ma_control <- trainControl(method = "repeatedcv",
                           number = CVfolds,
                           repeats = CVreps ,
                           returnResamp = "final" ,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary,
                           allowParallel = TRUE,verboseIter = TRUE)

rf_tuneGrid = expand.grid(mtry = seq(2,32, length.out = 6))

#Notice, it might be easier just to use Class~. 
#instead of classification_formula
rf <- train(classification_formula , 
            data = BreastCancer , 
            method = "rf", 
            metric="ROC" ,
            trControl = ma_control, 
            tuneGrid = rf_tuneGrid , 
            ntree = 101)

> rf
Random Forest 

699 samples
 10 predictors
  2 classes: 'benign', 'malignant' 

No pre-processing
Resampling: Cross-Validated (2 fold, repeated 5 times) 
Summary of sample sizes: 341, 342, 342, 341, 342, 341, ... 
Resampling results across tuning parameters:

 mtry  ROC        Sens       Spec       ROC SD       Sens SD      Spec SD    
   2    0.9867820  1.0000000  0.0000000  0.005007691  0.000000000  0.000000000
   8    0.9899107  0.9549550  0.9640196  0.002243649  0.006714919  0.017247716
  14    0.9907072  0.9558559  0.9631933  0.003028258  0.012345228  0.008019979
  20    0.9909514  0.9635135  0.9556513  0.003268291  0.006864342  0.010471005
  26    0.9911480  0.9630631  0.9539706  0.003384987  0.005113930  0.010628533
  32    0.9911485  0.9657658  0.9522969  0.002973508  0.004842197  0.004090206

ROC was used to select the optimal model using  the largest value.
The final value used for the model was mtry = 32. 
> 
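The comment in the code above suggests that Class ~ . may be simpler than building the formula by pasting column names. A small sketch with a made-up data frame (toy_data and its columns are hypothetical) shows that both forms expand to the same predictor terms:

```r
# Hypothetical toy data; only the column layout matters here.
toy_data <- data.frame(
  a     = c(1.0, 2.0, 3.0, 4.0),
  b     = c(0.5, 0.1, 0.9, 0.3),
  Class = factor(c("benign", "malignant", "benign", "malignant"))
)

# Formula built by pasting together every column except the outcome.
pasted_formula <- as.formula(
  paste("Class", "~", paste(setdiff(names(toy_data), "Class"), collapse = "+"))
)

# Shorthand: the dot stands for all columns other than the outcome.
dot_formula <- Class ~ .

pasted_terms <- attr(terms(pasted_formula, data = toy_data), "term.labels")
dot_terms    <- attr(terms(dot_formula, data = toy_data), "term.labels")

cat("same predictors:", identical(pasted_terms, dot_terms), "\n")
```

The pasted form can still be useful when some columns need to be excluded programmatically.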
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/32514370
