首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >sbf()是否使用度量参数来优化模型?

sbf()是否使用度量参数来优化模型?
EN

Stack Overflow用户
提问于 2016-10-07 16:51:50
回答 1查看 576关注 0票数 1

ROC作为metric参数值传递给caretSBF函数

我们的目标是使用ROC摘要度量来进行模型选择,同时通过过滤sbf()函数来进行特征选择。

BreastCancer数据集用作mlbench包中的可复制示例,以便使用metric = "Accuracy"metric = "ROC"运行train()sbf()

我们希望确保sbf()接受train()rfe()函数应用的metric参数来优化模型。为此,我们计划使用train()函数和sbf()函数。caretSBF$fit函数调用train()caretSBF被传递给sbfControl

从输出来看,metric参数似乎只用于inner resampling而不是sbf部分,即对于输出的outer resamplingmetric参数不是train()rfe()使用的。

由于我们使用了使用caretSBF (使用train() )的方法,似乎metric参数的作用范围仅限于train(),因此没有传递给sbf

我们希望澄清sbf()是否使用metric参数来优化模型,即outer resampling

下面是我们在可重复示例上的工作,展示了train()使用AccuracyROC使用metric参数,但是对于sbf,我们不确定。

I.数据部分

代码语言:javascript
复制
  ## Loading required packages   
  library(mlbench)
  library(caret)

  ## Loading `BreastCancer` Dataset from *mlbench* package   
  data("BreastCancer")

  ## Data cleaning for missing values
  # Remove rows/observation with NA Values in any of the columns
  BrC1 <- BreastCancer[complete.cases(BreastCancer),] 

  # Removing Class and Id Column and keeping just Numeric Predictors
  Num_Pred <- BrC1[,2:10]  

II.自定义摘要函数

定义fiveStats摘要函数

代码语言:javascript
复制
  fiveStats <- function(...) c(twoClassSummary(...),
                         defaultSummary(...))

III.列车区段

定义trControl

代码语言:javascript
复制
  trCtrl <- trainControl(method="repeatedcv", number=10,
  repeats=1, classProbs = TRUE, summaryFunction = fiveStats)

列车+公制=“精度”

代码语言:javascript
复制
   set.seed(1)
   TR_acc <- train(Num_Pred,BrC1$Class, method="rf",metric="Accuracy",
   trControl = trCtrl,tuneGrid=expand.grid(.mtry=c(2,3,4,5)))

   TR_acc
   # Random Forest 
   # 
   # 683 samples
   #   9 predictor
   #   2 classes: 'benign', 'malignant' 
   # 
   # No pre-processing
   # Resampling: Cross-Validated (10 fold, repeated 1 times) 
   # Summary of sample sizes: 615, 615, 614, 614, 614, 615, ... 
   # Resampling results across tuning parameters:
   # 
   #   mtry  ROC        Sens       Spec       Accuracy   Kappa    
   #   2     0.9936532  0.9729798  0.9833333  0.9765772  0.9490311
   #   3     0.9936544  0.9729293  0.9791667  0.9750853  0.9457534
   #   4     0.9929957  0.9684343  0.9750000  0.9706948  0.9361373
   #   5     0.9922907  0.9684343  0.9666667  0.9677536  0.9295782
   # 
   # Accuracy was used to select the optimal model using  the largest value.
   # The final value used for the model was mtry = 2. 

列车+米制= "ROC"

代码语言:javascript
复制
   set.seed(1)
   TR_roc <- train(Num_Pred,BrC1$Class, method="rf",metric="ROC",
   trControl = trCtrl,tuneGrid=expand.grid(.mtry=c(2,3,4,5)))
   TR_roc
   # Random Forest 
   # 
   # 683 samples
   #   9 predictor
   #   2 classes: 'benign', 'malignant' 
   # 
   # No pre-processing
   # Resampling: Cross-Validated (10 fold, repeated 1 times) 
   # Summary of sample sizes: 615, 615, 614, 614, 614, 615, ... 
   # Resampling results across tuning parameters:
   # 
   #   mtry  ROC        Sens       Spec       Accuracy   Kappa    
   #   2     0.9936532  0.9729798  0.9833333  0.9765772  0.9490311
   #   3     0.9936544  0.9729293  0.9791667  0.9750853  0.9457534
   #   4     0.9929957  0.9684343  0.9750000  0.9706948  0.9361373
   #   5     0.9922907  0.9684343  0.9666667  0.9677536  0.9295782
   # 
   # ROC was used to select the optimal model using  the largest value.
   # The final value used for the model was mtry = 3. 

IV.编辑caretSBF

编辑caretSBF摘要函数

代码语言:javascript
复制
   caretSBF$summary <- fiveStats

V. SBF部分

定义sbfControl

代码语言:javascript
复制
   sbfCtrl <- sbfControl(functions=caretSBF, 
   method="repeatedcv", number=10, repeats=1,
   verbose=T, saveDetails = T)

SBF +度量=“精度”

代码语言:javascript
复制
   set.seed(1)
   sbf_acc <- sbf(Num_Pred, BrC1$Class,
   sbfControl = sbfCtrl,
   trControl = trCtrl, method="rf", metric="Accuracy")

   ## sbf_acc  
   sbf_acc

   # Selection By Filter
   # 
   # Outer resampling method: Cross-Validated (10 fold, repeated 1 times) 
   # 
   # Resampling performance:
   # 
   #     ROC  Sens   Spec Accuracy Kappa    ROCSD SensSD  SpecSD AccuracySD  KappaSD
   #  0.9931 0.973 0.9833   0.9766 0.949 0.006272 0.0231 0.02913    0.01226 0.02646
   # 
   # Using the training set, 9 variables were selected:
   #    Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size...
   # 
   # During resampling, the top 5 selected variables (out of a possible 9):
   #    Bare.nuclei (100%), Bl.cromatin (100%), Cell.shape (100%), Cell.size (100%), Cl.thickness (100%)
   # 
   # On average, 9 variables were selected (min = 9, max = 9)

   ## Class of sbf_acc
   class(sbf_acc)
   # [1] "sbf"

   ## Names of elements of sbf_acc
   names(sbf_acc)
   #  [1] "pred"         "variables"    "results"      "fit"          "optVariables"
   #  [6] "call"         "control"      "resample"     "metrics"      "times"       
   # [11] "resampledCM"  "obsLevels"    "dots"        

   ## sbf_acc fit element*  
   sbf_acc$fit
   # Random Forest 
   # 
   # 683 samples
   #   9 predictor
   #   2 classes: 'benign', 'malignant' 
   # 
   # No pre-processing
   # Resampling: Cross-Validated (10 fold, repeated 1 times) 
   # Summary of sample sizes: 615, 614, 614, 615, 615, 615, ... 
   # Resampling results across tuning parameters:
   # 
   #   mtry  ROC        Sens       Spec       Accuracy   Kappa    
   #   2     0.9933176  0.9706566  0.9833333  0.9751492  0.9460717
   #   5     0.9920034  0.9662121  0.9791667  0.9707801  0.9363708
   #   9     0.9914825  0.9684343  0.9708333  0.9693308  0.9327662
   # 
   # Accuracy was used to select the optimal model using  the largest value.
   # The final value used for the model was mtry = 2. 

   ##  Elements of sbf_acc fit  
   names(sbf_acc$fit)
   #  [1] "method"       "modelInfo"    "modelType"    "results"      "pred"        
   #  [6] "bestTune"     "call"         "dots"         "metric"       "control"     
   # [11] "finalModel"   "preProcess"   "trainingData" "resample"     "resampledCM" 
   # [16] "perfNames"    "maximize"     "yLimits"      "times"        "levels"      

   ## sbf_acc fit final Model
   sbf_acc$fit$finalModel

   # Call:
   #  randomForest(x = x, y = y, mtry = param$mtry) 
   #                Type of random forest: classification
   #                      Number of trees: 500
   # No. of variables tried at each split: 2
   # 
   #         OOB estimate of  error rate: 2.34%
   # Confusion matrix:
   #           benign malignant class.error
   # benign       431        13  0.02927928
   # malignant      3       236  0.01255230

   ## sbf_acc metric
   sbf_acc$fit$metric
   # [1] "Accuracy"

   ## sbf_acc fit best Tune*  
   sbf_acc$fit$bestTune
   #   mtry
   # 1    2

SBF +度量= "ROC"

代码语言:javascript
复制
   set.seed(1)
   sbf_roc <- sbf(Num_Pred, BrC1$Class,
   sbfControl = sbfCtrl,
   trControl = trCtrl, method="rf", metric="ROC")


   ## sbf_roc  
   sbf_roc

   # Selection By Filter
   # 
   # Outer resampling method: Cross-Validated (10 fold, repeated 1 times) 
   # 
   # Resampling performance:
   # 
   #     ROC  Sens   Spec Accuracy Kappa    ROCSD SensSD  SpecSD AccuracySD KappaSD
   #  0.9931 0.973 0.9833   0.9766 0.949 0.006272 0.0231 0.02913    0.01226 0.02646
   # 
   # Using the training set, 9 variables were selected:
   #    Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size...
   # 
   # During resampling, the top 5 selected variables (out of a possible 9):
   #    Bare.nuclei (100%), Bl.cromatin (100%), Cell.shape (100%), Cell.size (100%), Cl.thickness (100%)
   # 
   # On average, 9 variables were selected (min = 9, max = 9)

   ## Class of sbf_roc
   class(sbf_roc)
   # [1] "sbf"

   ## Names of elements of sbf_roc
   names(sbf_roc)
   #  [1] "pred"         "variables"    "results"      "fit"          "optVariables"
   #  [6] "call"         "control"      "resample"     "metrics"      "times"       
   # [11] "resampledCM"  "obsLevels"    "dots"        

   ## sbf_roc fit element*  
   sbf_roc$fit
   # Random Forest 
   # 
   # 683 samples
   #   9 predictor
   #   2 classes: 'benign', 'malignant' 
   # 
   # No pre-processing
   # Resampling: Cross-Validated (10 fold, repeated 1 times) 
   # Summary of sample sizes: 615, 614, 614, 615, 615, 615, ... 
   # Resampling results across tuning parameters:
   # 
   #   mtry  ROC        Sens       Spec       Accuracy   Kappa    
   #   2     0.9933176  0.9706566  0.9833333  0.9751492  0.9460717
   #   5     0.9920034  0.9662121  0.9791667  0.9707801  0.9363708
   #   9     0.9914825  0.9684343  0.9708333  0.9693308  0.9327662
   # 
   # ROC was used to select the optimal model using  the largest value.
   # The final value used for the model was mtry = 2. 

   ##  Elements of sbf_roc fit  
   names(sbf_roc$fit)
   #  [1] "method"       "modelInfo"    "modelType"    "results"      "pred"        
   #  [6] "bestTune"     "call"         "dots"         "metric"       "control"     
   # [11] "finalModel"   "preProcess"   "trainingData" "resample"      "resampledCM" 
   # [16] "perfNames"    "maximize"     "yLimits"      "times"        "levels"      

   ## sbf_roc fit final Model
   sbf_roc$fit$finalModel

   # Call:
   #  randomForest(x = x, y = y, mtry = param$mtry) 
   #                Type of random forest: classification
   #                      Number of trees: 500
   # No. of variables tried at each split: 2
   # 
   #         OOB estimate of  error rate: 2.34%
   # Confusion matrix:
   #           benign malignant class.error
   # benign       431        13  0.02927928
   # malignant      3       236  0.01255230

   ## sbf_roc metric
   sbf_roc$fit$metric
   # [1] "ROC"

   ## sbf_roc fit best Tune  
   sbf_roc$fit$bestTune
   #   mtry
   # 1    2

sbf()是否使用metric参数来优化模型?如果是,metric sbf()使用什么作为缺省值?如果sbf()使用metric参数,那么如何将其设置为ROC

谢谢。

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2016-10-14 12:56:53

sbf不使用该度量来优化任何东西(与rfe不同);sbf所做的一切就是在调用模型之前执行一个特性选择步骤。当然,您定义了过滤器,但是没有办法使用sbf优化筛选器,因此不需要度量来指导这一步骤。

使用sbf(x, y, metric = "ROC")将把metric = "ROC"传递给您所使用的任何建模函数(并且它设计用于在使用caretSBF时使用train。这是因为没有metric参数的sbf

代码语言:javascript
复制
> names(formals(caret:::sbf.default))
[1] "x"          "y"          "sbfControl" "..." 
票数 3
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/39922359

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档