Missing values error in caret's train() function for trees

Stack Overflow user

Asked 2017-04-27 18:45:46

1 answer, 9K views, 0 followers, 6 votes

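I am new to R and I am trying to build a decision tree. So far I have used the party package for ctree and the rpart package for rpart.

However, because I need to cross-validate my model, I started using the caret package, since I can do that with the train() function and the method I want to use.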

library(caret)
cvCtrl <- trainControl(method = "repeatedcv", repeats = 2,
                   classProbs = TRUE)

ctree.installed<- train(TARGET ~ OPENING_BALANCE+ MONTHS_SINCE_EXPEDITION+
                    RS_DESC+SAP_STATUS+ ACTIVATION_STATUS+ ROTUL_STATUS+ 
                    SIM_STATUS+ RATE_PLAN_SEGMENT_NORM,
                    data=trainSet,
                    method = "ctree",
                    trControl = cvCtrl)

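However, my variables OPENING_BALANCE and MONTHS_SINCE_EXPEDITION have some missing values, so the function does not work. I don't understand why this happens, since I am trying to build a tree. The problem does not occur when I use the other packages.

This is the error: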

Error in na.fail.default(list(TARGET = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,  : 
missing values in object
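The message comes from na.fail, which is the default na.action of the formula interface to train() in caret. A minimal base-R sketch of the same failure, assuming a made-up toy data frame (toy, y and x are hypothetical names, not the asker's data):

```r
# Toy data (hypothetical): one predictor value is missing
toy <- data.frame(
  y = factor(c("a", "b", "a", "b")),
  x = c(1.5, NA, 3.0, 4.2)
)

# na.fail refuses any model frame that contains an NA,
# so building the frame stops before a model is ever fitted
msg <- tryCatch(
  {
    model.frame(y ~ x, data = toy, na.action = na.fail)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

The error is raised while the model frame is built, before the tree method is called, which is why it appears even for tree learners that can cope with missing predictors.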

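I don't want to use na.action=pass because I really don't want to discard those observations.

Am I doing something wrong? Why does this happen? Do you have any suggestions?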

decision-tree

missing-data

r-caret

rpart

1 Answer

Stack Overflow user

Answered 2017-04-27 20:33:34

I will use the PimaIndiansDiabetes2 dataset from the mlbench package, which has some missing values.

data(PimaIndiansDiabetes2, package = "mlbench")
head(PimaIndiansDiabetes2)

  pregnant glucose pressure triceps insulin mass pedigree age diabetes
1        6     148       72      35      NA 33.6    0.627  50      pos
2        1      85       66      29      NA 26.6    0.351  31      neg
3        8     183       64      NA      NA 23.3    0.672  32      pos
4        1      89       66      23      94 28.1    0.167  21      neg
5        0     137       40      35     168 43.1    2.288  33      pos
6        5     116       74      NA      NA 25.6    0.201  30      neg

In train, I set na.action to na.pass (which returns the dataset unchanged) and then set the maxsurrogate parameter of ctree through ctree_control:

library(caret)
library(party)  # provides ctree_control()

# 10-fold cross-validation, repeated twice
cvCtrl <- trainControl(method = "repeatedcv", repeats = 2, classProbs = TRUE)
set.seed(1234)
ctree1 <- train(diabetes ~ ., data = PimaIndiansDiabetes2,
                method = "ctree",
                na.action = na.pass,  # keep rows that contain NA
                trControl = cvCtrl,
                controls = ctree_control(maxsurrogate = 2))
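To make the effect of na.action concrete, here is a small base-R sketch comparing na.pass with na.omit. The data frame and its column names are invented for illustration:

```r
# Toy data (hypothetical): two of the five rows have a missing predictor
toy <- data.frame(
  y  = factor(c("a", "b", "a", "b", "a")),
  x1 = c(1.5, NA, 3.0, 4.2, 5.1),
  x2 = c(10, 20, NA, 40, 50)
)

# na.pass hands the data through untouched, NAs included
passed <- model.frame(y ~ ., data = toy, na.action = na.pass)

# na.omit drops every row that has at least one NA
omitted <- model.frame(y ~ ., data = toy, na.action = na.omit)

nrow(passed)             # all rows are kept
nrow(omitted)            # only complete rows remain
colSums(is.na(passed))   # the NAs are still present after na.pass
```

Because na.pass keeps the incomplete rows, the learner itself has to handle the NAs. In the call above that job falls to the surrogate splits requested with maxsurrogate.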

The result is:

print(ctree1)
Conditional Inference Tree 

392 samples
  8 predictor
  2 classes: 'neg', 'pos' 

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 2 times) 
Summary of sample sizes: 691, 692, 691, 691, 691, 691, ... 
Resampling results across tuning parameters:

  mincriterion  Accuracy   Kappa    
  0.01          0.7349111  0.4044195
  0.50          0.7485731  0.4412557
  0.99          0.7323906  0.3921662

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was mincriterion = 0.5.
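For comparison, the rpart package mentioned in the question also relies on surrogate splits for missing predictors. The sketch below is not part of the answer above: it swaps in rpart without caret and uses the built-in airquality data as a stand-in, since it ships with R and has NAs in Ozone and Solar.R.

```r
library(rpart)

# Fit a regression tree; up to two surrogate splits are kept per node
fit <- rpart(
  Ozone ~ Solar.R + Wind + Temp,
  data = airquality,
  control = rpart.control(maxsurrogate = 2)
)

# rpart drops rows with a missing response but keeps rows
# whose predictors are only partly missing
n_used <- length(fit$where)
n_response_present <- sum(!is.na(airquality$Ozone))

# Prediction also tolerates missing predictors
pred <- predict(fit, newdata = airquality)

n_used
n_response_present
anyNA(pred)
```

If n_used matches n_response_present, the rows with a missing Solar.R value were used for fitting rather than discarded, which is the behaviour the asker wanted to keep under cross-validation.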

7 votes

Original content provided by Stack Overflow.

Source: https://stackoverflow.com/questions/43655544
