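I'm new to R and I'm trying to build a decision tree. I have already used the party package for ctree and the rpart package for rpart.
However, because I need to cross-validate my model, I switched to the caret package, since it lets me do that through the `train()` function and the method I want to use.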
library(caret)
cvCtrl <- trainControl(method = "repeatedcv", repeats = 2,
classProbs = TRUE)
ctree.installed<- train(TARGET ~ OPENING_BALANCE+ MONTHS_SINCE_EXPEDITION+
RS_DESC+SAP_STATUS+ ACTIVATION_STATUS+ ROTUL_STATUS+
SIM_STATUS+ RATE_PLAN_SEGMENT_NORM,
data=trainSet,
method = "ctree",
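trControl = cvCtrl)
However, my variables OPENING_BALANCE and MONTHS_SINCE_EXPEDITION have some missing values, so the function fails. I don't understand why this happens, since I'm only trying to build a tree. It doesn't happen when I use the other packages.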
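The observation that other packages cope fine matches how rpart behaves by default: its default na.action, na.rpart, drops a row only when the response (or every predictor) is missing, and sends the remaining rows down the tree with surrogate splits. A minimal sketch, using R's built-in airquality data as a stand-in for the asker's trainSet (which we don't have), compares how many rows a complete-case filter would keep with how many rows rpart actually uses:

```r
library(rpart)  # recommended package shipped with most R installations

data(airquality)  # built-in: Ozone and Solar.R contain NAs, Temp and Wind do not
vars <- c("Temp", "Ozone", "Solar.R", "Wind")

# Rows a complete-case filter (na.omit) would keep
n_complete <- nrow(na.omit(airquality[, vars]))

# rpart's default na.action (na.rpart) keeps rows with missing predictors
# and handles them through surrogate splits
fit <- rpart(Temp ~ Ozone + Solar.R + Wind, data = airquality)
n_used <- length(fit$where)  # one entry per observation in the root node

cat("complete cases:", n_complete, " rows used by rpart:", n_used, "\n")
```

The gap between the two counts is the set of rows that would have been lost if rpart behaved like na.omit.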
This is the error:
Error in na.fail.default(list(TARGET = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, :
missing values in object
I don't want to use na.action=pass, because I really don't want to discard these observations.
Am I doing something wrong? Why does this happen? Do you have any suggestions?
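The traceback points at na.fail.default, which indicates that the formula interface of train() runs the data through model.frame() with na.action = na.fail, a function that stops as soon as any NA is present. A base-R sketch on a hypothetical toy data frame `toy` shows how na.fail, na.omit and na.pass differ. Note that na.pass does not discard anything; it is na.omit that drops incomplete rows:

```r
# Hypothetical toy data: the predictor x has one missing value
toy <- data.frame(y = factor(c("a", "b", "a", "b")),
                  x = c(1.5, NA, 3.2, 4.8))

# na.fail stops on any NA; capture the error message instead of aborting
msg <- tryCatch(model.frame(y ~ x, data = toy, na.action = na.fail),
                error = function(e) conditionMessage(e))
print(msg)

# na.omit drops the incomplete row; na.pass keeps every row, NAs included
n_omit <- nrow(model.frame(y ~ x, data = toy, na.action = na.omit))
n_pass <- nrow(model.frame(y ~ x, data = toy, na.action = na.pass))
cat("na.omit rows:", n_omit, " na.pass rows:", n_pass, "\n")
```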
Posted on 2017-04-27 20:33:34
I started with the PimaIndiansDiabetes2 dataset from the mlbench package, which has some missing values.
data(PimaIndiansDiabetes2, package = "mlbench")
head(PimaIndiansDiabetes2)
pregnant glucose pressure triceps insulin mass pedigree age diabetes
1 6 148 72 35 NA 33.6 0.627 50 pos
2 1 85 66 29 NA 26.6 0.351 31 neg
3 8 183 64 NA NA 23.3 0.672 32 pos
4 1 89 66 23 94 28.1 0.167 21 neg
5 0 137 40 35 168 43.1 2.288 33 pos
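6 5 116 74 NA NA 25.6 0.201 30 neg
In train, I set na.action to na.pass (which returns the dataset unchanged, so no rows are dropped), and then set the maxsurrogate argument of ctree so that observations with a missing value in a split variable can still be routed down the tree through surrogate splits: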
library(caret)
library(party)  # provides ctree_control()
cvCtrl <- trainControl(method="repeatedcv", repeats = 2, classProbs = TRUE)
set.seed(1234)
ctree1 <- train(diabetes ~ ., data=PimaIndiansDiabetes2,
method = "ctree",
na.action = na.pass,
trControl = cvCtrl,
controls = ctree_control(maxsurrogate = 2))
The result is:
print(ctree1)
Conditional Inference Tree
392 samples
8 predictor
2 classes: 'neg', 'pos'
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 2 times)
Summary of sample sizes: 691, 692, 691, 691, 691, 691, ...
Resampling results across tuning parameters:
mincriterion Accuracy Kappa
0.01 0.7349111 0.4044195
0.50 0.7485731 0.4412557
0.99 0.7323906 0.3921662
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mincriterion = 0.5.
https://stackoverflow.com/questions/43655544
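The resampling sizes give a quick sanity check that na.pass kept the incomplete rows. The sketch below assumes plain k-fold splitting of the 768 rows of PimaIndiansDiabetes2 (caret stratifies folds by the outcome, so individual sizes could differ by a row) and computes the training-set size in each fold; `cv_train_sizes` is a hypothetical helper, not part of caret:

```r
# Training-set size in each fold of plain (unstratified) k-fold CV
cv_train_sizes <- function(n, k = 10) {
  base  <- n %/% k  # minimum number of rows held out per fold
  extra <- n %% k   # this many folds hold out one extra row
  held_out <- c(rep(base + 1, extra), rep(base, k - extra))
  n - held_out      # rows left for fitting in each fold
}

all_rows <- sort(unique(cv_train_sizes(768)))  # every row, NAs included
complete <- sort(unique(cv_train_sizes(392)))  # the count printed in the header
print(all_rows)
print(complete)
```

With all 768 rows the per-fold sizes are 691 and 692, matching the "Summary of sample sizes" line; with only 392 rows they would be 352 and 353.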