R caret: how to combine multiple models into a more effective model and predict new results

Stack Overflow user
Asked 2015-03-19 11:24:48
1 answer, 1.3K views, 0 followers, 4 votes

My training dataset (train) is a data frame with n features plus one additional column, y, that holds the outcome. As an example, I built three individual models:

m1 <- train(y ~ ., data = train, method = "lda")
m2 <- train(y ~ ., data = train, method = "rf")
m3 <- train(y ~ ., data = train, method = "gbm")

Using the test dataset (test), which of course includes the outcome y, I can evaluate the quality of these individual models:

pred1 <- predict(m1, newdata = test)
pred2 <- predict(m2, newdata = test)
pred3 <- predict(m3, newdata = test)
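The predictions above still have to be compared with the known outcome to get a quality figure. A minimal sketch in base R, assuming a classification outcome; the vectors `truth`, `pred1`, `pred2`, `pred3` below are small made-up stand-ins for `test$y` and the three prediction vectors, and `accuracy` is a hypothetical helper, not a caret function:

```r
# Made-up stand-ins for test$y and the predictions of m1, m2, m3
lv    <- c("Impaired", "Control")
truth <- factor(c("Control", "Impaired", "Control", "Control", "Impaired"), levels = lv)
pred1 <- factor(c("Control", "Control",  "Control", "Control", "Impaired"), levels = lv)
pred2 <- factor(c("Control", "Impaired", "Control", "Impaired", "Impaired"), levels = lv)
pred3 <- factor(c("Impaired", "Impaired", "Control", "Control", "Control"), levels = lv)

# Hypothetical helper: share of predictions that match the known outcome
accuracy <- function(pred, truth) mean(pred == truth)

# One accuracy value per model
acc <- sapply(list(m1 = pred1, m2 = pred2, m3 = pred3), accuracy, truth = truth)
print(acc)

# Cross-tabulation of predicted vs. actual classes for the first model
print(table(predicted = pred1, actual = truth))
```

With caret loaded, `confusionMatrix(pred1, test$y)` reports the accuracy together with further statistics.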

If I apply each individual model to a data frame DATA_TO_PREDICT (outcome unknown) that contains 5 examples, the output is, as expected, 5 predictions per model:

predict(m1, DATA_TO_PREDICT)
predict(m2, DATA_TO_PREDICT)
predict(m3, DATA_TO_PREDICT)

Now I want to build a combined model with random forest, using the caret package:

DF <- data.frame(pred1, pred2, pred3, y = test$y)
MODEL <- train(y ~ ., data = DF, method = "rf")

I can see that the accuracy of the combined model improves:

predMODEL <- predict(MODEL, DF)

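However, if I apply the combined model to DATA_TO_PREDICT (outcome unknown), the output is not 5 predictions but a long list with repeated values and more than 100 results. I used: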

predict(MODEL, newdata = DATA_TO_PREDICT)

Example:

Here is a concrete example that produces the wrong output. I want predictions for 4 new observations, but I get dozens of values:

library(caret)
library(gbm)
set.seed(10)
library(AppliedPredictiveModeling)
data(AlzheimerDisease)
adData = data.frame(diagnosis,predictors)
inTrain = createDataPartition(adData$diagnosis, p = 3/4)[[1]]
training = adData[ inTrain,]
testing = adData[-inTrain,]

inTEST <- (5:nrow(testing))
test <- testing[inTEST,]
DATA_TO_PREDICT <- testing[-inTEST,]

m1 <- train(diagnosis ~ ., data=training, method="rf")
m2 <- train(diagnosis ~ ., data=training, method="gbm")
m3 <- train(diagnosis ~ ., data=training, method="lda")
p1 <- predict(m1, newdata = test)
p2 <- predict(m2, newdata = test)
p3 <- predict(m3, newdata = test)

DF <- data.frame(p1, p2, p3, diagnosis = test$diagnosis)
MODEL <- train(diagnosis ~ ., data = DF, method = "rf")
predMODEL <- predict(MODEL, DF)

I then apply the combined model to the new data:

pred1 <- predict(m1, DATA_TO_PREDICT)
pred2 <- predict(m2, DATA_TO_PREDICT)
pred3 <- predict(m3, DATA_TO_PREDICT)
DF2 <- data.frame(pred1, pred2, pred3)
predict(MODEL, newdata = DF2) 

Note that DATA_TO_PREDICT has only 4 examples, yet the output is:

  [1] Control Control Control Control Control Control Control Control
  [9] Control Control Control Control Control Control Control Control
 [17] Control Control Control Control Control Control Control Control
 [25] Control Control Control Control Control Control Control Control
 [33] Control Control Control Control Control Control Control Control
 [41] Control Control Control Control Control Control Control Control
 [49] Control Control Control Control Control Control Control Control
 [57] Control Control Control Control Control Control Control Control
 [65] Control Control Control Control Control Control Control Control
 [73] Control Control Control Control Control Control
 Levels: Impaired Control

1 Answer

Stack Overflow user

Accepted answer

Answered 2015-03-19 14:37:55

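This happens because MODEL was trained on the predictions of the three individual models for the test data (pred1, pred2, pred3 in the first snippet; p1, p2, p3 in the example), not on the raw observations. New observations therefore cannot be passed to MODEL directly. First store each individual model's predictions for DATA_TO_PREDICT, then pass those predictions to MODEL as newdata. The columns of that data frame must also carry the names MODEL saw during training (p1, p2, p3):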

# (Beginning of the example omitted)
DF <- data.frame(p1, p2, p3, diagnosis = test$diagnosis)
# This trains a model with predictions as inputs:
MODEL <- train(diagnosis ~ ., data = DF, method = "rf")

# This is missing ----------------------
# To get the inputs for the ensemble model
# the predictions for DATA_TO_PREDICT are needed
p1b <- predict(m1, newdata = DATA_TO_PREDICT)
p2b <- predict(m2, newdata = DATA_TO_PREDICT)
p3b <- predict(m3, newdata = DATA_TO_PREDICT)
DFb <- data.frame(p1b, p2b, p3b)
colnames(DFb) <- c("p1", "p2", "p3")
#----------------------------------------

predMODEL <- predict(MODEL, DFb)
# [1] Control Control Control Control 
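The role of the `colnames(DFb) <- ...` line can be illustrated with base R alone. The sketch below assumes that caret's `predict` for formula-based models resolves variables the same way base R's `model.frame` does: a variable that is missing from `newdata` is looked up in the environment where the formula was created. The `lm` model and the random data are stand-ins for illustration only, not part of the original example.

```r
set.seed(1)

# Stand-ins: p1 and p2 play the role of base-model predictions on 20 test rows
p1 <- rnorm(20)
p2 <- rnorm(20)
DF <- data.frame(p1, p2, y = p1 + p2 + rnorm(20, sd = 0.1))

# Stand-in for the ensemble model (lm instead of a caret random forest)
MODEL <- lm(y ~ ., data = DF)

# New data: 4 rows, but columns named pred1/pred2 instead of p1/p2
DF2 <- data.frame(pred1 = rnorm(4), pred2 = rnorm(4))

# p1 and p2 are not in DF2, so R falls back to the 20-row vectors in the
# environment and only warns about the row-count mismatch
wrong <- suppressWarnings(predict(MODEL, newdata = DF2))
print(length(wrong))

# Renaming the columns to the training names gives one prediction per new row
DFb <- setNames(DF2, c("p1", "p2"))
right <- predict(MODEL, newdata = DFb)
print(length(right))
```

Under that assumption, the 78 values in the question would be predictions for the 78 rows of `test`, whose `p1`, `p2`, `p3` vectors were still in the workspace, rather than for the 4 rows of DATA_TO_PREDICT.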
2 votes
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/29143320
