When I run a quantile regression forest model with caret::train, I get the following error: Error in { : task 1 failed - "non-numeric argument to binary operator".
When I set ntree to a higher number (in my reproducible example that would be ntree = 150), the code runs without error.
This code
library(caret)
library(quantregForest)
data(segmentationData)
dat <- segmentationData[segmentationData$Case == "Train",]
dat <- dat[1:50,]
# predictors
preds <- dat[,c(5:ncol(dat))]
# convert all to numeric
preds <- data.frame(sapply(preds, function(x) as.numeric(as.character(x))))
# response variable
response <- dat[,4]
# set up error measures
sumfct <- function(data, lev = NULL, model = NULL){
RMSE <- sqrt(mean((data$pred - data$obs)^2, na.rm = TRUE)) # na.rm, not na.omit: mean() silently ignores unknown arguments
c(RMSE = RMSE)
}
# specify folds
set.seed(42, kind = "Mersenne-Twister", normal.kind = "Inversion")
folds_train <- caret::createMultiFolds(y = dat$Cell,
k = 10,
times = 5)
# specify trainControl for tuning mtry with the created multifolds
finalcontrol <- caret::trainControl(search = "grid", method = "repeatedcv", number = 10, repeats = 5,
index = folds_train, savePredictions = TRUE, summaryFunction = sumfct)
# build grid for tuning mtry
tunegrid <- expand.grid(mtry = c(2, 10, sqrt(ncol(preds)), ncol(preds)/3))
# train model
set.seed(42, kind = "Mersenne-Twister", normal.kind = "Inversion")
model <- caret::train(x = preds,
y = response,
method ="qrf",
ntree = 30, # with ntree = 150 it works
metric = "RMSE",
tuneGrid = tunegrid,
trControl = finalcontrol,
importance = TRUE,
keep.inbag = TRUE
)
produces the error. The model with my real data uses ntree = 10000, but the task still fails. How can I solve this?
Where in caret's source code can I find the condition that produces the error message Error in { : task 1 failed - "non-numeric argument to binary operator"? Which part of the source code does the message come from?
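As a general pointer (a plain-R debugging sketch, not specific to caret): the Error in { : task 1 failed - "..." wrapper comes from foreach, which caret uses to run the resampling tasks in parallel; the quoted text inside the wrapper is the actual error raised inside a worker. Calling traceback() right after the failure, or capturing the condition, shows where that inner error originates:

```r
# Capture an error's message instead of letting it abort the session.
# stop() here is only a stand-in for the failing caret::train() call.
res <- tryCatch(
  stop("non-numeric argument to binary operator"),
  error = function(e) conditionMessage(e)
)
res  # the message foreach reports as: task 1 failed - "..."
```

So the "{" in the error message is foreach's loop body, not a specific line of caret itself; the message to search for in the underlying package sources is the quoted one.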
UPDATE: Following StupidWolf's answer, I adjusted my code for the real data, so it now looks like this:
# train model
set.seed(42, kind = "Mersenne-Twister", normal.kind = "Inversion")
model <- caret::train(x = preds,
y = response,
method ="qrf",
ntree = 30, # with ntree = 150 it works
metric = "RMSE",
sampsize = ceiling(length(response)*0.4),
tuneGrid = tunegrid,
trControl = finalcontrol,
importance = TRUE,
keep.inbag = FALSE
)
With my real data I still got the error message above; in the worst case I have to reduce sampsize to 0.1*length(response) for the model to compute successfully. So setting only keep.inbag = FALSE still produces the error. I have up to 1500 predictors, while the number of samples (rows) is only 50 to 60. I still do not understand what exactly triggers the error message. I also tried the model without the sampsize argument, but always with keep.inbag = FALSE; the error still occurred. Only a very small sampsize guarantees success.
How can I run the model successfully without setting sampsize? What I actually want are bootstrapped data sets, not an artificial 40% or 10% subset of the data, to train the forest.
Posted on 2020-07-13 22:18:13
You get the error because of a check in the quantregForest code, at line 95:
minoob <- min( apply(!is.na(valuesPredict),1,sum))
if(minoob<10) stop("need to increase number of trees for sufficiently many out-of-bag observations")
So it requires every one of your observations to be out-of-bag (OOB) in at least 10 trees in order to keep the out-of-bag predictions. Hence, if your real data is huge, the ntree needed to keep the out-of-bag predictions will be huge as well.
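A quick base-R simulation (an illustrative sketch, independent of caret and quantregForest) shows why ntree = 30 sits right at this threshold while ntree = 150 clears it:

```r
# For n observations and ntree bootstrap samples of size n (with replacement),
# find the smallest number of trees in which any single observation is OOB.
set.seed(42)
n <- 50
min_oob <- function(ntree, sampsize = n) {
  inbag <- replicate(ntree, tabulate(sample(n, sampsize, replace = TRUE), nbins = n))
  min(rowSums(inbag == 0))  # fewest OOB trees over all n observations
}
min_oob(30)   # hovers around the required minimum of 10, so some runs fail
min_oob(150)  # comfortably above 10
```

Each observation is OOB with probability (1 - 1/n)^n ≈ 0.37 per tree, so the expected OOB count is roughly 0.37 * ntree: about 11 for ntree = 30 (no margin above quantregForest's minimum of 10, and the check uses the worst observation, not the average), but about 55 for ntree = 150.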
If you are training with caret, keeping the OOB predictions while also using savePredictions = TRUE seems redundant. Overall, the OOB predictions may not be that useful anyway, since you will be using the held-out folds for prediction.
Given the size of your data, another option is to tune sampsize. In randomForest, only sampsize observations are drawn (with replacement) to fit each tree. If you set this to a smaller value, you ensure enough OOB observations. For example, with the example data given, we can see:
model <- caret::train(x = preds,
y = response,
method ="qrf",
ntree = 30, sampsize=17,
metric = "RMSE",
tuneGrid = tunegrid,
trControl = finalcontrol,
importance = TRUE,
keep.inbag = TRUE)
model
Quantile Random Forest
50 samples
57 predictors
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 44, 43, 44, 46, 45, 46, ...
Resampling results across tuning parameters:
mtry RMSE
2.000000 42.53061
7.549834 42.72116
10.000000 43.11533
19.000000 42.80340
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 2.
Source: https://stackoverflow.com/questions/62875113
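The effect of sampsize on the OOB check can be sketched in base R (assuming sampling with replacement of sampsize observations per tree, as randomForest does by default):

```r
# P(a given observation is out-of-bag in one tree) when each tree draws
# sampsize observations with replacement from n.
n <- 50
p_oob <- function(sampsize) (1 - 1/n)^sampsize

p_oob(50)  # full bootstrap sample: about 0.36
p_oob(17)  # sampsize = 17:         about 0.71

# Rough ntree at which the *expected* OOB count per observation reaches
# the minimum of 10 that quantregForest requires.
ceiling(10 / p_oob(50))  # 28 trees with full bootstrap samples
ceiling(10 / p_oob(17))  # 15 trees with sampsize = 17
```

Because the actual OOB counts vary around this expectation, and the check uses the minimum over all observations rather than the mean, a comfortable margin above these values is needed; that is why ntree = 150 works where ntree = 30 does not, and why a smaller sampsize lets a small ntree succeed.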