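I am currently learning how to implement logistic regression in R.
I took a dataset, split it into training and test sets, and would like to use cross-validation to implement forward selection, backward selection, and best subset selection in order to choose the best features. I am using caret to run cross-validation on the training set and then evaluate the predictions on the test set.
I have seen the rfe control in caret, and I have looked at the documentation on the caret website as well as the links in the question "How to use wrapper feature selection with algorithms in R?". It is not clear to me how to change the type of feature selection, since it seems to default to backward selection. Can anyone help me with my workflow? Below is a reproducible example: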
library("caret")
# Create an Example Dataset from German Credit Card Dataset
data(GermanCredit)  # load the dataset shipped with caret
mydf <- GermanCredit
# Create Train and Test Sets 80/20 split
set.seed(123)  # fix the random split so the example is reproducible
trainIndex <- createDataPartition(mydf$Class, p = .8,
                                  list = FALSE,
                                  times = 1)
train <- mydf[ trainIndex, ]
test  <- mydf[-trainIndex, ]
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     savePredictions = TRUE)
mod_fit <- train(Class ~ ., data = train,
                 method = "glm",
                 family = "binomial",
                 trControl = ctrl,
                 tuneLength = 5)
# Check out Variable Importance
varImp(mod_fit)
summary(mod_fit)
# Test the new model on new and unseen Data for reproducibility
pred <- predict(mod_fit, newdata = test)
accuracy <- table(pred, test$Class)
sum(diag(accuracy)) / sum(accuracy)

Posted on 2022-02-01 08:26:40
You can simply request it in the call that builds mod_fit. For backward stepwise selection, the following code is sufficient:
trControl <- trainControl(method = "cv",
                          number = 5,
                          savePredictions = TRUE,
                          classProbs = TRUE,
                          summaryFunction = twoClassSummary)
caret_model <- train(Class ~ .,
                     data = train,
                     method = "glmStepAIC",   # This method fits best model stepwise (requires the MASS package).
                     family = "binomial",
                     direction = "backward",  # Direction
                     metric = "ROC",          # twoClassSummary reports ROC, so select on it rather than Accuracy
                     trControl = trControl)

Note that in trControl:
method = "cv",  # No need to call repeated here, the number defined afterward defines the k-fold.
classProbs = TRUE,
summaryFunction = twoClassSummary  # Gives back ROC, sensitivity and specificity of the chosen model.

https://stackoverflow.com/questions/42314851
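The answer above covers backward stepwise selection only. As a complement, here is a minimal sketch of forward selection, assuming it is acceptable to call MASS::stepAIC directly, starting from an intercept-only model and supplying an explicit scope. The restriction to six numeric predictors, the seed, and the names credit, train_small, test_small, null_model and forward_model are assumptions made for this illustration and do not come from the original post. Selection here is driven by AIC on the training set and is not cross-validated.

```r
library(caret)  # GermanCredit data and createDataPartition()
library(MASS)   # stepAIC()

data(GermanCredit)
set.seed(42)  # arbitrary seed, assumed only for reproducibility

# Assumption: a small numeric subset keeps the sketch fast and avoids
# constant dummy columns; widen the predictor list for real use.
predictors <- c("Duration", "Amount", "InstallmentRatePercentage",
                "ResidenceDuration", "Age", "NumberExistingCredits")
credit <- GermanCredit[, c("Class", predictors)]

# 80/20 stratified split, as in the question
idx <- createDataPartition(credit$Class, p = 0.8, list = FALSE)
train_small <- credit[idx, ]
test_small  <- credit[-idx, ]

# Forward selection starts from the intercept-only model and may add
# any term that appears in the upper scope.
null_model <- glm(Class ~ 1, data = train_small, family = binomial)
forward_model <- stepAIC(null_model,
                         scope = list(lower = ~ 1,
                                      upper = reformulate(predictors)),
                         direction = "forward",
                         trace = FALSE)

# Terms kept by the forward search
selected_terms <- attr(terms(forward_model), "term.labels")
print(selected_terms)

# glm() models the probability of the second factor level ("Good")
prob_good <- predict(forward_model, newdata = test_small, type = "response")
pred <- factor(ifelse(prob_good > 0.5, "Good", "Bad"),
               levels = levels(test_small$Class))

# Hold-out accuracy, computed the same way as in the question
conf_tab <- table(pred, test_small$Class)
test_accuracy <- sum(diag(conf_tab)) / sum(conf_tab)
print(test_accuracy)
```

Setting direction = "both" with the same scope would let the search also drop terms that were added earlier. Best subset selection is not covered by this sketch.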