I really need your help. I'm trying to write an R script that takes some data and fits a glm using the caret package. Here is my code:
library(caret)
set.seed(4000)
# Create training and test data with an 80%/20% split
new_values$gender <- as.factor(new_values$gender)
trainingRows <- createDataPartition(new_values$gender, p = .8, list = FALSE, times = 1)
training_data_set <- new_values[trainingRows, ]
test_data_set <- new_values[-trainingRows, ]
# Train with 10-fold cross-validation
fitness_control <- trainControl(method = "cv", number = 10, savePredictions = TRUE)
# Train a logistic regression model (glm, binomial family); this takes about 5-10 minutes
linear_regression <- train(gender ~ ., data = training_data_set, method = "glm", family = binomial(), trControl = fitness_control)
linear_regression
Here is the data: data sheet
When I try to run this script, R takes a very long time and then gives the following error message:
Something is wrong; all the Accuracy metric values are missing:
Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :1 NA's :1
Error: Stopping
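In addition: There were 11 warnings (use warnings() to see them)
The warnings are:
Warning messages:
1: model fit failed for Fold01: parameter=none Error : protect(): protection stack overflow
2: model fit failed for Fold02: parameter=none Error : protect(): protection stack overflow
3: model fit failed for Fold03: parameter=none Error : protect(): protection stack overflow
4: model fit failed for Fold04: parameter=none Error : protect(): protection stack overflow
5: model fit failed for Fold05: parameter=none Error : protect(): protection stack overflow
6: model fit failed for Fold06: parameter=none Error : protect(): protection stack overflow
7: model fit failed for Fold07: parameter=none Error : protect(): protection stack overflow
8: model fit failed for Fold08: parameter=none Error : protect(): protection stack overflow
9: model fit failed for Fold09: parameter=none Error : protect(): protection stack overflow
10: model fit failed for Fold10: parameter=none Error : protect(): protection stack overflow
11: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, ... :
  There were missing values in resampled performance measures.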
Can you help?
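The question does not show the data, so what follows is an assumption rather than a confirmed diagnosis. If numeric columns were read from the spreadsheet as text (the answer converts them with as.numeric), the formula gender ~ . turns every distinct string into its own dummy column. A very wide design matrix is one plausible way for glm to run for minutes and fail with "protection stack overflow". The toy data frame below is hypothetical; it only shows how to count the columns the formula expands to.

```r
# Hypothetical toy data: numbers stored as text, as can happen when
# a spreadsheet column is read as character
set.seed(1)
n <- 200
toy <- data.frame(
  gender = factor(sample(c("f", "m"), n, replace = TRUE)),
  score_chr = as.character(round(rnorm(n), 3)),  # numeric values stored as character
  stringsAsFactors = FALSE
)

# model.matrix() treats the character column as a factor,
# so each distinct string becomes its own dummy column
X_chr <- model.matrix(gender ~ ., data = toy)
ncol(X_chr)

# After converting the column to numeric it is a single column again
toy$score_num <- as.numeric(toy$score_chr)
X_num <- model.matrix(gender ~ score_num, data = toy)
ncol(X_num)  # intercept + score_num
```

Running ncol(model.matrix(gender ~ ., data = training_data_set)) on the real data would show whether this is what is happening.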
Answered 2022-05-25 16:15:54
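It seems to work with glmnet, although I haven't checked whether the answers actually make sense! I had to fix some data problems, which may be what was getting in your way.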
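A minimal base-R sketch of the same kind of clean-up, on a hypothetical toy data frame (the column names and the "n/a" entry are made up for illustration). It drops the first column, makes gender a factor, converts the remaining columns to numeric, and then counts how many values became NA during conversion. That count matters because caret's formula interface uses na.action = na.fail by default.

```r
# Hypothetical toy data frame standing in for the spreadsheet
raw <- data.frame(
  id = 1:4,                                  # row-name column, dropped below
  gender = c("f", "m", "m", "f"),
  height = c("1.62", "1.80", "1.75", "n/a"), # one non-numeric entry
  stringsAsFactors = FALSE
)

dd <- raw[, -1]                              # drop the row-name column
dd$gender <- factor(dd$gender)
num_cols <- setdiff(names(dd), "gender")
# as.numeric() warns when it coerces text like "n/a" to NA; silence it here
# and count the NAs explicitly instead
dd[num_cols] <- lapply(dd[num_cols], function(x) suppressWarnings(as.numeric(x)))

sapply(dd, class)   # gender is a factor, height is numeric
colSums(is.na(dd))  # values that became NA during conversion
```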
library(readxl)
library(caret)
library(glmnet)
library(dplyr)
dd <- (read_excel("thema3_results1.xlsx")
|> select(-1) ## drop row names
|> mutate(across(gender, factor))
|> mutate(across(-gender, as.numeric)) ## convert character to numeric!
)
set.seed(4000)
trainingRows <- createDataPartition(dd$gender, p= .8, list= FALSE, times= 1)
training_data_set <- dd[trainingRows,]
test_data_set <- dd[-trainingRows,]
# Train with 10-fold cross-validation
fitness_control <- trainControl(method = "cv", number = 10, savePredictions = TRUE)
system.time(logistic_reg <- train(gender~ .,
data=training_data_set,
method="glmnet",
family="binomial", ## not binomial() for glmnet ...
trControl=fitness_control))
The training step took about 2 seconds on my machine.
It seems to be getting accuracy == 1, which may mean it is still overfitting?
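One way to follow up on the overfitting question is to compare the cross-validated accuracy with the accuracy on the 20% test set, which was split off but never used. This sketch runs on hypothetical simulated data (the predictors x1 to x3 are made up), so its numbers say nothing about the real data set; with the real dd you would drop the simulation lines.

```r
library(caret)
library(glmnet)

# Hypothetical simulated data standing in for dd
set.seed(4000)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$gender <- factor(ifelse(sim$x1 + rnorm(n) > 0, "m", "f"))

trainingRows <- createDataPartition(sim$gender, p = .8, list = FALSE, times = 1)
training_data_set <- sim[trainingRows, ]
test_data_set <- sim[-trainingRows, ]

fitness_control <- trainControl(method = "cv", number = 10)
logistic_reg <- train(gender ~ ., data = training_data_set,
                      method = "glmnet", family = "binomial",
                      trControl = fitness_control)

# Cross-validated performance of the selected tuning parameters
getTrainPerf(logistic_reg)

# Accuracy on the held-out test set
test_pred <- predict(logistic_reg, newdata = test_data_set)
cm <- confusionMatrix(test_pred, test_data_set$gender)
cm$overall["Accuracy"]
```

A large gap between the two accuracies would point to overfitting; near-perfect accuracy on both could instead mean that one predictor almost determines gender, which is worth checking in the data.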
https://stackoverflow.com/questions/72366373