Q: Getting a confusion matrix from cv.glmnet

Stack Overflow user

Asked 2021-11-09 17:11:07

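Explanation of the problem

I'm comparing a few models, and my dataset is so small that I would much rather use cross-validation than split off a validation set. One of my models is built with glm ("GLM"), another with cv.glmnet ("GLMNET"). In pseudocode, I'd like to be able to do the following: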

initialize empty 2x2 matrices GLM_CONFUSION and GLMNET_CONFUSION

# Cross validation loop
For each data point VAL in my dataset X:
  Let TRAIN be the rest of X (not including VAL)

  Train GLM on TRAIN, use it to predict VAL
  Depending on if it were a true positive, false positive, etc...
    add 1 to the correct entry in GLM_CONFUSION

  Train GLMNET on TRAIN, use it to predict VAL
  Depending on if it were a true positive, false positive, etc...
    add 1 to the correct entry in GLMNET_CONFUSION
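The GLM half of the loop above could look roughly like the following in R. This is a minimal sketch, not the asker's code: the simulated data frame `dat`, the formula `y ~ x1 + x2`, the binomial family, and the 0.5 cutoff are all assumptions made for illustration.

```r
# Simulated stand-in data (hypothetical; replace with your own dataset)
set.seed(1)
n <- 60
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - dat$x2))

# 2x2 table of counts: rows are predictions, columns are true labels
glm_confusion <- matrix(
  0, nrow = 2, ncol = 2,
  dimnames = list(predicted = c("0", "1"), actual = c("0", "1"))
)

# Leave-one-out cross-validation: each row is the validation point once
for (i in seq_len(n)) {
  train <- dat[-i, ]
  val <- dat[i, , drop = FALSE]

  fit <- glm(y ~ x1 + x2, data = train, family = binomial())
  prob <- predict(fit, newdata = val, type = "response")

  # The 0.5 cutoff is an assumption; pick one that suits your task
  pred <- as.character(as.integer(prob >= 0.5))
  actual <- as.character(val$y)

  glm_confusion[pred, actual] <- glm_confusion[pred, actual] + 1
}

glm_confusion
```

Indexing the matrix by the character labels keeps the bookkeeping short: each held-out point adds exactly one count, so the four cells sum to the number of rows.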

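This isn't hard to do; the problem is that cv.glmnet already uses cross-validation to infer the best value of the penalty lambda. It would be more convenient if I could have cv.glmnet automatically build the confusion matrix of the best model, i.e. my code would look like this: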

initialize empty 2x2 matrices GLM_CONFUSION and GLMNET_CONFUSION

Train GLMNET on X using cv.glmnet
Set GLMNET_CONFUSION to be the confusion matrix of lambda.1se (or lambda.min)

# Cross validation loop
For each data point VAL in my dataset X:
  Let TRAIN be the rest of X (not including VAL)

  Train GLM on TRAIN, use it to predict VAL
  Depending on if it were a true positive, false positive, etc...
    add 1 to the correct entry in GLM_CONFUSION

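Not only would this be convenient, it is somewhat of a necessity. There are two alternatives:

1. Use cv.glmnet to find a new lambda.1se in every iteration of the cross-validation loop (i.e. nested cross-validation).
2. Use cv.glmnet to find lambda.1se on X, then "fix" that value and treat it as a normal model to train during the cross-validation loop (two parallel cross-validations).

The second is philosophically incorrect, since it means GLMNET would have information about what it is trying to predict in the cross-validation loop. The first would take a long time. I could in theory do it, but it might take half an hour, and I feel there ought to be a better way.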
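For reference, the first alternative (nested cross-validation) might be sketched as below. The simulated `x` and `y`, the binomial family, and the leave-one-out outer loop are assumptions for illustration; the glmnet calls used are `cv.glmnet` and its `predict` method with `s = "lambda.1se"` and `type = "class"`.

```r
library(glmnet)

# Simulated stand-in data (hypothetical; replace with your own X and labels)
set.seed(1)
n <- 60
x <- matrix(rnorm(n * 5), nrow = n)
y <- rbinom(n, 1, plogis(2 * x[, 1] - x[, 2]))

glmnet_confusion <- matrix(
  0, nrow = 2, ncol = 2,
  dimnames = list(predicted = c("0", "1"), actual = c("0", "1"))
)

for (i in seq_len(n)) {
  # The inner cross-validation only sees the training rows, so lambda.1se
  # is chosen without any access to the held-out point
  inner <- cv.glmnet(x[-i, , drop = FALSE], y[-i], family = "binomial")

  pred <- predict(
    inner,
    newx = x[i, , drop = FALSE],
    s = "lambda.1se",
    type = "class"
  )

  pred <- as.character(pred)
  actual <- as.character(y[i])
  glmnet_confusion[pred, actual] <- glmnet_confusion[pred, actual] + 1
}

glmnet_confusion
```

The cost is one full cv.glmnet run per outer iteration, which is where the long running time mentioned above comes from.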

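What I've looked at so far

I've looked through the documentation of cv.glmnet. It doesn't seem to offer what I'm asking for, but I'm very new to R and data science in general, so it is perfectly possible that I have missed something.

I've also looked on this site and seen some posts that at first glance appeared relevant but are in fact asking for something different, for example this post: tidy predictions and confusion matrix with glmnet.

That post looks similar to what I want, but it isn't quite the same: they appear to use predict.cv.glmnet to make new predictions and then build the confusion matrix from those, whereas I want the confusion matrix of the predictions made during the cross-validation step.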

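I'm hoping that someone is able to either:

1. Explain if and how it is possible to create the confusion matrix as described.
2. Show that there is a third alternative separate from the two I proposed ("hand-implement cv.glmnet" is not a viable alternative :P).
3. Conclusively state that what I want isn't possible and that I need to do one of the two alternatives I mentioned.

Any one of those would be a perfectly fine answer to this question (though I'm hoping for option 1!).

Apologies if there's something simple I've missed!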

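1 Answer

Stack Overflow user

Accepted answer

Answered 2021-11-09 20:41:46

Thanks to @老板娘's suggestion, I was able to find a solution that works for me! It corresponds to option 2 in my post and uses the caret package.

Essentially, we need to attach a custom summary function to caret's model trainer. I spent a few hours mostly fumbling around before I got it working, so there may well be a better way to do this, and I encourage others to post other answers if they know of one! My code is at the bottom (slightly modified so that it is not specific to the task I'm working on).

Hopefully this helps anyone with a similar problem. Another resource I found useful while working this out was the following post: https://stats.stackexchange.com/questions/299653/caret-glmnet-vs-cv-glmnet, since it shows very clearly how to convert a call to cv.glmnet into a call to caret's train version of glmnet.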

library(caret)

# Confusion Matrix of model outputs
CM <- function(model) {
  # Need to find index of best tune found by
  # cross validation
  idx <- 1
  for (i in 1:nrow(model$results)) {
    check <- model$results[i,]
    foundBest <- TRUE
    for (col in colnames(model$bestTune)) {
      if (check[,col] != model$bestTune[,col]) {
        foundBest <- FALSE
        break
      }
    }
    if (foundBest) {
      idx <- i
      break
    }
  }
  
  # They are averaged w.r.t. the number of folds (ctrl$number)
  # hence the multiplication
  c(
    model$results[idx,]$true_pos,
    model$results[idx,]$false_pos,
    model$results[idx,]$false_neg,
    model$results[idx,]$true_neg
  ) * model$control$number
}

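# Summary function used during training to report confusion-matrix counts
SummaryFunc <- function(data, lev = NULL, model = NULL) {

    # This puts our output in the right format (RMSE and related metrics)
    out <- postResample(data$pred, data$obs)

    # A gaussian model returns continuous predictions, so convert them to
    # 0/1 classes first. The 0.5 cutoff is an assumption, not part of the
    # original post; adjust it to suit your task.
    pred_class <- as.integer(data$pred >= 0.5)

    # Get the confusion matrix (rows: predictions, columns: observed labels)
    cm <- confusionMatrix(
      factor(pred_class, levels=c(0, 1)),
      factor(data$obs, levels=c(0, 1))
    )$table

    # Add those details to the output, treating class 1 as the positive class
    oldnames <- names(out)
    out <- c(out, cm[2, 2], cm[2, 1], cm[1, 2], cm[1, 1])
    names(out) <- c(oldnames, "true_pos", "false_pos", "false_neg", "true_neg")

    out
}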


# 10-fold cross validation, matching the cv.glmnet default (nfolds = 10)
ctrl <- trainControl(
  method="cv",
  number=10,
  summaryFunction=SummaryFunc
)


# Example of standard glm
our.glm <- train(
  your_formula,
  data=your_data,
  method="glm",
  family=gaussian(link="identity"),
  trControl=ctrl,
  metric="RMSE"
)

# Example of what used to be cv.glmnet
our.glmnet <- train(
  your_feature_matrix,
  your_label_matrix,
  method="glmnet",
  family=gaussian(link="identity"),
  trControl=ctrl,
  metric="RMSE",
  tuneGrid = expand.grid(
    alpha = 1,
    lambda = seq(0.001, 0.1, by=0.001)
  )
)

CM(our.glm)
CM(our.glmnet)
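For comparison, recent glmnet releases appear to offer a route that stays within cv.glmnet: passing `keep = TRUE` retains the out-of-fold ("prevalidated") predictions in `fit.preval`, and `confusion.glmnet` can tabulate them. The sketch below is not part of the original answer; it assumes a glmnet version that provides `confusion.glmnet`, a binomial family, and simulated `x` and `y` as stand-ins for real data.

```r
library(glmnet)

# Simulated stand-in data (hypothetical; replace with your own X and labels)
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), nrow = n)
y <- rbinom(n, 1, plogis(2 * x[, 1] - x[, 2]))

# keep = TRUE stores the predictions each fold made on its held-out rows
cvfit <- cv.glmnet(x, y, family = "binomial", keep = TRUE)

# One confusion table per lambda, built from those out-of-fold predictions
tables <- confusion.glmnet(cvfit$fit.preval, newy = y, family = "binomial")

# Pick the table that belongs to lambda.1se (use lambda.min if preferred)
idx <- which(cvfit$lambda == cvfit$lambda.1se)
tables[[idx]]
```

Note that lambda.1se is still selected using the same folds whose predictions are tabulated, so the resulting table may be somewhat optimistic compared with fully nested cross-validation.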


Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/69902307
