Q: How to convert a list into a data frame when its elements have different table structures

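Stack Overflow user

Asked 2021-08-31 10:53:42

2 answers · 75 views · 0 followers · 0 votes

I applied machine learning algorithms with the caret package (caretList) to predict death in a group of patients from several variables (e.g. age, sex, smoking status):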

algorithmList <- c('rf', 'pls', 'parRF', 'nnet', 'xgbTree', 'avNNet',
                   'gbm', 'monmlp', 'nb', 'glm', 'pcaNNet', 'lda', 'C5.0',
                   'svmLinear2', 'knn')

set.seed(100)
list_models <- caretList(Death_event ~ ., data = na.exclude(dataset),
                         methodList = algorithmList, metric = "ROC",
                         trControl = control)

Then I used varImp to extract the variable importance from each model in this list, which produces a list:

importance <- lapply(list_models, varImp)

Output (structure of importance):

> str(importance)
List of 15
 $ rf        :List of 3
  ..$ importance:'data.frame':  11 obs. of  1 variable:
  .. ..$ Overall: num [1:11] 53.8 4.1 100 7.44 0 ...
  ..$ model     : chr "rf"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ pls       :List of 3
  ..$ importance:'data.frame':  11 obs. of  1 variable:
  .. ..$ Overall: num [1:11] 15.91 4.88 100 18.95 0 ...
  ..$ model     : chr "pls"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ parRF     :List of 3
  ..$ importance:'data.frame':  11 obs. of  1 variable:
  .. ..$ Overall: num [1:11] 51.26 3.74 100 7.66 0 ...
  ..$ model     : chr "parRF"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ nnet      :List of 3
  ..$ importance:'data.frame':  11 obs. of  1 variable:
  .. ..$ Overall: num [1:11] 14 41.9 56.4 62.1 31.2 ...
  ..$ model     : chr "nnet"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ xgbTree   :List of 3
  ..$ importance:'data.frame':  11 obs. of  1 variable:
  .. ..$ Overall: num [1:11] 100 48.1 40.2 21.5 21.1 ...
  ..$ model     : chr "xgbTree"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ avNNet    :List of 3
  ..$ importance:'data.frame':  11 obs. of  2 variables:
  .. ..$ No_death: num [1:11] 14.37 14.36 100 45.4 9.04 ...
  .. ..$ Death   : num [1:11] 14.37 14.36 100 45.4 9.04 ...
  ..$ model     : chr "ROC curve"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ gbm       :List of 3
  ..$ importance:'data.frame':  11 obs. of  1 variable:
  .. ..$ Overall: num [1:11] 13.543 0.749 100 6.743 0 ...
  ..$ model     : chr "gbm"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ monmlp    :List of 3
  ..$ importance:'data.frame':  11 obs. of  2 variables:
  .. ..$ No_death: num [1:11] 14.37 14.36 100 45.4 9.04 ...
  .. ..$ Death   : num [1:11] 14.37 14.36 100 45.4 9.04 ...
  ..$ model     : chr "ROC curve"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ nb        :List of 3
  ..$ importance:'data.frame':  11 obs. of  2 variables:
  .. ..$ No_death: num [1:11] 14.37 14.36 100 45.4 9.04 ...
  .. ..$ Death   : num [1:11] 14.37 14.36 100 45.4 9.04 ...
  ..$ model     : chr "ROC curve"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ glm       :List of 3
  ..$ importance:'data.frame':  11 obs. of  1 variable:
  .. ..$ Overall: num [1:11] 13 27.3 100 50.5 11.6 ...
  ..$ model     : chr "glm"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ pcaNNet   :List of 3
  ..$ importance:'data.frame':  11 obs. of  2 variables:
  .. ..$ No_death: num [1:11] 14.37 14.36 100 45.4 9.04 ...
  .. ..$ Death   : num [1:11] 14.37 14.36 100 45.4 9.04 ...
  ..$ model     : chr "ROC curve"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ lda       :List of 3
  ..$ importance:'data.frame':  11 obs. of  2 variables:
  .. ..$ No_death: num [1:11] 14.37 14.36 100 45.4 9.04 ...
  .. ..$ Death   : num [1:11] 14.37 14.36 100 45.4 9.04 ...
  ..$ model     : chr "ROC curve"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ C5.0      :List of 3
  ..$ importance:'data.frame':  11 obs. of  1 variable:
  .. ..$ Overall: num [1:11] 100 100 100 100 100 ...
  ..$ model     : chr "C5.0"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ svmLinear2:List of 3
  ..$ importance:'data.frame':  11 obs. of  2 variables:
  .. ..$ No_death: num [1:11] 14.37 14.36 100 45.4 9.04 ...
  .. ..$ Death   : num [1:11] 14.37 14.36 100 45.4 9.04 ...
  ..$ model     : chr "ROC curve"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"
 $ knn       :List of 3
  ..$ importance:'data.frame':  11 obs. of  2 variables:
  .. ..$ No_death: num [1:11] 14.37 14.36 100 45.4 9.04 ...
  .. ..$ Death   : num [1:11] 14.37 14.36 100 45.4 9.04 ...
  ..$ model     : chr "ROC curve"
  ..$ calledFrom: chr "varImp"
  ..- attr(*, "class")= chr "varImp.train"

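This is where I hit my first problem.

For about half of the algorithms, the importance is extracted with a different method (the ROC method). This does not change the interpretation, but for some algorithms the printed column header is "Importance" and for others it is "Overall", even though both carry exactly the same kind of information: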

$gbm
gbm variable importance

                                        Overall
Age_at_CT                              100.0000
Muscle_HU                               48.6376
history_of_CV_yes_noat_leasT_1CV_event  38.1153
VAT_Area_cm2                            19.3376
Liver_HU_Median                         17.7983
SAT_Area_cm2                            17.3343
L3_SMI_cm2m2                            15.5910
BMI                                     13.5431
Tobacco_yes_noSmoker                     6.7431
SexMale                                  0.7494
T2D_at_CTDiabetes                        0.0000

$monmlp
ROC curve variable importance

                     Importance
Age_at_CT               100.000
Muscle_HU                87.085
history_of_CV_yes_no     61.254
VAT_Area_cm2             49.174
Liver_HU_Median          47.712
Tobacco_yes_no           45.404
BMI                      14.372
Sex                      14.363
T2D_at_CT                 9.035
L3_SMI_cm2m2              7.453
SAT_Area_cm2              0.000

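As you may have noticed in the structure, the algorithms whose importance was extracted with the ROC method have two sub-columns (Death and No_death), but the numbers in these two columns are exactly the same.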
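Both observations can be checked directly rather than by eye. The following is a minimal sketch that uses a small mock list in place of the real `importance` object: `importance_mock` and its values are made up for illustration, and only its shape follows the str() output above. Note that in that str() output the stored column names of the ROC-based models are No_death and Death; "Importance" appears only as the header of the printed table.

```r
# Hypothetical stand-in for `importance`: values are invented,
# the shape mirrors the str(importance) output shown above
importance_mock <- list(
  gbm = list(importance = data.frame(
    Overall = c(100, 48.6, 0),
    row.names = c("Age_at_CT", "Muscle_HU", "T2D_at_CTDiabetes")
  )),
  monmlp = list(importance = data.frame(
    No_death = c(100, 87.1, 0),
    Death = c(100, 87.1, 0),
    row.names = c("Age_at_CT", "Muscle_HU", "SAT_Area_cm2")
  ))
)

# Column names actually stored for each model
stored_names <- lapply(importance_mock, function(x) names(x$importance))
print(stored_names)

# TRUE when every importance column of a model equals its first column
columns_agree <- sapply(importance_mock, function(x) {
  df <- x$importance
  all(sapply(df, function(col) isTRUE(all.equal(col, df[[1]]))))
})
print(columns_agree)
```

Running the same two calls on the real `importance` list should show which models store "Overall" and which store the two class columns, and whether those two columns really agree for every model.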

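What I am trying to create is a simple tibble/data frame in which:

the first column is the algorithm name (the names of the list, e.g. gbm or monmlp); the second column is the variable name (e.g. Age_at_CT, Muscle_HU, etc.); the third column is the importance value (labelled "Importance" for some algorithms and "Overall" for others).

The only workaround I have found is to print the list and copy/paste each algorithm's output into an Excel sheet, one algorithm at a time (yeah... that's terrible).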

Tags: list, r-caret

2 Answers

Stack Overflow user

Answered 2021-08-31 11:34:18

You can do the following:

algoNames <- names(importance)
# extract the importance elements (data.frames) of the lists
importanceDfList <- lapply(importance, "[[", "importance")
# variable names are the rownames of those data.frames
variableNameList <- lapply(importanceDfList, rownames)
# get the importance values out of the data.frames, respecting different namings of the columns
# if no column matches, we will discard the element
# (here you have to think about how to deal with importance-data.frames with two columns)
possibleImportanceDataframeNames <- c("Overall", "Importance")
importanceValueList <- lapply(importanceDfList, function(importanceDf) {
  # select the column by name: the position returned by which() refers to
  # possibleImportanceDataframeNames, not to the columns of importanceDf
  matchingImportanceName <- intersect(possibleImportanceDataframeNames, names(importanceDf))
  if (!length(matchingImportanceName)) return(NULL)
  importanceDf[[matchingImportanceName[1]]]
})

replicationTimes <- sapply(importanceValueList,length)

resultDf <- data.frame(
  Algorithm = rep(algoNames, times = replicationTimes),
  Variable = unlist(variableNameList[replicationTimes > 0]),
  Importance = unlist(importanceValueList[replicationTimes > 0]), 
  stringsAsFactors = FALSE
)
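The comment in the code above leaves open how to treat the importance data frames that have two columns. One possible way, assuming (as the question states) that the two class columns always hold identical values, is to ignore the column names and take the first column of each data frame. The sketch below is not part of the original answer; `importance_mock` and the helper `to_long` are hypothetical names, and the mock values are made up to mirror the shape of the real list.

```r
# Hypothetical stand-in for `importance`: values are invented,
# the shape mirrors the str(importance) output in the question
importance_mock <- list(
  gbm = list(importance = data.frame(
    Overall = c(100, 48.6, 0),
    row.names = c("Age_at_CT", "Muscle_HU", "T2D_at_CTDiabetes")
  )),
  monmlp = list(importance = data.frame(
    No_death = c(100, 87.1, 0),
    Death = c(100, 87.1, 0),
    row.names = c("Age_at_CT", "Muscle_HU", "SAT_Area_cm2")
  ))
)

# Build one long data frame per model; the first column is used
# whatever it is called (assumption: all columns are identical)
to_long <- function(algo, x) {
  df <- x$importance
  data.frame(
    Algorithm = algo,
    Variable = rownames(df),
    Importance = df[[1]],
    stringsAsFactors = FALSE
  )
}

# Stack the per-model data frames and drop the generated row names
resultDf <- do.call(rbind, Map(to_long, names(importance_mock), importance_mock))
rownames(resultDf) <- NULL
print(resultDf)
```

If the two columns could ever differ, taking the first one silently would hide that, so a check on the real data before relying on this shortcut seems prudent.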

0 votes

Stack Overflow user

Answered 2021-08-31 19:15:04

I found the solution based on your code!

I just replaced one of the names in the vector with one of the two column names:

algoNames <- names(importance)
# extract the importance elements (data.frames) of the lists
importanceDfList <- lapply(importance, "[[", "importance")
# variable names are the rownames of those data.frames
variableNameList <- lapply(importanceDfList, rownames)
# get the importance values out of the data.frames, respecting different namings of the columns
# if no column matches, we will discard the element
# (here you have to think about how to deal with importance-data.frames with two columns)
## HERE: "Importance" replaced by one of the two ROC column names
## ("Death" is shown; "No_death" holds the same values)
possibleImportanceDataframeNames <- c("Overall", "Death")

importanceValueList <- lapply(importanceDfList, function(importanceDf) {
  # select the column by name rather than by the position returned by which()
  matchingImportanceName <- intersect(possibleImportanceDataframeNames, names(importanceDf))
  if (!length(matchingImportanceName)) return(NULL)
  importanceDf[[matchingImportanceName[1]]]
})

replicationTimes <- sapply(importanceValueList,length)

resultDf <- data.frame(
  Algorithm = rep(algoNames, times = replicationTimes),
  Variable = unlist(variableNameList[replicationTimes > 0]),
  Importance = unlist(importanceValueList[replicationTimes > 0]), 
  stringsAsFactors = FALSE
)

Thanks again for your input.

0 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/68997398
