
Error when extracting variable importance with FeatureImp$new and H2O

Stack Overflow user
Asked 2020-05-09 13:00:05
Answers: 1, Views: 203, Follows: 0, Votes: 0

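I am trying to extract variable importance with the iml package in R. At first I assumed the error came from my own implementation, but when I reproduced the same example, which runs fine here, I found that this is not the case. Below is the code; it is short, simple, and reproducible: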

```r
library(rsample)   # data splitting
library(ggplot2)   # allows extension of visualizations
library(dplyr)     # basic data transformation
library(h2o)       # machine learning modeling
library(iml)       # ML interpretation

# initialize h2o session
h2o.no_progress()
h2o.init()

# classification data
df <- rsample::attrition %>% 
  mutate_if(is.ordered, factor, ordered = FALSE) %>%
  mutate(Attrition = recode(Attrition, "Yes" = "1", "No" = "0") %>% factor(levels = c("1", "0")))

# convert to h2o object
df.h2o <- as.h2o(df)

# create train, validation, and test splits
set.seed(123)
splits <- h2o.splitFrame(df.h2o, ratios = c(.7, .15),
                         destination_frames = c("train", "valid", "test"))
names(splits) <- c("train", "valid", "test")

# variable names for response & features
y <- "Attrition"
x <- setdiff(names(df), y) 

# elastic net model 
glm <- h2o.glm(
  x = x, 
  y = y, 
  training_frame = splits$train,
  validation_frame = splits$valid,
  family = "binomial",
  seed = 123
  )

# 1. create a data frame with just the features
features <- as.data.frame(splits$valid) %>% select(-Attrition)

# 2. Create a vector with the actual responses
response <- as.numeric(as.vector(splits$valid$Attrition))

# 3. Create custom predict function that returns the predicted values as a
#    vector (column 3 of the h2o.predict() output, i.e. a class probability)
pred <- function(model, newdata)  {
  results <- as.data.frame(h2o.predict(model, as.h2o(newdata)))
  return(results[[3L]])
}

# create predictor object to pass to explainer functions
predictor.glm <- Predictor$new(
  model = glm, 
  data = features, 
  y = response, 
  predict.fun = pred,
  class = "classification"
  )

imp.glm <- FeatureImp$new(predictor.glm, loss = "mse")
```
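Step 2 converts the factor through `as.vector()` before `as.numeric()`. A small base-R sketch of why that matters when the levels are ordered c("1", "0") as in the question (the four labels below are made up for illustration):

```r
# Made-up labels with the same level order as the question's Attrition factor.
attrition_label <- factor(c("1", "0", "0", "1"), levels = c("1", "0"))

# as.numeric() on a factor returns the internal level codes, not the labels:
# "1" is level 1 and "0" is level 2.
codes <- as.numeric(attrition_label)
print(codes)

# as.vector() first turns the factor into its character labels, so
# as.numeric() then yields the intended 0/1 response.
values <- as.numeric(as.vector(attrition_label))
print(values)
```

With `levels = c("1", "0")`, the direct conversion would encode "Yes" as 1 and "No" as 2, which would silently distort an MSE-based loss.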

This is the error I get:

```
Error in `[.data.frame`(prediction, , self$class, drop = FALSE): undefined columns selected
Traceback:

1. FeatureImp$new(predictor.glm, loss = "mse")
2. .subset2(public_bind_env, "initialize")(...)
3. private$run.prediction(private$sampler$X)
4. self$predictor$predict(data.frame(dataDesign))
5. prediction[, self$class, drop = FALSE]
6. `[.data.frame`(prediction, , self$class, drop = FALSE)
7. stop("undefined columns selected")
```
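The last frames of the traceback show where it fails: the prediction table is subset with `prediction[, self$class, drop = FALSE]`, and base R raises "undefined columns selected" whenever that name is not one of its columns. A minimal base-R sketch of the same failure, assuming `self$class` holds the string passed as `class = "classification"` and the prediction table has a single column (given the hypothetical name `p` here):

```r
# Hypothetical stand-in for the custom predict function: it returns a plain
# numeric vector, like pred() in the question (no H2O needed here).
pred_sketch <- function(newdata) {
  rep(0.5, nrow(newdata))
}

newdata <- data.frame(Age = c(30, 45), MonthlyIncome = c(4000, 6000))

# A one-column prediction table; the column name "p" is an assumption.
prediction <- data.frame(p = pred_sketch(newdata))

# Subsetting by a column name that does not exist fails the same way.
err_msg <- tryCatch(
  {
    prediction[, "classification", drop = FALSE]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)

# Subsetting by an existing column name works.
ok <- prediction[, "p", drop = FALSE]
print(ok)
```

If this reading is right, a likely fix is to drop `class = "classification"` from `Predictor$new()`: in iml, `class` names the prediction column to keep (for example a class label such as "1"), not the task type, and `pred()` already returns a single probability vector. This is an inference from the traceback; the thread itself does not confirm it.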

How can I fix this?

1 Answer

Stack Overflow user

Posted 2020-05-15 10:17:31

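In H2O, you can get variable importance with the varimp() method, e.g. predictor.varimp(). That is Python-style method syntax; in R the equivalent call is h2o.varimp(model).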
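A minimal R sketch of the approach this answer describes, assuming a local H2O cluster can be started (H2O requires Java) and using iris as stand-in data; the `is_virginica` target is a hypothetical example, not the attrition data from the question:

```r
library(h2o)

# Start (or connect to) a local H2O cluster.
h2o.init()
h2o.no_progress()

# Stand-in data: a hypothetical binary target derived from iris.
df <- iris
df$is_virginica <- factor(ifelse(df$Species == "virginica", "1", "0"))
df$Species <- NULL
df_h2o <- as.h2o(df)

features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Binomial GLM, mirroring the elastic net model in the question.
model <- h2o.glm(
  x = features,
  y = "is_virginica",
  training_frame = df_h2o,
  family = "binomial",
  seed = 123
)

# H2O's built-in, model-specific variable importance table.
vi <- h2o.varimp(model)
print(vi)
```

Note that this is a different measure from iml's FeatureImp, which computes permutation importance, so the two rankings need not agree.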

Votes: 1
Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/61692023
