I have a binary classification problem: how can I get the SHAP contributions of the variables for a ranger model?
Sample data:
library(ranger)
library(tidyverse)
# Binary Dataset
df <- iris
df$Target <- if_else(df$Species == "setosa",1,0)
df$Species <- NULL
# Train Ranger Model
model <- ranger(
  x = df %>% select(-Target),
  y = df %>% pull(Target))
I have tried several libraries (DALEX, shapr, fastshap, shapper), but none of them gave me a solution.
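I would like to get results similar to what SHAPforxgboost provides, such as:
shap.plot.summary
the output of shap.values, which gives the SHAP contributions of the variables
Posted on 2020-12-01 09:05:18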
Good morning! From what I have found, you can use ranger() together with fastshap, as follows:
library(fastshap)
library(ranger)
library(tidyverse)
data(iris)
# Binary Dataset
df <- iris
df$Target <- if_else(df$Species == "setosa",1,0)
df$Species <- NULL
x <- df %>% select(-Target)
# Train Ranger Model
model <- ranger(
  x = df %>% select(-Target),
  y = df %>% pull(Target))
# Prediction wrapper
pfun <- function(object, newdata) {
  predict(object, data = newdata)$predictions
}
# Compute fast (approximate) Shapley values using 10 Monte Carlo repetitions
system.time({ # estimate run time
  set.seed(5038)
  shap <- fastshap::explain(model, X = x, pred_wrapper = pfun, nsim = 10)
})
# Load required packages
library(ggplot2)
theme_set(theme_bw())
# Aggregate Shapley values: mean absolute contribution per variable
shap_imp <- data.frame(
  Variable = colnames(shap),
  Importance = apply(shap, MARGIN = 2, FUN = function(x) mean(abs(x)))
)
For variable importance, for example, you can do:
# Plot Shap-based variable importance
ggplot(shap_imp, aes(reorder(Variable, Importance), Importance)) +
  geom_col() +
  coord_flip() +
  xlab("") +
  ylab("mean(|Shapley value|)")
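
The nsim = 10 argument used above sets the number of Monte Carlo repetitions behind each Shapley value. To give a rough idea of what such a sampling estimate involves, here is a minimal base-R sketch of the general permutation-sampling approach. It is not fastshap's actual implementation; mc_shapley and the toy linear function f are hypothetical names made up for this illustration.

```r
# Monte Carlo estimate of one Shapley value (illustrative sketch, base R only).
# f:    prediction function taking a data frame, returning a numeric vector
# X:    background data frame of features
# x:    one-row data frame, the instance to explain
# j:    name of the feature whose contribution is estimated
# nsim: number of Monte Carlo repetitions
mc_shapley <- function(f, X, x, j, nsim = 10) {
  diffs <- numeric(nsim)
  for (m in seq_len(nsim)) {
    w <- X[sample(nrow(X), 1), , drop = FALSE]  # random background row
    perm <- sample(names(X))                    # random feature ordering
    pos <- match(j, perm)
    before <- perm[seq_len(pos - 1)]            # features that precede j

    # Hybrid row 1: features up to and including j come from x, the rest from w
    with_j <- w
    with_j[c(before, j)] <- x[c(before, j)]

    # Hybrid row 2: only the features before j come from x
    without_j <- w
    if (length(before) > 0) {
      without_j[before] <- x[before]
    }

    diffs[m] <- f(with_j) - f(without_j)
  }
  mean(diffs)
}

# Toy check on a linear function, where the exact value is known in closed form
set.seed(1)
X <- data.frame(a = 1:10, b = rep(c(0, 1), 5))
f <- function(d) 2 * d$a - 3 * d$b
x <- data.frame(a = 8, b = 1)

est_10   <- mc_shapley(f, X, x, "a", nsim = 10)    # noisy estimate
est_5000 <- mc_shapley(f, X, x, "a", nsim = 5000)  # much more stable
exact    <- 2 * (x$a - mean(X$a))                  # 2 * (8 - 5.5) = 5
```

With few repetitions the estimate is noisy, so a larger nsim costs more run time but gives more stable values.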

In addition, if you want to explain individual predictions, you can do the following:
# Plot individual explanations
expl <- fastshap::explain(model, X = x, pred_wrapper = pfun, nsim = 10, newdata = x[1L, ])
autoplot(expl, type = "contribution")
All of this information comes from the following page, which has more detail: https://bgreenwell.github.io/fastshap/articles/fastshap.html
Check the link if you have further questions!
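
Since the question asks for something like shap.plot.summary, here is a hedged sketch of how a similar summary plot could be put together by hand from the fastshap output with ggplot2. It only assumes that explain() returns one row per observation and one named column per feature, as in the code above. The names shap_mat, long and ScaledValue, and the colour choices, are my own and are not part of fastshap or SHAPforxgboost.

```r
library(ranger)
library(fastshap)
library(ggplot2)

# Same data, model and prediction wrapper as in the answer above
df <- iris
df$Target <- ifelse(df$Species == "setosa", 1, 0)
df$Species <- NULL
x <- df[, setdiff(names(df), "Target")]

model <- ranger(x = x, y = df$Target)
pfun <- function(object, newdata) predict(object, data = newdata)$predictions

set.seed(5038)
shap <- fastshap::explain(model, X = x, pred_wrapper = pfun, nsim = 10)

# Per-observation, per-variable contributions (comparable to shap.values output)
shap_mat <- as.matrix(shap)
vars <- colnames(shap_mat)

# Long format: one row per (observation, variable) pair
long <- data.frame(
  Variable = rep(vars, each = nrow(shap_mat)),
  Shap     = as.vector(shap_mat),
  Value    = as.vector(as.matrix(x[, vars]))
)

# Rescale feature values to [0, 1] within each variable for colouring
long$ScaledValue <- ave(long$Value, long$Variable,
                        FUN = function(v) (v - min(v)) / (max(v) - min(v)))

# Summary plot: variables ordered by mean absolute Shapley value
p <- ggplot(long, aes(x = Shap,
                      y = reorder(Variable, abs(Shap), FUN = mean),
                      colour = ScaledValue)) +
  geom_jitter(height = 0.2, width = 0, alpha = 0.6) +
  scale_colour_gradient(low = "gold", high = "purple") +
  labs(x = "Shapley value", y = "", colour = "Feature value\n(scaled)")
print(p)
```

The shap_mat matrix holds the raw contributions if you only need the numbers rather than the plot.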

https://stackoverflow.com/questions/65005700