Q: Why do DALEX and tidymodels report different goodness-of-fit (GOF) values?

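Stack Overflow user

Asked 2022-03-02 19:59:39

Answers: 1 · Views: 54 · Followers: 0 · Votes: 1

I would like to know why DALEX model_performance and collect_metrics do not report the same accuracy. Do they use different measures or different methods? I put together the following example code: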

library(tidymodels)
library(parsnip)
library(DALEXtra)

set.seed(1)
x1 <- rbinom(1000, 5, .1)
x2 <- rbinom(1000, 5, .4)
x3 <- rbinom(1000, 5, .9)
x4 <- rbinom(1000, 5, .6)
id <- c(1:1000)
y <- as.factor(rbinom(1000, 5, .5))
df <- tibble(y, x1, x2, x3, x4, id)


# create training and test set
set.seed(20)
split_dat <- initial_split(df, prop = 0.8)
train <- training(split_dat)
test <- testing(split_dat)
# cross-validation folds (defined here but not used below)
kfolds <- vfold_cv(df)

# recipe
rec_pca <- recipe(y ~ ., data = train) %>%
  update_role(id, new_role = "id variable") %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  step_pca(x1, x2, x3, threshold = 0.9, num_comp = 1)

# parsnip engine
boost_model <- boost_tree() %>% 
  set_mode("classification") %>% 
  set_engine("xgboost")

# create wf
boosted_wf <- 
  workflow() %>% 
  add_model(boost_model) %>% 
  add_recipe(rec_pca)

boosted_res <- last_fit(boosted_wf, split_dat)
collect_metrics(boosted_res)

collect_metrics reports an accuracy of 0.31:

# A tibble: 2 × 4
  .metric  .estimator .estimate .config             
  <chr>    <chr>          <dbl> <chr>               
1 accuracy multiclass     0.31  Preprocessor1_Model1
2 roc_auc  hand_till      0.512 Preprocessor1_Model1

Next, I prepare the model for explanation with DALEX.

# refit the workflow on the full data set (training and test rows together)
final_boosted <- generics::fit(boosted_wf, df) 

# create an explanation object
explainer_xgb <- DALEXtra::explain_tidymodels(final_boosted, 
                                              data = df[,-1], 
                                              y = df$y) 

perf <- model_performance(explainer_xgb)
perf

This gives the following output for overall fit:

Measures for:  multiclass
micro_F1   : 0.43 
macro_F1   : 0.5743392 
w_macro_F1 : 0.4775901 
accuracy   : 0.43 
w_macro_auc: 0.7064296

Note that the accuracy is 0.43 with model_performance and 0.31 with collect_metrics. Does anyone know why this happens?

machine-learning

tidymodels

dalex

1 Answer

Stack Overflow user

Answered 2022-03-03 19:31:32

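I believe this is because the two numbers come from different resampling schemes, that is, different data are used to compute the performance statistics. last_fit() fits the workflow on the training portion of split_dat, and collect_metrics() reports performance on the held-out test portion. The DALEX explainer, by contrast, wraps a workflow fitted on all of df and is scored on that same df, so model_performance() reports in-sample accuracy, which is typically more optimistic.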
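One way to check this explanation is to score both tools on the same held-out rows. The following is a minimal sketch, not a definitive recipe: it rebuilds the data, split, and workflow from the question, fits on the training rows only, and then evaluates on the test rows with both yardstick and DALEX. The names train_fit, test_pred, acc_test, explainer_test, and perf_test are introduced here for illustration and do not appear in the question.

```r
library(tidymodels)
library(DALEXtra)

# Same simulated data as in the question (same seed and draw order)
set.seed(1)
x1 <- rbinom(1000, 5, .1)
x2 <- rbinom(1000, 5, .4)
x3 <- rbinom(1000, 5, .9)
x4 <- rbinom(1000, 5, .6)
id <- c(1:1000)
y <- as.factor(rbinom(1000, 5, .5))
df <- tibble(y, x1, x2, x3, x4, id)

# Same 80/20 split as in the question
set.seed(20)
split_dat <- initial_split(df, prop = 0.8)
train <- training(split_dat)
test <- testing(split_dat)

# Same recipe and model specification as in the question
rec_pca <- recipe(y ~ ., data = train) %>%
  update_role(id, new_role = "id variable") %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  step_pca(x1, x2, x3, threshold = 0.9, num_comp = 1)

boost_model <- boost_tree() %>%
  set_mode("classification") %>%
  set_engine("xgboost")

boosted_wf <- workflow() %>%
  add_model(boost_model) %>%
  add_recipe(rec_pca)

# Fit on the training rows only (hypothetical name: train_fit)
train_fit <- generics::fit(boosted_wf, train)

# Held-out accuracy computed with yardstick
test_pred <- augment(train_fit, new_data = test)
acc_test <- accuracy(test_pred, truth = y, estimate = .pred_class)

# DALEX explainer scored on the same held-out rows
explainer_test <- DALEXtra::explain_tidymodels(
  train_fit,
  data = dplyr::select(test, -y),
  y = test$y,
  verbose = FALSE
)
perf_test <- model_performance(explainer_test)

acc_test
perf_test
```

If the explanation holds, the accuracy in perf_test should land close to the yardstick value in acc_test rather than near the 0.43 obtained on the full data. Exact agreement is not guaranteed, since the two packages derive the predicted class in their own ways.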

Votes: 2

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/71328475
