Q: Confidence intervals: using a mob() tree (partykit package) with a logistic() model

Stack Overflow user

Asked on 2021-07-31 10:31:51

1 answer, 114 views, 0 followers, 1 vote

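How can I compute confidence intervals (CIs) for the models in the terminal nodes of a mob object? Alternatively, can I extract the model from each terminal node (a logistic regression model in my case) and compute the CI for each node?

Thanks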

party

1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-08-01 11:28:33

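Basics

Using refit.modelparty() you can refit the models associated with the nodes of a modelparty object, as returned by mob() or glmtree(). You can then apply the usual functions to extract the information you are interested in.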

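Caveats

1. Inference after learning the tree is no longer "honest": in your case, the coverage of the confidence intervals will probably not be at the nominal level. To obtain honest results, you have to split the data, learn the tree on one subsample, and then refit the models in the terminal nodes on the other subsample.

2. If you use glmtree(), the models in each node are not full glm objects, which makes fitting the tree more efficient. As a result, the profile likelihood confidence intervals from the confint() method are not available out of the box. You can either use Wald confidence intervals (e.g., via coefci() from lmtest) or refit the models (similar to remark 1).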
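To make the Wald option mentioned above concrete, here is a minimal sketch of how such an interval is built from the estimates and their standard errors. It uses a logistic regression on the built-in mtcars data purely as a stand-in (an assumption for illustration, not the data from the question) and compares the hand-computed interval with confint.default() from the stats package, which uses the same normal approximation.

```r
## Logistic regression on a built-in data set (illustrative stand-in only)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

## Wald interval: estimate +/- normal quantile * standard error
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))
z   <- qnorm(0.975)  # two-sided 95% level
wald_ci <- cbind("2.5 %" = est - z * se, "97.5 %" = est + z * se)
wald_ci

## confint.default() computes the same normal-approximation interval
confint.default(fit)
```

Wald intervals only need coef() and vcov(), which is why they remain usable for the slimmed-down node models, whereas profile likelihood intervals require refitting the full model.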

Illustration

As a starting point, let's first consider a simple example based on glmtree() and coefci():

## Pima Indians diabetes data
data("PimaIndiansDiabetes", package = "mlbench")
 
## recursive partitioning of a logistic regression model
library("partykit")
pid_tree <- glmtree(diabetes ~ glucose | pregnant +
  pressure + triceps + insulin + mass + pedigree + age,
  data = PimaIndiansDiabetes, family = binomial)

## extract fitted models in terminal nodes
pid_glm <- refit.modelparty(pid_tree,
  node = nodeids(pid_tree, terminal = TRUE))

## compute Wald confidence intervals
library("lmtest")
lapply(pid_glm, coefci)
## $`2`
##                    2.5 %      97.5 %
## (Intercept) -13.36226424 -6.54075507
## glucose       0.03496439  0.08245134
## 
## $`4`
##                   2.5 %      97.5 %
## (Intercept) -8.27393574 -5.13723535
## glucose      0.03466962  0.05900533
## 
## $`5`
##                   2.5 %      97.5 %
## (Intercept) -3.84548740 -1.69642032
## glucose      0.01529946  0.03177217

For honest inference, we can proceed as follows:

## id for sample splitting
n <- nrow(PimaIndiansDiabetes)
id <- sample(1:n, round(n/2))

## estimate tree on learning sample
pid_tree <- glmtree(diabetes ~ glucose | pregnant +
  pressure + triceps + insulin + mass + pedigree + age,
  data = PimaIndiansDiabetes[id,], family = binomial)

## out-of-sample prediction
pid_new <- PimaIndiansDiabetes[-id,]
pid_new$node <- predict(pid_tree, newdata = pid_new, type = "node")

## fit separate models on each subset, split by predicted node
pid_glm <- lapply(
  split(pid_new, pid_new$node),
  function(d) glm(diabetes ~ glucose, data = d, family = binomial)
)

## obtain profile likelihood confidence intervals
lapply(pid_glm, confint)
## Waiting for profiling to be done...
## Waiting for profiling to be done...
## Waiting for profiling to be done...
## $`2`
##                   2.5 %     97.5 %
## (Intercept) -8.25379802 -4.5031005
## glucose      0.02743191  0.0569824
## 
## $`4`
##                    2.5 %     97.5 %
## (Intercept) -25.32777607 -4.5078931
## glucose       0.02090339  0.1617026
## 
## $`5`
##                   2.5 %      97.5 %
## (Intercept) -6.38422374 -2.66969443
## glucose      0.02222873  0.05060984
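Because sample() draws the split at random, the tree and the intervals shown above will differ from run to run. A minimal sketch of a reproducible split follows; the seed value is arbitrary, and n is hard-coded to 768 (assumed to be the number of rows in PimaIndiansDiabetes) so that the sketch runs without loading mlbench.

```r
## Fix the random number generator so the split can be reproduced
set.seed(2021)  # arbitrary seed, chosen only for illustration

## Assumed sample size; in practice use nrow(PimaIndiansDiabetes)
n <- 768

## Half of the rows for learning the tree, the rest for inference
id <- sample(seq_len(n), round(n / 2))
learn_rows <- id
infer_rows <- setdiff(seq_len(n), id)

## The two subsamples are disjoint and together cover all rows
length(intersect(learn_rows, infer_rows))
length(learn_rows) + length(infer_rows)
```

Keeping the two index sets disjoint is what makes the second-stage intervals honest: the observations used to choose the splits play no role in estimating the node models.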

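In models based on a linear predictor, such as the logistic GLM, it is also possible to fit a single model that represents the entire tree. You just have to include interactions of all regressors with the node indicator. To obtain the same parameters and confidence intervals, you need to use nested coding (/) rather than the default interaction coding (*):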

pid_glm2 <- glm(diabetes ~ 0 + factor(node)/glucose,
  data = pid_new, family = binomial)
confint(pid_glm2)
## Waiting for profiling to be done...
##                              2.5 %      97.5 %
## factor(node)2          -8.25379802 -4.50310050
## factor(node)4         -25.32777607 -4.50789313
## factor(node)5          -6.38422332 -2.66969490
## factor(node)2:glucose   0.02743191  0.05698240
## factor(node)4:glucose   0.02090339  0.16170262
## factor(node)5:glucose   0.02222874  0.05060983
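The difference between the two codings can be illustrated with a small sketch. It uses a linear model on the built-in iris data, with Species standing in for the node indicator (both are assumptions for illustration only). Nested coding yields one intercept and one slope per group, matching separate per-group fits, while interaction coding describes the same fit through a reference group plus differences.

```r
## Nested coding: one intercept and one slope per group
fit_nested <- lm(Sepal.Length ~ 0 + Species/Petal.Length, data = iris)

## Interaction coding: reference group plus differences to it
fit_inter <- lm(Sepal.Length ~ Species * Petal.Length, data = iris)

## Separate fits per group, for comparison
fit_sep <- lapply(split(iris, iris$Species),
  function(d) lm(Sepal.Length ~ Petal.Length, data = d))
sep_coef <- sapply(fit_sep, coef)  # rows: intercept, slope; columns: groups

## Rearrange the nested coefficients into the same layout
## (first all intercepts, then all slopes)
nested_coef <- matrix(coef(fit_nested), nrow = 2, byrow = TRUE,
  dimnames = dimnames(sep_coef))

nested_coef
sep_coef
coef(fit_inter)
```

Both codings give identical fitted values; only the parameterization differs, which is why the nested form is the convenient one when per-node coefficients and intervals are wanted directly.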

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/68601263
