文章/答案/技术大牛

发布

社区首页 >问答首页 >将所有因素水平与总体均值进行比较:我能在线性模型中调整对比以显示所有的水平吗？

问将所有因素水平与总体均值进行比较:我能在线性模型中调整对比以显示所有的水平吗？
EN

Stack Overflow用户

提问于 2022-06-30 18:06:06

回答 2查看 295关注 0票数 1

我试图在线性模型上调整对比度编码，在这个模型中，我想知道每个因子的水平是否与大平均值有很大的不同。

假设因子有"A“、"B”和"C“级别。最常见的控制-治疗对比明显设置为"A“作为参考水平，并将"B”和"C“与之进行比较。这不是我想要的，因为"A“级没有出现在模型摘要中。

偏差编码似乎也没有给我想要的东西，因为它将"C“级的对比度矩阵设置为[-1,-1,-1]，而现在这个级别没有出现在模型摘要中。

set.seed(1)
y <- rnorm(6, 0, 1)
x <- factor(rep(LETTERS[1:3], each = 2))
fit <- lm(y ~ x, contrasts = list(x = contr.sum))
summary(fit)

此外，报告的级别名称已从"A“、"B”改为"1“和"2”。

Call:
lm(formula = y ~ x, contrasts = list(x = contr.sum))

Residuals:
     1      2      3      4      5      6 
-0.405  0.405 -1.215  1.215  0.575 -0.575 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.02902    0.46809  -0.062    0.954
x1          -0.19239    0.66198  -0.291    0.790
x2           0.40885    0.66198   0.618    0.581

Residual standard error: 1.147 on 3 degrees of freedom
Multiple R-squared:  0.1129,    Adjusted R-squared:  -0.4785 
F-statistic: 0.1909 on 2 and 3 DF,  p-value: 0.8355

我是不是遗漏了什么？我是否应该添加一个等于大平均值的虚拟变量，以便我可以使用它作为引用级别？

去年，我看到了一个类似的问题(但可能要求更高)，但还没有解决方案：models with 'differences from mean' for all coefficients on categorical variables; get 'contrast coding' to do it?。我已经悬赏了它，希望引起人们的注意。

linear-regression

categorical-data

regression

回答 2

Stack Overflow用户

发布于 2022-06-30 22:32:52

这只是一个初步的功能化版本@Zheyuan's惊人的答案。正如@浙原也提到的，这个答案也有局限性，包括复杂的模型和对比，等等。我认为如果这个因素是有序的，这个函数就不起作用。

乳头数据：

set.seed(1)
test.df <- data.frame(y = rnorm(10, 0, 1),
                     x = factor(rep(LETTERS[1:5], each = 2)))
test.fit <- lm(y ~ x, contrasts = list(x=contr.sum), data= test.df)

返回级别统计数据的函数：

# Output deviation coded factor matrix, used within `DevContrStats()`
ContrSumMat <- function (fctr, sparse = FALSE) {
  if (!is.factor(fctr)) stop("'fctr' is not a factor variable!")
  N <- nlevels(fctr)
  Cmat <- contr.sum(N, sparse = sparse)
  dimnames(Cmat) <- list(levels(fctr), seq_len(N - 1))
  Cmat
}

## `dat` is the original data
## `fit.model` is the linear model created from the data
## `fctr_col` is the colum you'd like to get all factor levels from
DevContrStats <- function(dat, fit.model, fctr_col) {
  
  fctr.mtx <- ContrSumMat(factor(dat[, fctr_col]))
  
  ## coefficients After Contrasts
  coef_a_contr <- coef(fit.model)[-1]

  ## coefficients Before Contrasts
  ## coef_bc tells how each factor level deviates from the grand mean.
  coef_b_contr <- (fctr.mtx %*% coef_a_contr)[, 1]

  ## Covariance matrix after contrasts:
  var_a_contr <- vcov(fit.model)[-1,-1]

  ## Transform covariance matrix (after contrasts) to the one before contrasts:
  ## The diagonal of var_bc gives the estimated variance of factor-level deviations.
  var_b_contr <- fctr.mtx %*% var_a_contr %*% t(fctr.mtx)

  ## standard error of point estimate `coef_bc`
  std.err_b_contr <- sqrt(diag(var_b_contr))

  ## t-statistics (Null Hypothesis: coef_bc = 0)
  t.stats_b_contr <- coef_b_contr / std.err_b_contr

  ## p-values of the t-statistics
  p.value_b_contr <- 2 * pt(abs(t.stats_b_contr),
                            df = fit.model$df.residual,
                            lower.tail = FALSE)

  ## construct a coefficient table that mimics `coef(summary(fit))`
  stats.tab_b_contr <- cbind("Estimate" = coef_b_contr,
                        "Std. Error" = std.err_b_contr,
                        "t value" = t.stats_b_contr,
                        "Pr(>|t|)" = p.value_b_contr)

  ## extract statistics of the intercept, which = grand mean
  intercept.stats <- coef(summary(fit.model))[1, , drop = FALSE]

  ## augment the coefficient table with the intercept stats
  stats.tab <- rbind(intercept.stats, stats.tab_b_contr)

  ## print stats table with significance stars
  printCoefmat(stats.tab)
}


17:20:20> DevContrStats(test.df, test.fit, 'x')
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.164183   0.245068  0.6699  0.53258  
A            0.448694   0.490137  0.9154  0.40195  
B           -0.028986   0.490137 -0.0591  0.95513  
C            0.786629   0.490137  1.6049  0.16942  
D           -1.582153   0.490137 -3.2280  0.02326 *
E            0.375816   0.490137  0.7668  0.47785  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

票数 1

Stack Overflow用户

发布于 2022-06-30 18:15:27

不如做一系列的t检验，每个测试将子集的平均值与大平均值进行比较？

lapply(levels(x), \(i) t.test(y,y[x==i]))

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/72820236

复制

相似问题

问将所有因素水平与总体均值进行比较:我能在线性模型中调整对比以显示所有的水平吗？
EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问将所有因素水平与总体均值进行比较:我能在线性模型中调整对比以显示所有的水平吗？EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问将所有因素水平与总体均值进行比较:我能在线性模型中调整对比以显示所有的水平吗？
EN