
Q: Adding a dummy variable changes the coefficients

Stack Overflow user

Asked 2018-04-20 08:12:27

3 answers, 857 views, 0 followers, score 1

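Should adding a dummy variable to a linear model change the coefficients of the other explanatory variables? I thought it would only change the intercept, but the coefficients of the non-intercept terms changed as well.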

Here is example code using the mtcars data (source: 29b941670a4b42688292b4bb892a660f.html)

data(mtcars)
mtcars$am_text <- as.factor(mtcars$am)
levels(mtcars$am_text) <- c("Automatic", "Manual")


fit1 <- lm(mpg ~ am_text + wt, data = mtcars)
summary(fit1)

Call:
lm(formula = mpg ~ am_text + wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5295 -2.3619 -0.1317  1.4025  6.8782 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   37.32155    3.05464  12.218 5.84e-13 ***
am_textManual -0.02362    1.54565  -0.015    0.988    
wt            -5.35281    0.78824  -6.791 1.87e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.098 on 29 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7358 
F-statistic: 44.17 on 2 and 29 DF,  p-value: 1.579e-09

Now fit a linear model on a subset of the data:

# No dummy variable this time; fit only the automatic-transmission subset
fit2 <- lm(mpg ~ wt, data = mtcars[mtcars$am_text == "Automatic",])
summary(fit2)

Call:
lm(formula = mpg ~ wt, data = mtcars[mtcars$am_text == "Automatic",])

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6004 -1.5227 -0.2168  1.4816  5.0610 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  31.4161     2.9467  10.661 6.01e-09 ***
wt           -3.7859     0.7666  -4.939 0.000125 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.528 on 17 degrees of freedom
Multiple R-squared:  0.5893,    Adjusted R-squared:  0.5651 
F-statistic: 24.39 on 1 and 17 DF,  p-value: 0.0001246

linear-regression

dummy-variable

3 Answers

Stack Overflow user

Accepted answer

Answered 2018-04-20 08:58:32

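The issue is that the slope coefficient in fit1 is a single slope pooled across automatic and manual cars, even though each level of the factor gets its own intercept. If you include an interaction term between am_text and wt (am_text:wt), each transmission type also gets its own slope, and the model compares much better with the one fitted only to automatic cars (fit2).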

fit3 <- lm(mpg ~ am_text + wt + am_text:wt, data = mtcars)
summary(fit3)

# Call:
# lm(formula = mpg ~ am_text + wt + am_text:wt, data = mtcars)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -3.6004 -1.5446 -0.5325  0.9012  6.0909 
# 
# Coefficients:
#                  Estimate Std. Error t value Pr(>|t|)    
# (Intercept)       31.4161     3.0201  10.402 4.00e-11 ***
# am_textManual     14.8784     4.2640   3.489  0.00162 ** 
# wt                -3.7859     0.7856  -4.819 4.55e-05 ***
# am_textManual:wt  -5.2984     1.4447  -3.667  0.00102 ** 
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 2.591 on 28 degrees of freedom
# Multiple R-squared:  0.833,   Adjusted R-squared:  0.8151 
# F-statistic: 46.57 on 3 and 28 DF,  p-value: 5.209e-11

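Now note that the (Intercept) and wt coefficients of fit3 are the intercept and slope for automatic cars, and they match the coefficients of fit2.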

coef(fit2) # fit only to automatic
# (Intercept)          wt 
#   31.416055   -3.785908 

coef(fit3)
# (Intercept)    am_textManual               wt am_textManual:wt 
#   31.416055        14.878423        -3.785908        -5.298360
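The same reasoning should apply to the manual cars: adding the am_textManual terms of fit3 to the baseline terms gives the manual intercept and slope (about 46.29 and -9.08 from the coefficients printed above), which should match a model fitted only to manual cars. A minimal sketch of that check follows; `fit_manual` and `manual_from_fit3` are names introduced here for illustration and are not part of the original answer.

```r
data(mtcars)
mtcars$am_text <- as.factor(mtcars$am)
levels(mtcars$am_text) <- c("Automatic", "Manual")

# Interaction model: separate intercept and slope per transmission type
fit3 <- lm(mpg ~ am_text + wt + am_text:wt, data = mtcars)

# Model fitted only to the manual cars (hypothetical name, for comparison)
fit_manual <- lm(mpg ~ wt, data = mtcars[mtcars$am_text == "Manual", ])

# Manual intercept = baseline intercept + dummy coefficient
# Manual slope     = baseline slope     + interaction coefficient
b <- coef(fit3)
manual_from_fit3 <- c(
  intercept = unname(b["(Intercept)"] + b["am_textManual"]),
  slope     = unname(b["wt"] + b["am_textManual:wt"])
)

print(manual_from_fit3)
print(coef(fit_manual))
print(all.equal(unname(manual_from_fit3), unname(coef(fit_manual))))
```

The point estimates agree, but the standard errors generally will not, because fit3 estimates one residual variance from all 32 cars while each subset model uses only its own observations.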

Score: 1

Stack Overflow user

Answered 2018-04-20 08:30:47

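When lm fits a model by ordinary least squares (OLS), it minimizes the sum of squared residuals (SSR), which is a function of the model parameters. In OLS there are usually no constraints on the parameters.

Adding a parameter therefore generally leads to different parameter estimates, because the OLS estimates are simply the parameter values that minimize the SSR. If you add a dummy variable (or any other variable), lm just returns the parameter estimates that give the lowest SSR. During the minimization, all parameter values are free to vary.

For details, see for example the Wikipedia entry on OLS or any statistics textbook.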
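To make the "free to vary" point concrete, here is a small sketch on mtcars. It evaluates the SSR of the dummy-variable model at two parameter vectors: the lm estimates, and a restricted vector that keeps the intercept and wt slope from the model without the dummy and sets the dummy coefficient to 0. The helper `ssr` and the names `fit0`, `beta_restricted`, `ssr_free` and `ssr_restricted` are introduced here for illustration.

```r
data(mtcars)
y <- mtcars$mpg

fit0 <- lm(mpg ~ wt, data = mtcars)               # no dummy variable
fit1 <- lm(mpg ~ factor(am) + wt, data = mtcars)  # with dummy variable

# Design matrix of the larger model: (Intercept), factor(am)1, wt
X1 <- model.matrix(fit1)

# Sum of squared residuals as a function of the parameter vector
ssr <- function(beta) sum((y - X1 %*% beta)^2)

# Keep the fit0 intercept and slope, force the dummy coefficient to 0
beta_restricted <- c(coef(fit0)["(Intercept)"], 0, coef(fit0)["wt"])

ssr_restricted <- ssr(beta_restricted)  # same as the SSR of fit0
ssr_free       <- ssr(coef(fit1))       # SSR at the lm estimates of fit1

cat("SSR with fit0 coefficients and dummy fixed at 0:", ssr_restricted, "\n")
cat("SSR at the fit1 estimates:", ssr_free, "\n")
```

Because the restricted vector is just one candidate among all the parameter values lm may choose, the SSR at the fit1 estimates can never be larger than the restricted one. Whenever a different combination of intercept, dummy coefficient and slope lowers the SSR, lm moves all three.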

Score: 1

Stack Overflow user

Answered 2018-04-20 08:40:11

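Yes, if you add variables to your model you should expect your coefficients to change. Remember that the coefficient of any variable is always interpreted relative to the other variables in the model.

If you have Y = aX1 + bX2 + cX3 + E and you add X4 to the model, you should expect a, b and c to change (unless X4 has no effect on the model).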
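A small simulation can illustrate this. The sketch below uses made-up data, not mtcars, and all variable names are hypothetical. It adds two different regressors to the same base model: one correlated with x1, and one constructed to be exactly uncorrelated with x1 in the sample. The second is a standard case in which the existing coefficient does not move, even though the added variable does affect y.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)

# A regressor correlated with x1
x_corr <- 0.8 * x1 + rnorm(n)

# A regressor made orthogonal to x1 and the intercept in this sample:
# residuals from regressing random noise on x1
x_orth <- resid(lm(rnorm(n) ~ x1))

y <- 1 + 2 * x1 + 3 * x_corr + 1.5 * x_orth + rnorm(n)

b_base <- coef(lm(y ~ x1))["x1"]           # base model
b_orth <- coef(lm(y ~ x1 + x_orth))["x1"]  # add the orthogonal regressor
b_corr <- coef(lm(y ~ x1 + x_corr))["x1"]  # add the correlated regressor

cat("x1 coefficient, base model:       ", b_base, "\n")
cat("x1 coefficient, adding x_orth:    ", b_orth, "\n")
cat("x1 coefficient, adding x_corr:    ", b_corr, "\n")
```

Adding x_orth leaves the x1 coefficient unchanged up to numerical precision, while adding x_corr shifts it noticeably, because in the base model x1 was also picking up part of the effect of x_corr.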

Score: 0

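Original page content provided by Stack Overflow. Translation support for the Chinese version was provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/49937069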
