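For a fixed-effects model, I plan to switch from Stata's areg to Python's linearmodels.panel.PanelOLS.
But the results differ. In Stata I get R-squared = 0.6047, while in Python I get R-squared = 0.1454.
Why is the R-squared so different between the two commands below?
Stata command and results: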
use ./linearmodels_datasets_wage_panel.dta, clear
areg lwage expersq union married hours, vce(cluster nr) absorb(nr)
Linear regression, absorbing indicators Number of obs = 4,360
Absorbed variable: nr No. of categories = 545
F(4, 544) = 84.67
Prob > F = 0.0000
R-squared = 0.6047
Adj R-squared = 0.5478
Root MSE = 0.3582
(Std. err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
| Robust
lwage | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
expersq | .0039509 .0002554 15.47 0.000 .0034492 .0044526
union | .0784442 .0252621 3.11 0.002 .028821 .1280674
married | .1146543 .0234954 4.88 0.000 .0685014 .1608072
hours | -.0000846 .0000238 -3.56 0.000 -.0001313 -.0000379
_cons | 1.565825 .0531868 29.44 0.000 1.461348 1.670302
------------------------------------------------------------------------------
Python command and results:
from linearmodels.datasets import wage_panel
from linearmodels.panel import PanelOLS
data = wage_panel.load()
mod_entity = PanelOLS.from_formula(
    "lwage ~ 1 + expersq + union + married + hours + EntityEffects",
    data=data.set_index(["nr", "year"]),
)
result_entity = mod_entity.fit(
    cov_type='clustered',
    cluster_entity=True,
)
print(result_entity)
PanelOLS Estimation Summary
================================================================================
Dep. Variable: lwage R-squared: 0.1454
Estimator: PanelOLS R-squared (Between): -0.0844
No. Observations: 4360 R-squared (Within): 0.1454
Date: Wed, Feb 02 2022 R-squared (Overall): 0.0219
Time: 12:23:24 Log-likelihood -1416.4
Cov. Estimator: Clustered
F-statistic: 162.14
Entities: 545 P-value 0.0000
Avg Obs: 8.0000 Distribution: F(4,3811)
Min Obs: 8.0000
Max Obs: 8.0000 F-statistic (robust): 96.915
P-value 0.0000
Time periods: 8 Distribution: F(4,3811)
Avg Obs: 545.00
Min Obs: 545.00
Max Obs: 545.00
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
Intercept 1.5658 0.0497 31.497 0.0000 1.4684 1.6633
expersq 0.0040 0.0002 16.550 0.0000 0.0035 0.0044
hours -8.46e-05 2.22e-05 -3.8101 0.0001 -0.0001 -4.107e-05
married 0.1147 0.0220 5.2207 0.0000 0.0716 0.1577
union 0.0784 0.0236 3.3221 0.0009 0.0321 0.1247
==============================================================================
F-test for Poolability: 9.4833
P-value: 0.0000
Distribution: F(544,3811)
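Included effects: Entity
Answered 2022-10-27 21:30:32
Hi there. How are you?
You are trying to run an absorbing regression (.areg). Specifically, you are trying to run a "linear regression absorbing one categorical factor". To do that in Python, run the following model:
linearmodels.iv.absorbing.AbsorbingLS(endog_variable, exog_variables, absorb=categorical_variable_absorb)
See the example below: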
import pandas as pd
import statsmodels as sm
from linearmodels.iv import absorbing
dta = pd.read_csv('http://www.math.smith.edu/~bbaumer/mth247/labs/airline.csv')
dta.rename(columns={'I': 'airline',
                    'T': 'year',
                    'Q': 'output',
                    'C': 'cost',
                    'PF': 'fuel',
                    'LF ': 'load'}, inplace=True)
Next, convert the absorbed variable to a categorical variable (in this example, I use the airline variable):
cats = pd.DataFrame({'airline': pd.Categorical(dta['airline'])})
Then, run the model:
exog_variables = ['output', 'fuel', 'load']
endog_variable = ['cost']
exog = sm.tools.tools.add_constant(dta[exog_variables])
endog = dta[endog_variable]
model = absorbing.AbsorbingLS(endog, exog, absorb=cats, drop_absorbed=True)
model_res = model.fit(cov_type='unadjusted', debiased=True)
print(model_res.summary)
Below are the results of the same model in Python and Stata (using the command .areg cost output fuel load, absorb(airline)):
Python:
Absorbing LS Estimation Summary
==================================================================================
Dep. Variable: cost R-squared: 0.9974
Estimator: Absorbing LS Adj. R-squared: 0.9972
No. Observations: 90 F-statistic: 3827.4
Date: Thu, Oct 27 2022 P-value (F-stat): 0.0000
Time: 20:58:04 Distribution: F(3,81)
Cov. Estimator: unadjusted R-squared (No Effects): 0.9926
Varaibles Absorbed: 5.0000
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
const 9.7135 0.2229 43.585 0.0000 9.2701 10.157
output 0.9193 0.0290 31.691 0.0000 0.8616 0.9770
fuel 0.4175 0.0148 28.303 0.0000 0.3881 0.4468
load -1.0704 0.1957 -5.4685 0.0000 -1.4599 -0.6809
==============================================================================
Stata:
Linear regression, absorbing indicators Number of obs = 90
F( 3, 81) = 3604.80
Prob > F = 0.0000
R-squared = 0.9974
Adj R-squared = 0.9972
Root MSE = .06011
------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.713528 .229641 42.30 0.000 9.256614 10.17044
-------------+----------------------------------------------------------------
airline | F(5, 81) = 57.732 0.000 (6 categories)
https://stackoverflow.com/questions/70954911
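As for why the two R-squared values in the question differ: as far as I can tell, PanelOLS reports the R-squared of the demeaned (within) regression, which is why its headline value equals "R-squared (Within)" in the question's output. areg instead counts the absorbed indicators as regressors, so the variation explained by the entity effects is included. The coefficients and residuals are the same either way; only the total sum of squares in the denominator changes. Here is a minimal pure-Python sketch of that arithmetic on a made-up toy panel (the numbers and variable names are invented for illustration and are not the wage data):

```python
# Toy balanced panel: 3 entities x 4 periods, one regressor.
# All numbers are made up for illustration only.
panel = {
    "a": ([1.0, 2.0, 3.0, 4.0], [2.0, 2.6, 2.1, 2.9]),
    "b": ([1.0, 2.0, 3.0, 4.0], [6.1, 5.8, 6.6, 6.3]),
    "c": ([1.0, 2.0, 3.0, 4.0], [10.0, 10.7, 10.2, 10.9]),
}


def mean(values):
    return sum(values) / len(values)


# Within transformation: subtract each entity's own mean from x and y.
x_within, y_within, y_all = [], [], []
for x, y in panel.values():
    x_bar, y_bar = mean(x), mean(y)
    x_within += [xi - x_bar for xi in x]
    y_within += [yi - y_bar for yi in y]
    y_all += y

# Fixed-effects slope: OLS of demeaned y on demeaned x.
beta = sum(a * b for a, b in zip(x_within, y_within)) / sum(a * a for a in x_within)

# Residual sum of squares; identical to the residuals of the regression
# with one dummy per entity (Frisch-Waugh-Lovell).
ssr = sum((b - beta * a) ** 2 for a, b in zip(x_within, y_within))

# Two different denominators give two different R-squared values.
tss_within = sum(b * b for b in y_within)        # variation around entity means
grand_mean = mean(y_all)
tss_total = sum((b - grand_mean) ** 2 for b in y_all)  # variation around grand mean

r2_within = 1 - ssr / tss_within  # the kind of value PanelOLS shows as "Within"
r2_lsdv = 1 - ssr / tss_total     # the kind of value areg / AbsorbingLS shows

print(f"beta = {beta:.4f}")
print(f"within R-squared = {r2_within:.4f}")
print(f"dummy-inclusive R-squared = {r2_lsdv:.4f}")
```

Because the entity means differ a lot in this toy data, the dummy-inclusive R-squared comes out much higher than the within one, which is the same pattern as 0.6047 versus 0.1454 in the question.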