I'm new here and don't know how to fix the error I'm getting.
Here is a summary of my data:
> summary(data)
       Metro                          MrktRgn     MedAge     numHmSales
 Abilene  : 1   Austin-Waco-Hill Country  : 6   20-25: 3   Min.   :  302
 Amarillo : 1   Far West Texas            : 1   25-30: 6   1st Qu.: 1057
 Arlington: 1   Gulf Coast - Brazos Bottom:10   30-35:28   Median : 2098
 Austin   : 1   Northeast Texas           :14   35-40: 6   Mean   : 7278
 Bay Area : 1   Panhandle and South Plains: 5   45-50: 2   3rd Qu.: 5086
 Beaumont : 1   South Texas               : 7   50-55: 1   Max.   :83174
 (Other)  :40   West Texas                : 3
    AvgSlPr           totNumLs          MedHHInc           Pop
 Min.   :123833   Min.   :  1257   Min.   :37300   Min.   :   2899
 1st Qu.:149117   1st Qu.:  6028   1st Qu.:53100   1st Qu.:  56876
 Median :171667   Median : 11106   Median :57000   Median : 126482
 Mean   :188637   Mean   : 24302   Mean   :60478   Mean   : 296529
 3rd Qu.:215175   3rd Qu.: 25472   3rd Qu.:66200   3rd Qu.: 299321
 Max.   :303475   Max.   :224230   Max.   :99205   Max.   :2196000
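NA's :1
Then I built a model with AvgSlPr as the y variable and the other variables as the x variables.
> model1 = lm(AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales + totNumLs + MedHHInc + Pop)
But when I summarize the model, I get NA for the Std. Error, t value, and p-value.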
> summary(model1)
Call:
lm(formula = AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales +
totNumLs + MedHHInc + Pop)
Residuals:
ALL 45 residuals are 0: no residual degrees of freedom!
Coefficients: (15 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 143175 NA NA NA
MetroAmarillo 24925 NA NA NA
MetroArlington 35258 NA NA NA
MetroAustin 160300 NA NA NA
MetroBay Area 68642 NA NA NA
MetroBeaumont 5942 NA NA NA
...
MrktRgnWest Texas NA NA NA NA
MedAge25-30 NA NA NA NA
MedAge30-35 NA NA NA NA
MedAge35-40 NA NA NA NA
MedAge45-50 NA NA NA NA
MedAge50-55 NA NA NA NA
numHmSales NA NA NA NA
totNumLs NA NA NA NA
MedHHInc NA NA NA NA
Pop NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 44 and 0 DF, p-value: NA
Does anyone know what went wrong? How can I fix this? Also, I'm not supposed to use dummy variables.
Posted on 2017-11-20 07:17:05
Each level of your Metro variable appears in only one row. You need at least two points to fit a line. Let me illustrate with an example:
dat = data.frame(AvgSlPr=runif(4), Metro = factor(LETTERS[1:4]), MrktRgn = runif(4))
model1 = lm(AvgSlPr ~ Metro + MrktRgn, data = dat)
summary(model1)
#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)
#Residuals:
#ALL 4 residuals are 0: no residual degrees of freedom!
#Coefficients: (1 not defined because of singularities)
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.33801 NA NA NA
#MetroB 0.47350 NA NA NA
#MetroC -0.04118 NA NA NA
#MetroD 0.20047 NA NA NA
#MrktRgn NA NA NA NA
#Residual standard error: NaN on 0 degrees of freedom
#Multiple R-squared: 1, Adjusted R-squared: NaN
#F-statistic: NaN on 3 and 0 DF, p-value: NA
However, if we add more data so that at least some factor levels have more than one row, the linear model can be estimated:
dat = rbind(dat, data.frame(AvgSlPr=2:4, Metro=factor(LETTERS[2:4]), MrktRgn = 3:5))
model2 = lm(AvgSlPr ~ Metro + MrktRgn, data=dat)
summary(model2)
#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)
#Residuals:
# 1 2 3 4 5 6 7
# 9.021e-17 2.643e-01 7.304e-03 -1.498e-01 -2.643e-01 -7.304e-03 1.498e-01
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.24279 0.30406 0.798 0.50834
#MetroB -0.10207 0.38858 -0.263 0.81739
#MetroC -0.06696 0.39471 -0.170 0.88090
#MetroD 0.06804 0.41243 0.165 0.88413
#MrktRgn 0.70787 0.06747 10.491 0.00896 **
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 0.3039 on 2 degrees of freedom
#Multiple R-squared: 0.9857, Adjusted R-squared: 0.9571
#F-statistic: 34.45 on 4 and 2 DF, p-value: 0.02841
The data used to fit the model need to be reconsidered. What is the goal of the analysis? What data are needed to achieve it?
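As a rough sketch of how the problem could be checked before fitting, the snippet below counts rows per factor level and compares the number of design-matrix columns with the number of observations. The toy data frame, its values, and the variable names `rows_per_metro`, `fit_full`, and `fit_reduced` are hypothetical and only mimic the structure in the question. Dropping Metro is shown as one possible option, assuming Metro is just a row identifier; it is not necessarily the right choice for the real analysis.

```r
# Hypothetical toy data: one row per Metro level, as in the question
set.seed(1)
dat <- data.frame(
  AvgSlPr = runif(6),
  Metro   = factor(LETTERS[1:6]),
  MrktRgn = factor(c("East", "East", "East", "West", "West", "West")),
  Pop     = runif(6)
)

# Rows per factor level: every Metro level appears exactly once
rows_per_metro <- table(dat$Metro)
print(rows_per_metro)

# Compare design-matrix columns (coefficients requested) with observations
X <- model.matrix(AvgSlPr ~ Metro + MrktRgn + Pop, data = dat)
n_obs  <- nrow(X)
n_coef <- ncol(X)
cat("observations:", n_obs, " coefficients requested:", n_coef, "\n")

# With Metro in the model, the fit is saturated: no residual degrees of freedom
fit_full <- lm(AvgSlPr ~ Metro + MrktRgn + Pop, data = dat)
df_full  <- df.residual(fit_full)

# Assumption: Metro acts only as a row label, so it is left out here
fit_reduced <- lm(AvgSlPr ~ MrktRgn + Pop, data = dat)
df_reduced  <- df.residual(fit_reduced)

cat("residual df with Metro:", df_full, " without Metro:", df_reduced, "\n")
```

When the number of requested coefficients is at least the number of observations, `lm` has nothing left over to estimate the residual variance, which is why the standard errors, t values, and p-values come back as NA.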
https://stackoverflow.com/questions/47386290