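I am trying to predict house prices with linear regression. I collected real data from a real estate website. I have a number of features and two numeric values, of which price is the target variable to predict. I have about 3000 rows: the first column is the district, the m2 field is the area of the house in square meters, followed by the number of rooms + salons, and then other features encoded as 0 or 1. I am trying to get a formula (coefficients). However, Orange, which I am using, gives very strange predictions. Is there a wrong or missing step? Can the predictions be improved? By the way, the dataset can be downloaded via the Box link.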
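The question boils down to turning a categorical district column, a numeric area and 0/1 flags into one numeric matrix and estimating one coefficient per column. Below is a minimal sketch in plain Python (chosen because Orange itself is Python-based), assuming hypothetical toy data rather than the linked dataset: `one_hot` dummy-encodes a categorical column with one reference level dropped, and `ols` solves the least-squares normal equations. Both helper names are illustrative, and solving the normal equations directly only suits small, well-conditioned problems; a library least-squares routine is the safer choice for ~80 columns.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str]) -> Dict[str, List[int]]:
    """Dummy-encode a categorical column, dropping the alphabetically first level
    as the reference (similar in spirit to R's default treatment coding)."""
    levels = sorted(set(values))
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels[1:]}


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some columns are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares coefficients from the normal equations (X^T X) beta = X^T y."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Hypothetical toy data: three districts, area in m2, price built from a known formula.
    district = ["A", "B", "A", "B", "C", "C"]
    m2 = [50, 60, 70, 80, 90, 100]
    dummies = one_hot(district)  # columns "B" and "C"; "A" is the reference level
    X = [[1.0, m2[i], dummies["B"][i], dummies["C"][i]] for i in range(len(m2))]
    y = [1000 + 10 * m2[i] + 200 * dummies["B"][i] + 500 * dummies["C"][i] for i in range(len(m2))]
    # Intercept, m2 slope, and the B and C offsets relative to district A.
    print([round(b, 6) for b in ols(X, y)])
```

Run as a script, it should print coefficients close to the ones used to build the toy prices, which is a quick way to check an encoding step before trusting a tool's output on real data.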


Posted on 2019-05-22 13:57:58
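Some things to note:
Your result is simply a poor fit. Look at R^2 and the mean absolute error (MAE). I think there is hardly any room left to improve the fit further in an OLS setting.
The best I could get was MAE = 258434 / R2 = 0.58. So, on average, your predictions miss by 258434 (in the units of the price).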
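To make the two metrics concrete, here is a small stdlib-only Python sketch of how MAE and R^2 are computed from actual and predicted prices. The three sample rows are the first rows of the prediction table in this answer (actual `V1`, `pred`); the 258434 and 0.58 quoted above come from the full data set and are not reproduced by this snippet. The function names are illustrative.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: average size of the prediction miss, in the target's units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # First three rows of the prediction table printed in the answer (actual V1, pred).
    actual = [1200000, 1100000, 245000]
    predicted = [881787.6, 862002.8, 339582.8]
    print(mae(actual, predicted))
    print(r_squared(actual, predicted))
```

MAE stays in price units, which is what makes "off by 258434 on average" directly interpretable, while R^2 is unitless and compares the model against always predicting the mean price.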
Call:
lm(formula = Fiyat ~ poly(m2, 10, raw = T) + ., data = dat)
Residuals:
Min 1Q Median 3Q Max
-6864176 -190364 301 131575 20452070
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.470e+07 5.729e+06 4.311 1.68e-05 ***
poly(m2, 10, raw = T)1 -1.848e+06 3.749e+05 -4.929 8.76e-07 ***
poly(m2, 10, raw = T)2 5.701e+04 1.015e+04 5.618 2.11e-08 ***
poly(m2, 10, raw = T)3 -9.411e+02 1.513e+02 -6.222 5.63e-10 ***
poly(m2, 10, raw = T)4 9.326e+00 1.380e+00 6.757 1.70e-11 ***
poly(m2, 10, raw = T)5 -5.850e-02 8.090e-03 -7.231 6.12e-13 ***
poly(m2, 10, raw = T)6 2.371e-04 3.100e-05 7.648 2.77e-14 ***
poly(m2, 10, raw = T)7 -6.173e-07 7.706e-08 -8.011 1.65e-15 ***
poly(m2, 10, raw = T)8 9.943e-10 1.195e-10 8.322 < 2e-16 ***
poly(m2, 10, raw = T)9 -8.994e-13 1.048e-13 -8.584 < 2e-16 ***
poly(m2, 10, raw = T)10 3.488e-16 3.964e-17 8.799 < 2e-16 ***
IlceAtasehir -1.855e+05 8.994e+04 -2.062 0.039275 *
IlceBeykoz 4.925e+04 8.370e+04 0.588 0.556325
IlceÇekmeköy -3.554e+05 9.068e+04 -3.919 9.10e-05 ***
IlceKadiköy 2.803e+05 8.855e+04 3.166 0.001564 **
IlceKartal -3.790e+05 8.705e+04 -4.354 1.39e-05 ***
IlceMaltepe -3.065e+05 8.814e+04 -3.478 0.000514 ***
IlcePendik -3.721e+05 9.133e+04 -4.074 4.75e-05 ***
IlceSancaktepe -4.431e+05 9.077e+04 -4.882 1.11e-06 ***
IlceSile -4.746e+05 8.422e+04 -5.636 1.91e-08 ***
IlceSultanbeyli -4.081e+05 9.168e+04 -4.451 8.87e-06 ***
IlceTuzla -3.956e+05 8.975e+04 -4.408 1.08e-05 ***
IlceÜmraniye -2.777e+05 9.185e+04 -3.023 0.002524 **
IlceÜsküdar 6.886e+04 8.704e+04 0.791 0.428931
m2 NA NA NA NA
`Oda Salon`1+1 -1.786e+05 2.131e+05 -0.838 0.401936
`Oda Salon`1+16 1.651e+05 7.199e+05 0.229 0.818646
`Oda Salon`1+2 -6.592e+05 5.347e+05 -1.233 0.217670
`Oda Salon`1+21 -2.802e+05 7.203e+05 -0.389 0.697349
`Oda Salon`1+3 -2.865e+05 4.514e+05 -0.635 0.525770
`Oda Salon`1+5 -3.472e+05 3.536e+05 -0.982 0.326228
`Oda Salon`2+0 -1.754e+05 4.071e+05 -0.431 0.666687
`Oda Salon`2+1 -2.357e+05 2.191e+05 -1.076 0.282167
`Oda Salon`2+2 -2.658e+05 3.176e+05 -0.837 0.402742
`Oda Salon`2+5 -3.400e+05 3.767e+05 -0.903 0.366802
`Oda Salon`3+1 -2.205e+05 2.217e+05 -0.995 0.320057
`Oda Salon`3+2 -2.383e+05 2.362e+05 -1.009 0.313198
`Oda Salon`3+5 -4.054e+05 3.422e+05 -1.184 0.236316
`Oda Salon`4+1 -3.964e+05 2.275e+05 -1.743 0.081513 .
`Oda Salon`4+2 -8.005e+05 2.383e+05 -3.360 0.000790 ***
`Oda Salon`5+1 -2.213e+05 2.468e+05 -0.896 0.370068
`Oda Salon`5+2 -8.853e+05 2.731e+05 -3.242 0.001200 **
`Oda Salon`6+1 -1.228e+06 3.856e+05 -3.186 0.001461 **
`Oda Salon`6+2 -1.075e+06 3.246e+05 -3.311 0.000941 ***
`Oda Salon`6+3 -3.735e+06 7.681e+05 -4.862 1.23e-06 ***
`Oda Salon`7+2 -6.971e+07 9.975e+06 -6.989 3.44e-12 ***
`Oda Salon`7+3 -1.982e+06 7.255e+05 -2.732 0.006338 **
Bati 4.756e+04 2.866e+04 1.659 0.097145 .
Dogu -3.334e+04 2.762e+04 -1.207 0.227453
Güney -4.931e+04 2.943e+04 -1.675 0.094008 .
Kuzey -1.060e+05 3.521e+04 -3.011 0.002623 **
`Akilli Ev` 1.898e+05 5.759e+04 3.296 0.000993 ***
`Amerikan Mutfak` -5.887e+04 4.319e+04 -1.363 0.173001
`Beyaz Esya` 2.681e+05 4.909e+04 5.462 5.11e-08 ***
Dusakabin -2.155e+04 3.629e+04 -0.594 0.552626
`Ebeveyn Banyosu` 8.674e+04 3.529e+04 2.458 0.014025 *
Kiler -1.156e+05 4.324e+04 -2.673 0.007552 **
Küvet 7.295e+04 4.786e+04 1.524 0.127554
Mobilya -1.255e+05 5.194e+04 -2.416 0.015741 *
`Parke Zemin` 8.113e+03 2.762e+04 0.294 0.769021
`Seramik Zemin` 1.968e+04 2.886e+04 0.682 0.495326
Vestiyer -2.499e+04 3.240e+04 -0.771 0.440650
Deniz 3.070e+05 3.833e+04 8.011 1.64e-15 ***
Doga 1.926e+04 2.834e+04 0.679 0.496936
Sehir 3.760e+04 3.175e+04 1.184 0.236481
ADSL -1.644e+04 3.094e+04 -0.531 0.595204
`Fiber Internet` -2.553e+04 3.498e+04 -0.730 0.465493
`Kablo TV` -1.419e+04 3.141e+04 -0.452 0.651406
Uydu 2.616e+04 3.133e+04 0.835 0.403767
`Wi-Fi` -4.504e+04 3.611e+04 -1.247 0.212455
Hidrofor 1.551e+04 3.754e+04 0.413 0.679614
Jeneratör 6.466e+04 4.022e+04 1.608 0.108010
Otopark 3.620e+03 3.139e+04 0.115 0.908216
`Ses Yalitimi` 1.325e+04 3.176e+04 0.417 0.676645
`Su Deposu` 3.149e+04 3.593e+04 0.877 0.380817
Cami -7.882e+04 4.203e+04 -1.876 0.060813 .
Kilise 5.621e+04 4.429e+04 1.269 0.204515
Market -4.442e+04 5.364e+04 -0.828 0.407649
Park 3.590e+04 3.884e+04 0.924 0.355344
`Saglik Ocagi` 2.112e+04 4.679e+04 0.451 0.651778
`Semt Pazari` -7.069e+04 4.379e+04 -1.614 0.106543
Sauna 2.789e+04 5.311e+04 0.525 0.599491
`Spor Salonu` -5.249e+04 3.349e+04 -1.567 0.117194
`Tenis Kortu` 4.304e+04 5.482e+04 0.785 0.432419
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 679000 on 2862 degrees of freedom
Multiple R-squared: 0.5908, Adjusted R-squared: 0.5791
F-statistic: 50.39 on 82 and 2862 DF, p-value: < 2.2e-16
First 20 predictions:
V1 pred
1 1200000 881787.6
2 1100000 862002.8
3 245000 339582.8
4 1890000 2160635.7
5 1360000 1036269.9
6 2400000 3067823.0
7 1280000 926335.9
8 575000 411630.6
9 390000 706514.2
10 1300000 1140435.6
11 460000 677953.1
12 920000 1287126.6
13 850000 1614840.1
14 1200000 166346.9
15 1500000 1172148.9
16 1200000 393769.3
17 3000000 1157697.3
18 1500000 1082589.2
19 490000 561175.0
20 3350000 3212890.7
https://datascience.stackexchange.com/questions/52398
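Two details of the call `lm(formula = Fiyat ~ poly(m2, 10, raw = T) + ., data = dat)` explain parts of the table above. With `raw = T`, `poly()` produces the plain powers m2, m2^2, ..., m2^10, so its first column is identical to `m2`; the `.` then adds `m2` a second time, the design matrix loses full column rank, and R reports that coefficient as NA ("1 not defined because of singularities"). Raw powers also span a huge range (for an area of 100 m2 the tenth power is already 1e20), which is why the high-order coefficients are so tiny; `poly()` with its default `raw = FALSE` uses orthogonal polynomials instead. The stdlib Python sketch below reproduces the rank loss on hypothetical areas: appending a duplicate m2 column to [1, m2, m2^2] adds a column but no rank. The helper names and tolerance are illustrative assumptions.

```python
from typing import List, Sequence


def raw_poly(x: Sequence[float], degree: int) -> List[List[float]]:
    """Raw polynomial features [x, x^2, ..., x^degree] per observation, like poly(x, degree, raw = TRUE)."""
    return [[float(v) ** k for k in range(1, degree + 1)] for v in x]


def matrix_rank(rows: List[List[float]], tol: float = 1e-9) -> int:
    """Rank via Gaussian elimination with partial pivoting (absolute tolerance)."""
    m = [list(map(float, r)) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot: column c is a combination of earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n_rows):
            f = m[i][c] / m[rank][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


if __name__ == "__main__":
    areas = [55, 80, 95, 120, 150]  # hypothetical m2 values
    base = [[1.0] + p for p in raw_poly(areas, 2)]  # intercept, m2, m2^2
    with_dup = [row + [float(a)] for row, a in zip(base, areas)]  # "+ ." adds m2 again
    print(len(base[0]), matrix_rank(base))
    print(len(with_dup[0]), matrix_rank(with_dup))
```

The duplicated model has four columns but only rank three, so one coefficient cannot be estimated; dropping the extra `m2` (for example with `- m2` in the formula) removes the NA without changing the fit.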