
Rolling stepwise regression in R

Stack Overflow user
Asked on 2016-07-07 01:51:33

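I have a data frame of 12 predictors and a numeric vector called BEI, which I want to predict. I would like to run stepwise selection on every 12 rows of the data, e.g. 1:12, 2:13, and so on. For each rolling window, I want to return the coefficients and use them to predict BEI. Here is my code: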

k = length(BEI)
coef.list <- numeric()
predicted.list <- numeric()
for(i in 1:(k-11)){
  BEI.subset <- BEI[i:(i+11)]
  predictors.subset <- predictors[c(i:(i+11)),]
  fit.stepwise <- regsubsets(BEI.subset~., data = predictors.subset, nvmax = 10, method = "forward")
  fit.summary <- summary(fit.stepwise)
  id <- which.min(fit.summary$cp)
  coefficients <- coef(fit.stepwise,id)
  coef.list <- append(coef.list, coefficients)
  form <- as.formula(fit.stepwise$call[[2]])
  mat <- model.matrix(form,predictors.subset)
  predicted.stepwise <- mat[,names(coefficients)]%*%coefficients
  predicted.list <- append(predicted.list, predicted.stepwise)
}

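I am getting errors like this: Reordering variables and trying again: There were 50 or more warnings (use warnings() to see the first 50)

The warnings are:
1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... : 1 linear dependencies found
2: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... : 1 linear dependencies found
3: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... : 1 linear dependencies found
... and so on.

How do I fix this? Or is there a better way to write the code?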


1 Answer

Stack Overflow user

Answered on 2016-07-07 16:50:13

The reason you are getting the errors is that the rolling data subsets contain missing values (NA).

Take data(swiss) as an example:

dim(swiss) 
# [1] 47  6

split_swiss <- lapply(1:nrow(swiss), function(x) swiss[x:(x+11),])
length(split_swiss)
# [1] 47  ## the rolling subsets produce 47 data.frames

lapply(tail(split_swiss), head) # show the first 6 rows of the last 6 data.frames 
[[1]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Neuchatel         64.4        17.6          35        32    16.92             23.0
Val de Ruz        77.6        37.6          15         7     4.97             20.0
ValdeTravers      67.6        18.7          25         7     8.65             19.5
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3

[[2]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Val de Ruz        77.6        37.6          15         7     4.97             20.0
ValdeTravers      67.6        18.7          25         7     8.65             19.5
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3
NA                  NA          NA          NA        NA       NA               NA

[[3]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
ValdeTravers      67.6        18.7          25         7     8.65             19.5
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3
NA                  NA          NA          NA        NA       NA               NA
NA.1                NA          NA          NA        NA       NA               NA

[[4]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3
NA                  NA          NA          NA        NA       NA               NA
NA.1                NA          NA          NA        NA       NA               NA
NA.2                NA          NA          NA        NA       NA               NA

[[5]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Rive Droite      44.7        46.6          16        29    50.43             18.2
Rive Gauche      42.8        27.7          22        29    58.33             19.3
NA                 NA          NA          NA        NA       NA               NA
NA.1               NA          NA          NA        NA       NA               NA
NA.2               NA          NA          NA        NA       NA               NA
NA.3               NA          NA          NA        NA       NA               NA

[[6]]
            Fertility Agriculture Examination Education Catholic Infant.Mortality
Rive Gauche      42.8        27.7          22        29    58.33             19.3
NA                 NA          NA          NA        NA       NA               NA
NA.1               NA          NA          NA        NA       NA               NA
NA.2               NA          NA          NA        NA       NA               NA
NA.3               NA          NA          NA        NA       NA               NA
NA.4               NA          NA          NA        NA       NA               NA
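
The printed subsets above can be summarized more compactly by counting the complete rows in each window. The following is a minimal sketch using only base R; the names `windows_all` and `complete_rows` are introduced here for illustration and are not part of the original answer.

```r
# Rebuild the same rolling subsets as above: one window per starting row,
# each asking for 12 rows even when fewer than 12 remain.
windows_all <- lapply(seq_len(nrow(swiss)), function(i) swiss[i:(i + 11), ])

# Count the rows without any NA in each window.
complete_rows <- sapply(windows_all, function(w) nrow(na.omit(w)))

# Tabulate how many windows have 12, 11, ..., 1 complete rows.
print(table(complete_rows))
```

Only the windows that start early enough to fit 12 real rows are fully complete; every window that starts within the last 11 rows is padded with NA rows.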

If you run regsubsets on these data.frames, some of which have more predictors than cases, you will get errors.

lapply(split_swiss, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))

Error in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in,  : 
  y and x different lengths
In addition: Warning messages:
1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in,  :
  1  linear dependencies found
 ......
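
To see why "more predictors than cases" leads to the "linear dependencies found" warning, it may help to check the rank of a design matrix with the same shape as in the question (12 rows, 12 predictors plus an intercept). This is only a sketch on simulated data; `predictors_sim`, `design_cols`, and `design_rank` are hypothetical names and the random values stand in for the asker's real predictors.

```r
set.seed(1)

# Simulated stand-in for one 12-row window with 12 predictors (assumption:
# the real predictors are numeric columns, as in the question).
n_rows <- 12
n_pred <- 12
predictors_sim <- as.data.frame(matrix(rnorm(n_rows * n_pred), nrow = n_rows))

# Design matrix with an intercept column, as a regression fit would build it.
X <- model.matrix(~ ., data = predictors_sim)

design_cols <- ncol(X)      # intercept + 12 predictors
design_rank <- qr(X)$rank   # cannot exceed the number of rows

cat("columns:", design_cols, " rank:", design_rank, "\n")
```

With 13 columns and only 12 rows, the rank is at most 12, so at least one column is a linear combination of the others regardless of the data.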

Instead, you can keep only the subsets that have 12 complete rows and run the regression on them as follows:

split_swiss_2 <- split_swiss[sapply(lapply(split_swiss, na.omit), nrow) == 12]
lapply(split_swiss_2, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))
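
Building on this, the question also asks for the coefficients and predictions from each window. One possible sketch is shown below, assuming the leaps package is installed and that the model with the lowest Cp is chosen, as in the question's code. The names `windows`, `roll_fit`, and `results` are hypothetical, and `nvmax = 5` is chosen here only because swiss has five predictors.

```r
library(leaps)

# Generate only windows that fit entirely inside the data, so no NA rows
# are created in the first place.
windows <- lapply(seq_len(nrow(swiss) - 11), function(i) swiss[i:(i + 11), ])

# Fit forward selection on one window, pick the model with the lowest Cp,
# and return its coefficients together with in-window fitted values.
roll_fit <- function(win) {
  fit  <- regsubsets(Fertility ~ ., data = win, nvmax = 5, method = "forward")
  id   <- which.min(summary(fit)$cp)
  beta <- coef(fit, id)
  X    <- model.matrix(Fertility ~ ., data = win)
  list(coef   = beta,
       fitted = drop(X[, names(beta), drop = FALSE] %*% beta))
}

results <- lapply(windows, roll_fit)

# Coefficients selected for the first window.
print(results[[1]]$coef)
```

Storing each window's result as a list element keeps the coefficient names attached, which a flat `append()` onto a numeric vector would make harder to trace back to a window. Note that this only works when each window has more rows than columns in the design matrix; with 12 predictors and 12 rows, the linear-dependency warnings would be expected to persist.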
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/38230559
