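I have a data frame of 12 predictors and a numeric vector called BEI (which I want to predict). I want to run stepwise selection on every 12 rows of data, e.g. rows 1:12, 2:13, and so on. For each rolling window I want to return the coefficients and use them to predict BEI. Here is my code: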
k = length(BEI)
coef.list <- numeric()
predicted.list <- numeric()
for(i in 1:(k-11)){
BEI.subset <- BEI[i:(i+11)]
predictors.subset <- predictors[c(i:(i+11)),]
fit.stepwise <- regsubsets(BEI.subset~., data = predictors.subset, nvmax = 10, method = "forward")
fit.summary <- summary(fit.stepwise)
id <- which.min(fit.summary$cp)
coefficients <- coef(fit.stepwise,id)
coef.list <- append(coef.list, coefficients)
form <- as.formula(fit.stepwise$call[[2]])
mat <- model.matrix(form,predictors.subset)
predicted.stepwise <- mat[,names(coefficients)]%*%coefficients
predicted.list <- append(predicted.list, predicted.stepwise)
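}
I am getting errors like these:
Reordering variables and trying again:
There were 50 or more warnings (use warnings() to see the first 50)
The warnings are:
1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... : 1 linear dependencies found
2: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... : 1 linear dependencies found
3: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... : 1 linear dependencies found
... and so on.
How can I fix this? Or is there a better way to write the code?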
Answered 2016-07-07 16:50:13
The reason you are getting the error is that the rolling subsets of your data contain missing values (NA).
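One way such NA rows can arise: when a data frame is indexed with row numbers past its last row, R does not raise an error but pads the result with all-NA rows. A minimal sketch on a made-up data frame (the names toy and padded are hypothetical, chosen only for illustration):

```r
# Toy data frame with 3 rows (hypothetical, for illustration only)
toy <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

# Ask for rows 2 to 5, although only rows 1 to 3 exist
padded <- toy[2:5, ]

nrow(padded)                               # 4 rows come back, not 2
n_na_rows <- sum(!complete.cases(padded))  # count rows that contain NA
n_na_rows                                  # 2
rownames(padded)                           # padded rows are labelled "NA", "NA.1"
```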
Take data(swiss) as an example:
dim(swiss)
# [1] 47 6
split_swiss <- lapply(1:nrow(swiss), function(x) swiss[x:(x+11),])
length(split_swiss)
# [1] 47 ## the rolling subsets produce 47 data.frames
lapply(tail(split_swiss), head) # show the first 6 rows of the last 6 data.frames
[[1]]
Fertility Agriculture Examination Education Catholic Infant.Mortality
Neuchatel 64.4 17.6 35 32 16.92 23.0
Val de Ruz 77.6 37.6 15 7 4.97 20.0
ValdeTravers 67.6 18.7 25 7 8.65 19.5
V. De Geneve 35.0 1.2 37 53 42.34 18.0
Rive Droite 44.7 46.6 16 29 50.43 18.2
Rive Gauche 42.8 27.7 22 29 58.33 19.3
[[2]]
Fertility Agriculture Examination Education Catholic Infant.Mortality
Val de Ruz 77.6 37.6 15 7 4.97 20.0
ValdeTravers 67.6 18.7 25 7 8.65 19.5
V. De Geneve 35.0 1.2 37 53 42.34 18.0
Rive Droite 44.7 46.6 16 29 50.43 18.2
Rive Gauche 42.8 27.7 22 29 58.33 19.3
NA NA NA NA NA NA NA
[[3]]
Fertility Agriculture Examination Education Catholic Infant.Mortality
ValdeTravers 67.6 18.7 25 7 8.65 19.5
V. De Geneve 35.0 1.2 37 53 42.34 18.0
Rive Droite 44.7 46.6 16 29 50.43 18.2
Rive Gauche 42.8 27.7 22 29 58.33 19.3
NA NA NA NA NA NA NA
NA.1 NA NA NA NA NA NA
[[4]]
Fertility Agriculture Examination Education Catholic Infant.Mortality
V. De Geneve 35.0 1.2 37 53 42.34 18.0
Rive Droite 44.7 46.6 16 29 50.43 18.2
Rive Gauche 42.8 27.7 22 29 58.33 19.3
NA NA NA NA NA NA NA
NA.1 NA NA NA NA NA NA
NA.2 NA NA NA NA NA NA
[[5]]
Fertility Agriculture Examination Education Catholic Infant.Mortality
Rive Droite 44.7 46.6 16 29 50.43 18.2
Rive Gauche 42.8 27.7 22 29 58.33 19.3
NA NA NA NA NA NA NA
NA.1 NA NA NA NA NA NA
NA.2 NA NA NA NA NA NA
NA.3 NA NA NA NA NA NA
[[6]]
Fertility Agriculture Examination Education Catholic Infant.Mortality
Rive Gauche 42.8 27.7 22 29 58.33 19.3
NA NA NA NA NA NA NA
NA.1 NA NA NA NA NA NA
NA.2 NA NA NA NA NA NA
NA.3 NA NA NA NA NA NA
NA.4 NA NA NA NA NA NA
If you run regsubsets on these data.frames, in which the predictors outnumber the cases, you will get errors.
lapply(split_swiss, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))
Error in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in, :
y and x different lengths
In addition: Warning messages:
1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in, :
1 linear dependencies found
......
Instead, I can keep only the subsets that have 12 complete rows and carry on with the regression as follows:
split_swiss_2 <- split_swiss[sapply(lapply(split_swiss, na.omit), nrow) == 12]
lapply(split_swiss_2, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))
https://stackoverflow.com/questions/38230559
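As an alternative to filtering after the fact, the windows can be built so that none of them runs past the last row, by stopping the start index at nrow(swiss) - 11. This is only a sketch under that assumption; the names window_size, starts and windows are illustrative and do not appear in the original code:

```r
window_size <- 12

# The last valid start row is nrow(swiss) - window_size + 1
starts <- seq_len(nrow(swiss) - window_size + 1)

# One data.frame per rolling window, each with exactly window_size rows
windows <- lapply(starts, function(i) swiss[i:(i + window_size - 1), ])

length(windows)                                           # 36 windows for the 47-row swiss data
all(sapply(windows, nrow) == window_size)                 # every window has 12 rows
all(sapply(windows, function(w) all(complete.cases(w))))  # no window contains NA
```

Each element of windows can then be passed to regsubsets in the same way as the elements of split_swiss_2.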