我想在数组的开始计算NA观测值,使用以下两个非NA观测值的线性近似来推断缺失值。然后对数组末尾的NA观测做同样的操作,使用前面的两个非NA观测。
我的df的一个可复制的例子:
M=matrix(sample(1:9,10*10,T),10);M[sample(1:length(M),0.5*length(M),F)]=NA;dimnames(M)=list(paste(rep("City",dim(M)[1]),1:dim(M)[1],sep=""),paste(rep("Year",dim(M)[2]),1:dim(M)[2],sep=""))
M
Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 NA 4 5 NA 3 NA NA NA 5 NA
City2 6 NA 3 3 NA 4 6 NA NA 7
City3 NA 7 NA 8 8 NA NA 8 NA 5
City4 3 5 3 NA NA 3 5 9 8 7
City5 4 6 6 NA NA 8 NA 7 1 NA
City6 NA NA NA NA 4 NA 8 3 6 7
City7 9 3 NA NA NA NA NA 4 NA NA
City8 5 6 9 8 5 NA NA 1 4 NA
City9 NA NA 6 NA 3 3 8 NA 7 NA
City10 NA NA NA NA NA NA NA NA NA 1
idx=rowSums(!is.na(M))>=2 # Index of rows with 2 or more non-NA to run na.approx
library(zoo)
M[idx,]=t(na.approx(t(M[idx,]),rule=1,method="linear")) # I'm using t as na.approx works on columns
Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 NA 4.0 5 4.0 3.000000 3.50 4.0 4.5 5 NA
City2 6.0 5.5 3 3.0 5.500000 4.00 6.0 6.0 6 7
City3 4.5 7.0 3 8.0 8.000000 3.50 5.5 8.0 7 5
City4 3.0 5.0 3 8.0 6.666667 3.00 5.0 9.0 8 7
City5 4.0 6.0 6 8.0 5.333333 8.00 6.5 7.0 1 7
City6 6.5 4.5 7 8.0 4.000000 6.75 8.0 3.0 6 7
City7 9.0 3.0 8 8.0 4.500000 5.50 8.0 4.0 5 NA
City8 5.0 6.0 9 8.0 5.000000 4.25 8.0 1.0 4 NA
City9 NA NA 6 4.5 3.000000 3.00 8.0 7.5 7 NA
City10 NA NA NA NA NA NA NA NA NA 1我想根据前面的两个/下面的观察,用线性近似来推断边界(对于City1和City9)。例如,M[1,1]应该是3,M[1,10]应该是5,5。
你知道我该怎么做吗?
发布于 2015-05-11 13:22:46
这给出了第一列,其中填充了NA的线性外推值。您可以适应最后一篇专栏文章。
firstNAfill <- function(x) {
ans <- ifelse(!is.na(x[1]),
x[1],
ifelse(sum(!is.na(x))<2, NA,
2*x[which(!is.na(x[1, ]))[1]] - x[which(!is.na(x[1, ]))[2]]
)
)
return(ans)
}
dat$Year1 <- unlist(lapply(seq(1:nrow(dat)), function(x) {firstNAfill(dat[x, ])}))结果
Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 3.0 4.0 5 4.0 3.000000 3.50 4.0 4.5 5 NA
City2 6.0 5.5 3 3.0 5.500000 4.00 6.0 6.0 6 7
City3 4.5 7.0 3 8.0 8.000000 3.50 5.5 8.0 7 5
City4 3.0 5.0 3 8.0 6.666667 3.00 5.0 9.0 8 7
City5 4.0 6.0 6 8.0 5.333333 8.00 6.5 7.0 1 7
City6 6.5 4.5 7 8.0 4.000000 6.75 8.0 3.0 6 7
City7 9.0 3.0 8 8.0 4.500000 5.50 8.0 4.0 5 NA
City8 5.0 6.0 9 8.0 5.000000 4.25 8.0 1.0 4 NA
City9 7.5 NA 6 4.5 3.000000 3.00 8.0 7.5 7 NA
City10 NA NA NA NA NA NA NA NA NA 1该函数返回第一列的当前值(如果不是NA )、如果没有两个值可外推则返回NA,否则返回外推值。
发布于 2015-05-11 16:50:27
在extrap中,nlead是输入向量x中的前导NAs数。non.na是不属于NA的x元素的子集。如果没有引导NA元素,或者如果有少于2个非NA元素,则返回输入。m是前两个非NAs的斜率.用外推替换x的第一个x元素。最后,我们使用extrap对每一行M应用MM[] <-,从而保留列名,然后反转每一行,重复和反向返回:
library(zoo)
extrap <- function(x) {
nlead <- which.min(x * 0) - 1
non.na <- na.omit(x)
if (length(nlead) == 0 || nlead == 0) || length(non.na) < 2) return(x)
m <- diff(head(non.na, 2))
replace(x, seq_len(nlead), non.na[1] - nlead:1 * m)
}
nc <- ncol(M)
naApprox <- function(x) if (length(na.omit(x)) < 2) x else na.approx(x, na.rm = FALSE)
MM <- M
MM[] <- t(apply(MM, 1, naApprox))
MM[] <- t(apply(MM, 1, extrap)) # extraploate to fill leading NAs
MM[] <- t(apply(MM[, nc:1], 1, extrap))[, nc:1] # extrapolate to fill trailing NAs给予:
> MM
Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 3.0 4.0 5.000000 4.000000 3.000000 3.500000 4.000000 4.500000 5.000000 5.500000
City2 6.0 4.5 3.000000 3.000000 3.500000 4.000000 6.000000 6.333333 6.666667 7.000000
City3 6.5 7.0 7.500000 8.000000 8.000000 8.000000 8.000000 8.000000 6.500000 5.000000
City4 3.0 5.0 3.000000 3.000000 3.000000 3.000000 5.000000 9.000000 8.000000 7.000000
City5 4.0 6.0 6.000000 6.666667 7.333333 8.000000 7.500000 7.000000 1.000000 -5.000000
City6 -4.0 -2.0 0.000000 2.000000 4.000000 6.000000 8.000000 3.000000 6.000000 7.000000
City7 9.0 3.0 3.166667 3.333333 3.500000 3.666667 3.833333 4.000000 4.166667 4.333333
City8 5.0 6.0 9.000000 8.000000 5.000000 3.666667 2.333333 1.000000 4.000000 7.000000
City9 9.0 7.5 6.000000 4.500000 3.000000 3.000000 8.000000 7.500000 7.000000 6.500000
City10 NA NA NA NA NA NA NA NA NA 1.000000Note我们用它作为M
M <- structure(c(NA, 6L, NA, 3L, 4L, NA, 9L, 5L, NA, NA, 4L, NA, 7L,
5L, 6L, NA, 3L, 6L, NA, NA, 5L, 3L, NA, 3L, 6L, NA, NA, 9L, 6L,
NA, NA, 3L, 8L, NA, NA, NA, NA, 8L, NA, NA, 3L, NA, 8L, NA, NA,
4L, NA, 5L, 3L, NA, NA, 4L, NA, 3L, 8L, NA, NA, NA, 3L, NA, NA,
6L, NA, 5L, NA, 8L, NA, NA, 8L, NA, NA, NA, 8L, 9L, 7L, 3L, 4L,
1L, NA, NA, 5L, NA, NA, 8L, 1L, 6L, NA, 4L, 7L, NA, NA, 7L, 5L,
7L, NA, 7L, NA, NA, NA, 1L), .Dim = c(10L, 10L), .Dimnames = list(
c("City1", "City2", "City3", "City4", "City5", "City6", "City7",
"City8", "City9", "City10"), c("Year1", "Year2", "Year3",
"Year4", "Year5", "Year6", "Year7", "Year8", "Year9", "Year10"
)))更新:修复。
https://stackoverflow.com/questions/30167674
复制相似问题