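I am using the "outliers" package to remove some undesirable values. However, it seems that the rm.outlier() function does not replace all the outliers at once. Probably rm.outlier() cannot perform the despiking recursively, so I basically have to call the function many times to replace all the outliers. Here is a reproducible example of the problem I am experiencing: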
require(outliers)
# creating a timeseries:
set.seed(12345)
y = rnorm(10000)
# inserting some outliers:
y[4000:4500] = -11
y[4501:5000] = -10
y[5001:5100] = -9
y[5101:5200] = -8
y[5201:5300] = -7
y[5301:5400] = -6
y[5401:5500] = -5
# plotting the timeseries + outliers:
plot(y, type="l", col="black", lwd=6, xlab="Time", ylab="w'")
# trying to get rid of some outliers by replacing them by the series mean value:
new.y = outliers::rm.outlier(y, fill=TRUE, median=FALSE)
new.y = outliers::rm.outlier(new.y, fill=TRUE, median=FALSE)
# plotting the new timeseries "after removing the outliers":
lines(new.y, col="red")
# inserting a legend:
legend("bottomleft", c("raw", "new series"), col=c("black","red"), lty=c(1,1), horiz=FALSE, bty="n")
Does anyone know how to improve the code above so that all the outliers are replaced by the mean value?
Posted on 2016-02-27 18:49:15
The best approach I can think of is to use a for loop, keeping track of the outliers as they are found.
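# re-plot the raw series (y as built in the question, with the outliers package loaded)
plot(y, type="l", col="black", lwd=6, xlab="Time", ylab="w'")
maxIter <- 100
# logical mask that accumulates every position flagged so far
outlierQ <- rep(FALSE, length(y))
for (i in seq_len(maxIter)) {
  # flag the value(s) farthest from the current mean
  bad <- outlier(y, logical = TRUE)
  if (!any(bad)) break
  outlierQ[bad] <- TRUE
  # fill them temporarily so the next pass finds the next most extreme value
  y[bad] <- mean(y[!bad])
}
# replace every flagged position with the mean of the unflagged values
y[outlierQ] <- mean(y[!outlierQ])
lines(y, col="blue")
https://stackoverflow.com/questions/35673523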
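As a possible alternative to calling the detector repeatedly, all suspicious values can be flagged in a single pass with a threshold rule. The sketch below is not what rm.outlier() does: it swaps in a median/MAD rule from base R, and the function name replace_outliers and the cutoff k = 3 are assumptions chosen for illustration. Values farther than k scaled MADs from the median are replaced by the mean of the remaining values.

```r
# Hypothetical helper (not part of the outliers package): flag values
# more than k scaled MADs away from the median, then replace them with
# the mean of the unflagged values.
replace_outliers <- function(x, k = 3) {
  center <- median(x)
  spread <- mad(x)  # base R; scaled by 1.4826 by default
  if (spread == 0) {
    # no spread to measure against, so nothing is flagged
    return(list(values = x, flagged = rep(FALSE, length(x))))
  }
  flagged <- abs(x - center) > k * spread
  x[flagged] <- mean(x[!flagged])
  list(values = x, flagged = flagged)
}

# Usage on a synthetic series similar to the one in the question
# (a single block of spikes instead of several levels):
set.seed(12345)
y <- rnorm(10000)
y[4000:5500] <- -10
res <- replace_outliers(y)
new.y <- res$values
```

Because the median and MAD are barely affected by the spikes, one pass is usually enough here; the cutoff k controls how aggressively ordinary extreme values get flagged as well.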