I have a dataset and need to produce a daily forecast separately for each group. A group is a client + stuff combination.
ts <- read.csv("C:/Users/Admin/Desktop/mydat.csv", sep=";", dec=",")
Here is mydat:
structure(list(Data = structure(c(1L, 3L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 2L, 4L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 22L, 23L, 24L, 25L, 26L), .Label = c("01.04.2017",
"01.06.2017", "02.04.2017", "02.06.2017", "03.04.2017", "04.04.2017",
"05.04.2017", "06.04.2017", "07.04.2017", "08.04.2017", "09.04.2017",
"10.04.2017", "11.04.2017", "12.05.2017", "13.05.2017", "14.05.2017",
"15.05.2017", "16.05.2017", "17.05.2017", "18.05.2017", "19.05.2017",
"20.05.2017", "21.05.2017", "22.05.2017", "23.05.2017", "24.05.2017",
"25.05.2017", "26.05.2017", "27.05.2017", "28.05.2017", "29.05.2017",
"30.05.2017", "31.05.2017"), class = "factor"), client = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Horns and hooves", "Kornev & Co."
), class = "factor"), stuff = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("chickens", "hooves", "Oysters"), class = "factor"),
Sales = c(374L, 12L, 120L, 242L, 227L, 268L, 280L, 419L,
12L, 172L, 336L, 117L, 108L, 150L, 90L, 117L, 116L, 146L,
120L, 211L, 213L, 67L, 146L, 118L, 152L, 122L, 201L, 497L,
522L, 65L, 268L, 441L, 247L, 348L, 445L, 477L, 62L, 226L,
476L, 306L)), .Names = c("Data", "client", "stuff", "Sales"
), class = "data.frame", row.names = c(NA, -40L))
Of course, I could separate the three datasets manually:
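Horns and hooves + hooves
Horns and hooves + chickens
Kornev & Co. + Oysters
But what should I do when the dataset is huge and has hundreds of groups? Splitting it by hand is not an option. Is it possible to split the data into groups in R and then run the forecast for each group?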
The forecasting code is simple.
First I do this:
library(forecast)
library(lubridate)
msts <- msts(ts$Sales, seasonal.periods = c(7,365.25), start = decimal_date(as.Date("2017-05-12")))
plot(msts, main="sales", xlab="Year", ylab="sales")
tbats <- tbats(msts)
plot(tbats, main="Multiple Season Decomposition")
sp<- predict(tbats,h=14) #14 days forecast
plot(sp, main = "TBATS Forecast", include=14)
print(sp)
If the result does not suit me, I forecast with dummy variables instead:
tsw <- ts(ts$Sales, start = decimal_date(as.Date("2017-05-12")), frequency = 7)
View(tsw)
mytslm <- tslm(tsw ~ trend + season)
print(mytslm)
residarima1 <- auto.arima(mytslm$residuals)
residualsArimaForecast <- forecast(residarima1, h=14)
residualsF <- as.numeric(residualsArimaForecast$mean)
regressionForecast <- forecast(mytslm,h=14)
regressionF <- as.numeric(regressionForecast$mean)
forecastR <- regressionF+residualsF
print(forecastR)
Posted on 2018-02-11 16:01:48
You can use split to divide the data into groups by a combination of factors, in this case the client and stuff columns.
group_list <- split(mydat, list(mydat$client, mydat$stuff))
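group_list <- group_list[sapply(group_list, function(x) nrow(x) != 0)]
The second line drops the client/stuff combinations that have no rows. Then you can lapply any function you want over this list. Here is how to run your first forecast. Note that I have separated the forecasting code from the plotting code, and that each forecasting step is done by its own function: first the function msts is applied to produce a list of such objects, then the function tbats is applied to produce another list.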
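As a side note on the split step, here is a minimal base-R sketch on a made-up data frame (the names `toy`, `groups_all`, and `groups_kept` are illustrative and not from the question). It shows where the empty groups come from, and that `split`'s `drop = TRUE` argument should remove them in one step instead of filtering afterwards.

```r
# Toy data: 2 clients x 2 products, but only 3 of the 4 combinations occur
toy <- data.frame(
  client = factor(c("A", "A", "A", "B", "B")),
  stuff  = factor(c("x", "x", "y", "y", "y")),
  Sales  = c(10L, 12L, 7L, 30L, 28L)
)

# Splitting on a list of factors uses their interaction, so every
# combination of levels gets an element, including the unobserved "B.x"
groups_all <- split(toy, list(toy$client, toy$stuff))
names(groups_all)

# drop = TRUE discards the combinations that have no rows
groups_kept <- split(toy, list(toy$client, toy$stuff), drop = TRUE)
names(groups_kept)
```

Either way the result is a named list with one data frame per observed group. The two forecasting functions follow.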
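# Build a multi-seasonal time series (weekly and yearly periods) for one group
fun_msts <- function(ts){
  msts(ts$Sales, seasonal.periods = c(7,365.25), start = decimal_date(as.Date("2017-05-12")))
}
# Fit a TBATS model to one series and forecast from it
fun_sp <- function(m){
  tbats <- tbats(m)
  predict(tbats, h=14) #14 days forecast
}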
msts_list <- lapply(group_list, fun_msts)
sp_list <- lapply(msts_list, fun_sp)
If you want, you can plot the results. To do so, define two more functions to be passed to lapply.
plot_msts <- function(m, new.window = TRUE){
  if(new.window) windows()
  plot(m, main="Sales", xlab="Year", ylab="Sales")
}
plot_sp <- function(sp, new.window = TRUE){
  if(new.window) windows()
  plot(sp, main = "TBATS Forecast", include = 14)
}
lapply(msts_list, plot_msts)
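lapply(sp_list, plot_sp)
These functions open a new graphics device with the function windows. If you are not using Microsoft Windows, or if you want to open another type of device, change that instruction but keep the if(new.window).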
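On systems other than Windows, one option is `dev.new()`, which opens the platform's default graphics device. Another is to write each plot to a file. The sketch below takes the second route with base graphics only, using a made-up list of numeric series in place of `msts_list` (the names `series_list`, `plot_to_pdf`, `out_dir`, and `pdf_paths` are hypothetical).

```r
# Stand-in for msts_list: a named list of plain numeric series
series_list <- list(
  "A.x" = c(10, 12, 9, 14, 11),
  "B.y" = c(30, 28, 33, 31, 29)
)

# Write one PDF per group instead of opening a window for each plot
plot_to_pdf <- function(x, name, out_dir = tempdir()) {
  path <- file.path(out_dir, paste0(name, ".pdf"))
  pdf(path)             # open a file-based graphics device
  on.exit(dev.off())    # close the device even if plot() fails
  plot(x, type = "l", main = name, xlab = "Day", ylab = "Sales")
  path
}

# Map passes each series together with its group name
pdf_paths <- Map(plot_to_pdf, series_list, names(series_list))
```

Because the group name ends up in the file name, real names that contain characters your file system rejects would need cleaning first.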
Edit.
For the regression with dummy variables, you can do the following.
# Fit a linear model with trend and day-of-week dummies to one group
fun_tslm <- function(x, start = "2017-05-12", freq = 7){
  tsw <- ts(x[["Sales"]], start = decimal_date(as.Date(start)), frequency = freq)
  #View(tsw)
  mytslm <- tslm(tsw ~ trend + season)
  mytslm
}
# Forecast the regression and add an ARIMA forecast of its residuals
fun_forecast <- function(x, h = 14){
  residarima1 <- auto.arima(x[["residuals"]])
  residualsArimaForecast <- forecast(residarima1, h = h)
  residualsF <- as.numeric(residualsArimaForecast$mean)
  regressionForecast <- forecast(x, h = h)
  regressionF <- as.numeric(regressionForecast$mean)
  forecastR <- regressionF + residualsF
  forecastR
}
tslm_list <- lapply(group_list, fun_tslm)
fore_list <- lapply(tslm_list, fun_forecast)
https://stackoverflow.com/questions/48732983