Question: For Loop Auto Arima by customer

Stack Overflow user

Asked 2020-11-19 01:11:10

Answers: 1, Views: 39, Followers: 0, Votes: 0

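I am collaborating on a project that requires me to use R, and I have no prior experience with R. I am trying to apply auto ARIMA to partitions/windows of my dataset, but I don't even know how to get started.

Essentially, I want to train a separate model for each partner_id using the rows where c_id = "none", and then forecast values out to the max(date) of each partner_id. The number of months/rows varies by partner. In the sample data frame pasted below, partner_id = "1A9" has 12 months/rows with c_id = "none", while partner_id = "1B9" has 13 months/rows with c_id = "none". The number of months/rows extending out to max(date) also varies within each partner_id. Ultimately, I want to join my predictions, and possibly the prediction intervals, back onto my original data frame.

I have started some code, but I keep getting the error "Error in if (frequency > 1 && 0 < (d <- abs(frequency - round(frequency))) && : missing value where TRUE/FALSE needed". It seems to get stuck when I try to train the auto ARIMA. My code so far is: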

predictions_df <- data.frame(c_id = character(),
                         partner_id = character(),
                         rev_month = character(),
                         rev = double())

partners <- unique(df$partner_id) 

for (i in 1:length(partners)) {
 x1 <- df[df$partner_id %in% partners, ] # likely redundant since I
 x1_train <- x1[x1$c_id == "No-Contract", c(3, 4)] # training data
 x1_test <- x1[x1$c_id != "No-Contract", c(3, 4)] # forecast data
 c_int <- x1[x1$c_id != "No-Contract", 1] # confidence interval data?

# convert training data to time-series object
x1_train_ts <- xts::xts(x1_train$rev, order.by=x1_train$rev_month)
# run auto arima on the time series
tm <- forecast::auto.arima(x1_train_ts, approximation = FALSE, biasadj = TRUE, stepwise = FALSE)
# forecast the number of future steps (rows for to predict data)
fc <- forecast::forecast(tm, nrow(x1_test))

# append predictions back to dataframe
predictions_df <- rbind(predictions_df, data.frame(ar_name, partner_id, rev_month = as.character(x1_test$rev_month),   rev = as.double(fc$mean)))

I have included a sample dataset, with the code to build it, below.

test <- data.frame("c_id" = c("none","none","none","none","none",
"none","none","none","none","none","none","none","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101", "none","none","none","none","none","none","none","none","none","none","none","none","none","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111"), "partner_id" = c("1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9"), "rev_month" = as.Date(c("2015-12-01","2016-01-01","2016-02-01","2016-03-01","2016-04-01","2016-05-01","2016-06-01","2016-07-01","2016-08-01", "2016-09-01","2016-10-01","2016-11-01","2016-12-01","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01","2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01", "2017-01-01","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01", 
"2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01","2019-01-01","2019-02-01","2019-03-01","2019-04-01","2019-05-01","2019-06-01","2019-07-01","2019-08-01","2019-09-01","2019-10-01","2019-11-01","2019-12-01", "2020-01-01", "2020-02-01", "2020-03-01")), "rev" = c(101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.3, 105.00, 105.20, 105.60, 106.00, 106.10, 106.50, 101.50, 100.30, 107.50, 108.30, 108.45, 109.10, 110.10, 112.15, 112.45, 114.65, 115.00, 116.00, 116.50, 117.25, 117.85, 119.25, 119.95, 120.20, 121.50, 122.30, 122.40, 123.25, 123.75, 124.00, 101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.3, 105.00, 105.20, 105.60, 106.00, 106.10, 106.50, 101.50, 100.30, 107.50, 108.30, 108.45, 109.10, 110.10, 112.15, 112.45, 114.65, 115.00, 116.00, 116.50, 117.25, 117.85, 119.25, 119.95, 120.20, 121.50, 122.30, 122.40, 123.25, 123.75, 124.00, 124.10, 125.35, 125.45), stringsAsFactors=FALSE)

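If there is a better way to write this, I am open to it, but I would really like to get this finished today. Any help is greatly appreciated. Thank you in advance.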

loops

arima

1 Answer

Stack Overflow user

Answered 2020-11-28 08:25:36

Key points to note:

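As shown in the posted code, you have to make sure your data is clean and sorted in chronological order for each partner.

If you have a large dataset, try using the default arguments of "auto.arima".

Also, make sure you have enough data to get more reliable estimates. For example, in the sample dataset you provided, the forecast horizon is longer than the training period.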
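The horizon-versus-training-length point can be checked per partner before fitting anything. The following is a minimal sketch in base R; the `toy` data frame is a hypothetical stand-in that only mirrors the per-partner row counts of the sample data, and the `check` column names are illustrative.

```r
# Hypothetical stand-in that mirrors the row counts of the sample data
toy <- data.frame(
  partner_id = rep(c("1A9", "1B9"), times = c(37, 40)),
  c_id = c(rep("none", 12), rep("c-100", 12), rep("c-101", 13),
           rep("none", 13), rep("c-110", 12), rep("c-111", 15)),
  stringsAsFactors = FALSE
)

# Training length: rows with c_id == "none"; horizon: all remaining rows
n_train <- tapply(toy$c_id == "none", toy$partner_id, sum)
n_ahead <- tapply(toy$c_id != "none", toy$partner_id, sum)

check <- data.frame(
  partner_id = names(n_train),
  n_train = as.integer(n_train),
  n_ahead = as.integer(n_ahead)
)
# Flag partners whose forecast horizon is longer than their training data
check$horizon_exceeds_training <- check$n_ahead > check$n_train
print(check)
```

Partners flagged here are the ones whose forecasts deserve the most caution, since the model is asked to extrapolate further than it has observed.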

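You should be able to compute in-sample and out-of-sample forecast accuracy (root-mean-square error and similar measures) on a validation set. If you have enough data, split it into training and validation sets using a 70-80/30-20 split.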
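One way to sketch the split-and-evaluate step is shown below. It assumes a single series long enough to split; the series `y` is simulated purely for illustration, and the 80/20 cut is one choice within the suggested range. `accuracy()` from the forecast package reports the training-set (in-sample) and test-set (out-of-sample) measures side by side.

```r
library(forecast)

# Hypothetical monthly series, simulated only for illustration
set.seed(1)
y <- 100 + cumsum(rnorm(40, mean = 0.5))

# Time-ordered 80/20 split: never shuffle a time series before splitting
n_train <- floor(0.8 * length(y))
train <- y[seq_len(n_train)]
valid <- y[(n_train + 1):length(y)]

# Fit on the training part and forecast over the validation window
fit <- auto.arima(train)
fc <- forecast(fit, h = length(valid))

# Rows: "Training set" (in-sample) and "Test set" (out-of-sample)
acc <- accuracy(fc, valid)
print(acc[, c("RMSE", "MAE", "MAPE")])
```

A test-set error much larger than the training-set error suggests the model does not generalize over that horizon.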

Here is my answer:

# Load libraries
library(forecast)
library(dplyr)

# Load the dataset - the one you provided for these purposes
c_id <- c("none","none","none","none","none","none","none","none","none","none","none","none","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101", "none","none","none","none","none","none","none","none","none","none","none","none","none","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111")
partner_id <- c("1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9")
rev_month <- as.Date(c("2015-12-01","2016-01-01","2016-02-01","2016-03-01","2016-04-01","2016-05-01","2016-06-01","2016-07-01","2016-08-01", "2016-09-01","2016-10-01","2016-11-01","2016-12-01","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01","2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01", "2017-01-01","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01", "2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01","2019-01-01","2019-02-01","2019-03-01","2019-04-01","2019-05-01","2019-06-01","2019-07-01","2019-08-01","2019-09-01","2019-10-01","2019-11-01","2019-12-01", "2020-01-01", "2020-02-01", "2020-03-01"))
rev <- c(101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.3, 105.00, 105.20, 105.60, 106.00, 106.10, 106.50, 101.50, 100.30, 107.50, 108.30, 108.45, 109.10, 110.10, 112.15, 112.45, 114.65, 115.00, 116.00, 116.50, 117.25, 117.85, 119.25, 119.95, 120.20, 121.50, 122.30, 122.40, 123.25, 123.75, 124.00, 101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.3, 105.00, 105.20, 105.60, 106.00, 106.10, 106.50, 101.50, 100.30, 107.50, 108.30, 108.45, 109.10, 110.10, 112.15, 112.45, 114.65, 115.00, 116.00, 116.50, 117.25, 117.85, 119.25, 119.95, 120.20, 121.50, 122.30, 122.40, 123.25, 123.75, 124.00, 124.10, 125.35, 125.45)
df <- data.frame(c_id, partner_id, rev_month, rev, stringsAsFactors = FALSE)
# Make sure that your data is sorted in chronological order
df <- 
  df %>% 
  arrange(partner_id, rev_month)
# Create a new column to store the predictions for each model
df$model_predictions <- 0

# Fit an ARIMA model for each partner and get n-steps forecast
partners <- unique(df$partner_id)
for (k in partners) {
  # Keep data for the partner of interest for each iteration
  df_iter <- df[df$partner_id == k,]
  # Get training data
  # Recall that dates don't matter as long as your data is
  # sorted in chronological order and you have one observation per date
  train_data <- df_iter[df_iter$c_id == "none", "rev"]
  # Get number of steps to forecast
  n_ahead <- nrow(df_iter[df_iter$c_id != "none",])
  # Fit the model
  model <-
    auto.arima(train_data, approximation = FALSE, biasadj = TRUE, stepwise = FALSE)
  # Forecast data for the given partner
  fcast <- as.numeric(forecast(model, h = n_ahead, biasadj = TRUE)$mean)
  # Place fitted and forecasted values in the original data frame
  predictions <- c(model$fitted, fcast)
  df[df$partner_id == k, "model_predictions"] <- predictions
}
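The question also asks for prediction intervals joined back onto the data frame. A possible extension of the same loop is sketched below, under assumptions: the `toy` data frame and the column names `pred`, `lo95` and `hi95` are hypothetical, rows are assumed to be already sorted chronologically within each partner, and default `auto.arima` settings are used to keep the run short. Intervals are stored only for forecast rows; fitted rows keep `NA`.

```r
library(forecast)

# Hypothetical toy data in the same shape as the question's data frame
set.seed(42)
toy <- data.frame(
  c_id = rep(c(rep("none", 24), rep("c-1", 6)), times = 2),
  partner_id = rep(c("P1", "P2"), each = 30),
  rev = c(100 + cumsum(rnorm(30, 0.5)), 200 + cumsum(rnorm(30, 0.5))),
  stringsAsFactors = FALSE
)
toy$pred <- NA_real_
toy$lo95 <- NA_real_
toy$hi95 <- NA_real_

for (k in unique(toy$partner_id)) {
  is_train <- toy$partner_id == k & toy$c_id == "none"
  is_future <- toy$partner_id == k & toy$c_id != "none"

  # Fit on the "none" rows, then forecast one step per remaining row
  fit <- auto.arima(toy$rev[is_train])
  fc <- forecast(fit, h = sum(is_future), level = 95)

  # Fitted values for training rows, point forecasts for future rows
  toy$pred[is_train] <- as.numeric(fitted(fit))
  toy$pred[is_future] <- as.numeric(fc$mean)

  # 95% prediction interval bounds, available for forecast rows only
  toy$lo95[is_future] <- as.numeric(fc$lower[, 1])
  toy$hi95[is_future] <- as.numeric(fc$upper[, 1])
}

head(toy[toy$c_id != "none", ])
```

Indexing with logical masks instead of concatenating fitted and forecast vectors keeps each value on its own row even if training and forecast rows are interleaved.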

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/64897934
