我有一个R数据,它描述了产品销售的演变过程。每季度有2000家商店,每季度有5栏(即。5段时间)。我想知道如何用R来分析它。
我已经试着做了一些基本的分析,即确定第一、第二阶段的平均销售额,然后确定每个时期的平均销售额,然后比较每一家商店相对于这一总体演变的演变。举例来说,第一季的销售总额为5万宗,而第五期则为35000宗。因此,我假设每间店在第5期的正常销售是35 /55=0.63*第一阶段的销售量是35/55=0.63*。如果X店在第一段时间售出100件,我便假设它在第5期一般应售出63件。
显然,这是一种容易操作的方法,但在统计上并不相关.
我想要一种方法,使我能够确定一个趋势限制,模仿我的R-平方。我的目标是使能够通过中和总趋势来分析商店的销售情况:我想确切地知道什么是表现不佳的商店,哪些是表现优异的商店,使用统计上正确的方法。
我的dataframe是这样构造的:
shopID | sum | qt1 | qt2 | qt3 | qt4 | qt5
000001 | 150 | 45 | 15 | 40 | 25 | 25
000002 | 100 | 20 | 20 | 20 | 20 | 20
000003 | 500 | 200 | 0 | 100 | 100 | 100
... (2200 rows)我试图将我的时间感放在一个清单中,这是成功的,下面的功能是:
reversesales=t(data.frame(sales$qt1,sales$qt2,sales$qt3,sales$qt4,sales$qt5))
# I reverse rows and columns of the frame in order that the time periods be the rows
timeser<-ts(reversesales,start=1,end=5, deltat=1/4)
# deltat=1/4 because it is a quarterly basis, 1 and 5 because I have 5 quarters尽管如此,我仍然不能用这个变量做任何事情。由于有2200行(因此R想让我连续绘制2200个图,显然这不是我想要的),所以我不能做任何绘图(使用“绘图”函数)。
此外,我不知道如何确定理论趋势和销售的理论价值,每一个商店的。
谢谢你的帮助!(圣诞快乐)
发布于 2014-12-27 23:29:20
混合模式的实施:
install.packages("nlme")
library("nlme")
library(dplyr)
# Generating some data with a structure like yours:
start <- round(sample(10:100, 50, replace = TRUE)*runif(50))
df <- data_frame(shopID = 1:50, qt1 = start, qt2 =round(qt1*runif(50, .5, 2)) ,qt3 = round(qt2*runif(50, .5, 2)), qt4 = round(qt3*runif(50, .5, 2)), qt5 = round(qt4*runif(50, .5, 2)))
df <- as.data.frame(df)
# Converting in into the long format:
df <- reshape(df, idvar = "shopID", varying = names(df)[-1], direction = "long", sep = "")
Estimating the model:
mod <- lme(qt ~ time, random = ~ time | shopID, data = df)
# Extract the random effects for comparison:
random.effects(mod)
(Intercept) time
1 74.0790805 3.7034172
2 7.8713699 4.2138001
3 -8.0670810 -5.8754060
4 -16.5114428 16.4920663
5 -16.7098229 6.4685228
6 -11.9630688 -8.0411504
7 -12.9669777 21.3071366
8 -24.1099280 32.9274361
9 8.5107335 -9.7976905
10 -13.2707679 -6.6028927
11 3.6206163 -4.1017784
12 21.2342886 -6.7120725
13 -14.6489512 11.6847109
14 -14.7291647 2.1365768
15 10.6791941 3.2097199
16 -14.1524187 -1.6933291
17 5.2120647 8.0119320
18 -2.5172933 -6.5011416
19 -9.0094366 -5.6031271
20 1.4857512 -5.9913865
21 -16.5973442 3.5164298
22 -26.7724763 27.9264081
23 49.0764631 -12.9800871
24 -0.1512509 2.3589947
25 15.7723150 -7.9295698
26 2.1955489 11.0318875
27 -8.0890346 -5.4145977
28 0.1338790 -8.3551182
29 9.7113758 -9.5799588
30 -6.0257683 42.3140432
31 -15.7655545 -8.6226255
32 -4.1450984 18.7995079
33 4.1510104 -1.6384103
34 2.5107652 -2.0871890
35 -23.8640815 7.6680185
36 -10.8228653 -7.7370976
37 -14.1253093 -8.1738468
38 42.4114024 -9.0436585
39 -10.7453627 2.4590883
40 -12.0947901 -5.2763010
41 -7.6578305 -7.9630013
42 -14.9985612 -0.4848326
43 -13.4081771 -7.2655456
44 -11.5646620 -7.5365387
45 6.9116844 -10.5200339
46 70.7785492 -11.5522014
47 -7.3556367 -8.3946072
48 27.3830419 -6.9049164
49 14.3188079 -9.9334156
50 -15.2077850 -7.9161690我将把这些数值解释为:将它们视为偏离零的值,这样,正值与平均值的正偏差,而负值则是与平均值的负偏差。这两列的平均数为零,如下所示:
round(apply(random.effects(mod), 2, mean))
(Intercept) time
0 0 发布于 2014-12-26 17:41:02
library(zoo)
#Reconstructing the data with four quarter columns (instead of five quarters as in your example)
shopID <- c(1, 2, 3, 4, 5)
sum <- c(150, 100, 500, 350, 50)
qt1 <- c(40, 10, 130, 50, 10)
qt2 <- c(40, 40, 110, 100, 15)
qt3 <- c(50, 30, 140, 150, 10)
qt4 <- c(20, 20, 120, 50, 15)
myDF <- data.frame(shopID, sum, qt1, qt2, qt3, qt4)
#The ts() function converts a numeric vector into an R time series object
ts1 <- ts(as.numeric((myDF[1,3:6])), frequency=4)
ts2 <- ts(as.numeric((myDF[2,3:6])), frequency=4)
ts3 <- ts(as.numeric((myDF[3,3:6])), frequency=4)
ts4 <- ts(as.numeric((myDF[4,3:6])), frequency=4)
ts5 <- ts(as.numeric((myDF[5,3:6])), frequency=4)
#Merge time series objects
tsm <- merge(a = as.zoo(ts1), b = as.zoo(ts2), c = as.zoo(ts3), d = as.zoo(ts4), e = as.zoo(ts5))
#Plotting the Time Series
plot.ts(tsm, plot.type = "single", lty = 1:5, xlab = "Time", ylab = "Sales")代码没有优化,可以改进。有关时间序列分析的更多信息可以阅读这里。希望这能给我们一些方向。

https://stackoverflow.com/questions/27657449
复制相似问题