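I have a large panel data set, and I want to lag and lead one variable by 1 month and 6 working days. I know that dplyr has the lag and lead functions, for example. However, I also need to group the data by "Name", because it is panel data.
My data looks like this: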
structure(list(Date = c("01.08.2018", "02.08.2018", "03.08.2018",
"04.08.2018", "05.08.2018", "06.04.2019", "07.04.2019", "08.04.2019",
"01.08.2018", "02.08.2018", "03.08.2018", "04.08.2018", "06.04.2019",
"07.04.2019", "08.04.2019", "01.08.2018", "02.08.2018", "03.08.2018",
"04.08.2018", "05.08.2018", "07.04.2019", "08.04.2019"), Name = c("A",
"A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
"B", "C", "C", "C", "C", "C", "C", "C"), Rating = c(1L, 1L, 1L,
3L, 3L, 4L, 4L, 4L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L,
5L, 5L, 5L), Size = c(1234L, 24123L, 23L, 1L, 23L, 3L, 23L, 4L,
323L, 3424L, 523L, 234L, 35L, 354L, 45L, 23L, 46L, 456L, 546L,
24L, 134L, 1L)), class = "data.frame", row.names = c(NA, -22L
))
This is just a simplified version. My real data runs from 01.08.2018 to 31.12.2021. How can I lag and lead the variable called "Rating" by 1 month and 6 working days?
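My difficulty is that the shift is 1 month and 6 working days, and that it applies to only one variable in the data. All other variables should not be adjusted.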
So far I have tried this:
Data_2 <- Data %>%
  group_by(Name) %>%
  lag('Rating')
Data_3 <- Data %>%
  group_by(Name) %>%
  lead('Rating')
But this is not what I want.
Edit:
In the lead case, my output should look like this (I have only used the first 5 rows to illustrate):
structure(list(Date = c("10.09.2018", "11.09.2018", "12.09.2018",
"13.09.2018", "14.09.2018"), Name = c("A", "A", "A", "A", "A"
), Rating = c(1L, 1L, 1L, 3L, 3L), Size = c("Size from 10.09.2018 would be here",
"Size from 11.09.2018 would be here", "Size from 12.09.2018 would be here",
"Size from 13.09.2018 would be here", "Size from 14.09.2018 would be here"
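)), class = "data.frame", row.names = c(NA, -5L))
So for row 1, I added 1 month and 6 working days, which gives me 10.09.2018, and so on. The "Rating" would be the one from 01.08.2018, but the "Size" would be the value actually reported on 10.09.2018. Then I would also like to do the same thing in the other direction, going back 1 month and 6 working days.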
Posted on 2022-05-06 19:44:49
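Here is an approach that works for "x days later". I use 2 days later here to demonstrate on your data, but 35 days later might work well for your case: that lands 5 weeks later on the same day of the week, so it should be another "working day" most of the time.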
library(dplyr)

# Convert dates to a date format that can be calculated upon
Data2 <- Data %>% mutate(Date = lubridate::dmy(Date))
# Shift each date 2 days ahead, then join each row to the row
# with the same Name on that future date
Data2 %>%
  mutate(Date_future = Date + 2) %>%
  left_join(Data2, by = c("Name", "Date_future" = "Date"),
            suffix = c("_now", "_future"))
# pipe into line below to just show selected columns
# select(Date_future, Name, Rating_now, Size_future)
Result
         Date Name Rating_now Size_now Date_future Rating_future Size_future
1  2018-08-01    A          1     1234  2018-08-03             1          23
2  2018-08-02    A          1    24123  2018-08-04             3           1
3  2018-08-03    A          1       23  2018-08-05             3          23
4  2018-08-04    A          3        1  2018-08-06            NA          NA
5  2018-08-05    A          3       23  2018-08-07            NA          NA
6  2019-04-06    A          4        3  2019-04-08             4           4
7  2019-04-07    A          4       23  2019-04-09            NA          NA
8  2019-04-08    A          4        4  2019-04-10            NA          NA
9  2018-08-01    A          3      323  2018-08-03             1          23
10 2018-08-02    B          3     3424  2018-08-04             2         234
11 2018-08-03    B          2      523  2018-08-05            NA          NA
12 2018-08-04    B          2      234  2018-08-06            NA          NA
13 2019-04-06    B          2       35  2019-04-08             1          45
14 2019-04-07    B          1      354  2019-04-09            NA          NA
15 2019-04-08    B          1       45  2019-04-10            NA          NA
16 2018-08-01    C          1       23  2018-08-03             3         456
17 2018-08-02    C          3       46  2018-08-04             3         546
18 2018-08-03    C          3      456  2018-08-05             5          24
19 2018-08-04    C          3      546  2018-08-06            NA          NA
20 2018-08-05    C          5       24  2018-08-07            NA          NA
21 2019-04-07    C          5      134  2019-04-09            NA          NA
22 2019-04-08    C          5        1  2019-04-10            NA          NA
https://stackoverflow.com/questions/72146041
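To move from "x days later" to the "1 month and 6 working days" asked about in the question, one option is to compute the target date with a small helper and then reuse the same self-join. The sketch below is not part of the original answer: add_business_days and shift_date are hypothetical helpers, the toy Data frame is made up for illustration, and "working day" is assumed to mean Monday to Friday with no holiday calendar. Under those assumptions 01.08.2018 maps to 10.09.2018, as in the first row of the question's expected output; the other rows of that illustration may follow a different rule.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: move each date by n business days (n > 0 forward,
# n < 0 backward). ASSUMPTION: weekends are Sat/Sun, holidays are ignored.
add_business_days <- function(dates, n) {
  step <- sign(n)
  remaining <- rep(abs(n), length(dates))
  while (any(remaining > 0)) {
    active <- remaining > 0
    dates[active] <- dates[active] + step
    # week_start = 1 makes Monday = 1, so 6 and 7 are Saturday and Sunday
    is_workday <- wday(dates, week_start = 1) <= 5
    remaining[active & is_workday] <- remaining[active & is_workday] - 1
  }
  dates
}

# Hypothetical helper: calendar months first, then business days.
# %m+% rolls back to the last day of the month when the day does not exist.
shift_date <- function(dates, n_months, n_bdays) {
  add_business_days(dates %m+% months(n_months), n_bdays)
}

# Toy data, made up for illustration only (not the asker's data)
Data <- data.frame(
  Date   = c("01.08.2018", "02.08.2018", "10.09.2018", "01.08.2018", "10.09.2018"),
  Name   = c("A", "A", "A", "B", "B"),
  Rating = c(1L, 1L, 3L, 2L, 4L),
  Size   = c(1234L, 24123L, 50L, 323L, 60L)
)
Data2 <- Data %>% mutate(Date = dmy(Date))

# Lead: keep today's Rating, pick up Size from the shifted date (NA if absent)
lead_result <- Data2 %>%
  mutate(Date_future = shift_date(Date, 1, 6)) %>%
  left_join(Data2, by = c("Name", "Date_future" = "Date"),
            suffix = c("_now", "_future")) %>%
  select(Date_future, Name, Rating_now, Size_future)

# Lag: same join, with negative offsets
lag_result <- Data2 %>%
  mutate(Date_past = shift_date(Date, -1, -6)) %>%
  left_join(Data2, by = c("Name", "Date_past" = "Date"),
            suffix = c("_now", "_past")) %>%
  select(Date_past, Name, Rating_now, Size_past)

lead_result
lag_result
```

Because the join is on Name as well as the date, no group_by is needed, and only the joined columns change while everything else in the row stays as it is. Note that the lead and lag shifts are not exact inverses of each other, since the month step and the weekend skipping depend on the starting date.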