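I have a data.table, dt, that spans several years. The data is grouped, and each group covers a different number of years. I want to keep only the first three years of each group. How can I do this with data.table? Here is some data to test with: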
library(data.table)

dates <- c(seq(as.Date('2010-01-03'), as.Date('2019-12-31'), by = 1),
           seq(as.Date('2013-01-02'), as.Date('2018-12-31'), by = 1),
           seq(as.Date('2015-01-02'), as.Date('2020-07-31'), by = 1))
set.seed(1995)
value <- rnorm(length(dates), mean = 100, sd = 50)
IDs <- c(rep(c("ACG"), length.out = length(seq(as.Date('2010-01-03'), as.Date('2019-12-31'), by = 1))),
         rep(c("MKD"), length.out = length(seq(as.Date('2013-01-02'), as.Date('2018-12-31'), by = 1))),
         rep(c("ZED"), length.out = length(seq(as.Date('2015-01-02'), as.Date('2020-07-31'), by = 1)))
)
dt <- data.table(Date = dates,
                 Value = value,
                 ID = IDs
)
dt
Date Value ID
1: 2010-01-03 153.03816 ACG
2: 2010-01-04 83.22491 ACG
3: 2010-01-05 107.26521 ACG
4: 2010-01-06 119.70395 ACG
5: 2010-01-07 183.24604 ACG
---
7874: 2020-07-27 184.45801 ZED
7875: 2020-07-28 91.53373 ZED
7876: 2020-07-29 67.42443 ZED
7877: 2020-07-30 125.62496 ZED
7878: 2020-07-31 89.02373 ZED

The final data.table should contain only the first three years of data for each group, like this:
finalDT <- dt[c(1:1094,3651:4744,5841:6935),]
finalDT
Date Value ID
1: 2010-01-03 153.03816 ACG
2: 2010-01-04 83.22491 ACG
3: 2010-01-05 107.26521 ACG
4: 2010-01-06 119.70395 ACG
5: 2010-01-07 183.24604 ACG
---
3279: 2017-12-27 102.10622 ZED
3280: 2017-12-28 94.97718 ZED
3281: 2017-12-29 131.47358 ZED
3282: 2017-12-30 112.83836 ZED
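3283: 2017-12-31 184.54966 ZED

The approach I used works fine on a small dataset, but I have more than 100 IDs, some of them with 20 years of data. I need a programmatic way to do this in data.table.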
Posted on 2020-09-25 00:21:54
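dt[, .SD[year(Date) %in% unique(year(Date))[1:3]], by = ID]
or
dt[, .SD[year(Date) <= unique(year(Date))[3]], by = ID]
Make sure the data is sorted by date first (for example with setorder(dt, ID, Date)), because unique() returns the years in order of appearance.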
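For reference, here is a minimal end-to-end sketch of the same idea, checked against the row indices given in the question. It is one possible variant, not the answerer's exact code: it rebuilds only the Date and ID columns of the test data (the `ranges` list is a helper introduced here for illustration), collects row numbers with `.I` instead of subsetting `.SD`, and uses `head(..., 3L)` so that a group with fewer than three years would simply keep all of its rows.

```r
library(data.table)

# Rebuild the Date and ID columns of the question's test data.
# `ranges` is a hypothetical helper for this sketch only.
ranges <- list(
  ACG = c("2010-01-03", "2019-12-31"),
  MKD = c("2013-01-02", "2018-12-31"),
  ZED = c("2015-01-02", "2020-07-31")
)
dt <- rbindlist(lapply(names(ranges), function(id) {
  data.table(
    Date = seq(as.Date(ranges[[id]][1]), as.Date(ranges[[id]][2]), by = 1),
    ID   = id
  )
}))

# Sort first: unique() returns years in order of appearance,
# so "first three" only means "earliest three" on sorted data.
setorder(dt, ID, Date)

# Per ID, collect the row numbers (.I) whose year is among the
# first three distinct years of that group, then subset once.
keep <- dt[, .I[year(Date) %in% head(unique(year(Date)), 3L)], by = ID]$V1
finalDT <- dt[keep]

nrow(finalDT)  # 3283, matching the expected result in the question
```

Subsetting once by row number tends to be cheaper than building `.SD` for every group, which may matter with 100+ IDs, although for data of this size either form should be fast.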
https://stackoverflow.com/questions/64050282