我有一个数据帧充当changelog。我想累加当前日期和之前所有日期的值,过滤掉旧的重复ID。
与这个问题非常相似:cumsum() up to and including current date in dplyr
下面是当前的代码:
library(dplyr)
tribble(~ID,~Date, ~value,
"ID-1", "2019-01-01", 50,
"ID-2", "2019-01-02", 20,
"ID-3", "2019-01-03", 35,
"ID-1", "2019-01-04", 0,
"ID-4", "2019-01-04", 20,
"ID-5", "2019-01-04", 25,
"ID-6", "2019-01-07", 100,
"ID-3", "2019-01-08", 0,
"ID-7", "2019-01-08", 15,
"ID-8", "2019-01-08", 10,
"ID-6", "2019-01-10", 0,
"ID-9", "2019-01-10", 45,
"ID-10", "2019-01-10", 40) %>% arrange(Date) %>% mutate(run_sum=cumsum(value)) %>% group_by(Date) %>% mutate(run_sum = last(run_sum))输出:
# A tibble: 13 x 4
# Groups: Date [7]
ID Date value run_sum
<chr> <chr> <dbl> <dbl>
1 ID-1 2019-01-01 50 50
2 ID-2 2019-01-02 20 70
3 ID-3 2019-01-03 35 105
4 ID-1 2019-01-04 0 150
5 ID-4 2019-01-04 20 150
6 ID-5 2019-01-04 25 150
7 ID-6 2019-01-07 100 250
8 ID-3 2019-01-08 0 275
9 ID-7 2019-01-08 15 275
10 ID-8 2019-01-08 10 275
11 ID-6 2019-01-10 0 360
12 ID-9 2019-01-10 45 360
13 ID-10 2019-01-10 40 360有什么好方法可以让run_sum列看起来像这样吗?
# A tibble: 13 x 4
ID Date value run_sum
<chr> <chr> <dbl> <dbl>
1 ID-1 2019-01-01 50 50
2 ID-2 2019-01-02 20 70
3 ID-3 2019-01-03 35 105
4 ID-1 2019-01-04 0 100
5 ID-4 2019-01-04 20 100
6 ID-5 2019-01-04 25 100
7 ID-6 2019-01-07 100 200
8 ID-3 2019-01-08 0 190
9 ID-7 2019-01-08 15 190
10 ID-8 2019-01-08 10 190
11 ID-6 2019-01-10 0 175
12 ID-9 2019-01-10 45 175
13 ID-10 2019-01-10 40 175当新的ID更新发生时,run_sum在哪里过滤掉旧的ID重复值?
发布于 2019-10-31 04:27:25
好吧,这里有一种方法,首先我们通过ID查看值,并使用ival跟踪与前一个值的差异。然后,我们再次按日期查找,对于每个日期组,找到值的总变化。然后,我们只保留该组中第一行的值更改。然后,我们可以对该列进行累加求和,以获得我们想要的值。
dd %>% group_by(ID) %>%
mutate(ival=value-lag(value,default=0)) %>%
group_by(Date) %>%
mutate(diff = sum(ival)) %>%
mutate(diff = diff * (row_number()==1)) %>% # only keep first
ungroup() %>%
mutate(run_sum = cumsum(diff)) %>%
select(-ival, -diff)
# ID Date value run_sum
# <chr> <chr> <dbl> <dbl>
# 1 ID-1 2019-01-01 50 50
# 2 ID-2 2019-01-02 20 70
# 3 ID-3 2019-01-03 35 105
# 4 ID-1 2019-01-04 0 100
# 5 ID-4 2019-01-04 20 100
# 6 ID-5 2019-01-04 25 100
# 7 ID-6 2019-01-07 100 200
# 8 ID-3 2019-01-08 0 190
# 9 ID-7 2019-01-08 15 190
# 10 ID-8 2019-01-08 10 190
# 11 ID-6 2019-01-10 0 175
# 12 ID-9 2019-01-10 45 175
# 13 ID-10 2019-01-10 40 175这里的dd是不带run_sum值的数据
https://stackoverflow.com/questions/58632878
复制相似问题