I'm trying to solve a tricky problem.
My data looks like this:
UniqueID Month
ABC123 1
ABC123 2
ABC123 3
ABC123 4
ABC123 6
ABC123 7
DEF456 3
DEF456 4
DEF456 10
DEF456 11
DEF456 12
DEF456 14
GHI789 2
GHI789 3
JKL012 12
JKL012 13
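JKL012 14
Each UniqueID appears at most once per month. The Month column refers to a specific calendar month: for example, 1 = October 2018, 2 = November 2018, and so on. We have data for 14 distinct months in total. I want a cumulative count of the number of times a month is skipped. When an ID's last month is not 14, that also counts as a skip, but a late starting month does not. Applied to the data above, the result should look like this: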
UniqueID Month CountSkip
ABC123 1 0
ABC123 2 0
ABC123 3 0
ABC123 4 0
ABC123 6 1
ABC123 7 2
DEF456 3 0
DEF456 4 0
DEF456 10 1
DEF456 11 1
DEF456 12 1
DEF456 14 2
GHI789 2 0
GHI789 3 1
JKL012 12 0
JKL012 13 0
JKL012 14 0
I have a snippet that calculates the total number of skips:
data %>%
  group_by(UniqueID) %>%
  mutate(Skipped = sum(diff(Month) > 1))
How can I modify it to count the skips cumulatively, while also counting a last month other than 14 as a skip?
Any help would be greatly appreciated. Thanks!
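To make the target rule concrete, here is a minimal base-R sketch that reproduces the expected CountSkip column. It assumes that rows are sorted by Month within each UniqueID, that 14 is the final month, and that a last month which both follows a gap and is not 14 counts as two skips. `count_skips()` is a hypothetical helper name, not part of any package.

```r
# Sample data from the question
data <- data.frame(
  UniqueID = rep(c("ABC123", "DEF456", "GHI789", "JKL012"), times = c(6, 6, 2, 3)),
  Month = c(1L, 2L, 3L, 4L, 6L, 7L, 3L, 4L, 10L, 11L, 12L, 14L, 2L, 3L, 12L, 13L, 14L)
)

# Hypothetical helper: cumulative skip count for one ID's months (assumed sorted).
# - a row is a skip when its month does not directly follow the previous month;
# - the first row never counts, so a late start is not a skip;
# - the last row gets one extra skip when it is not the final month (assumed 14).
count_skips <- function(month, last_month = 14L) {
  skip <- c(0L, as.integer(diff(month) > 1))
  n <- length(month)
  if (month[n] != last_month) skip[n] <- skip[n] + 1L
  cumsum(skip)
}

# ave() applies the helper within each UniqueID and keeps the original row positions
data$CountSkip <- ave(data$Month, data$UniqueID, FUN = count_skips)
data
```

If a last month that follows a gap should count only once, replace the addition with `skip[n] <- 1L`.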
Posted on 2020-01-28 22:30:54
Here is one approach. Let me know whether it has the logic you want.
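library(tidyverse)
data %>%
  group_by(UniqueID) %>%
  # Skip = 1 when this month does not directly follow the previous one
  # (the first row is compared with first(Month) - 1, so it never counts),
  # or when this is the ID's last month and that month is not 14
  # (either condition gives 1, not 2)
  mutate(Skip = if_else(Month - lag(Month, default = first(Month) - 1) - 1 > 0 |
                          (Month == last(Month) & Month != 14), 1, 0),
         CountSkip = cumsum(Skip))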
# A tibble: 17 x 4
# Groups: UniqueID, CountSkip [9]
UniqueID Month Skip CountSkip
<chr> <int> <dbl> <dbl>
1 ABC123 1 0 0
2 ABC123 2 0 0
3 ABC123 3 0 0
4 ABC123 4 0 0
5 ABC123 6 1 1
6 ABC123 7 1 2
7 DEF456 3 0 0
8 DEF456 4 0 0
9 DEF456 10 1 1
10 DEF456 11 0 1
11 DEF456 12 0 1
12 DEF456 14 1 2
13 GHI789 2 0 0
14 GHI789 3 1 1
15 JKL012 12 0 0
16 JKL012 13 0 0
17 JKL012 14 0 0
Data (from @akrun)
data <- structure(list(UniqueID = c("ABC123", "ABC123", "ABC123", "ABC123",
"ABC123", "ABC123", "DEF456", "DEF456", "DEF456", "DEF456", "DEF456",
"DEF456", "GHI789", "GHI789", "JKL012", "JKL012", "JKL012"),
Month = c(1L, 2L, 3L, 4L, 6L, 7L, 3L, 4L, 10L, 11L, 12L,
14L, 2L, 3L, 12L, 13L, 14L)), class = "data.frame", row.names = c(NA,
-17L))
Posted on 2020-01-28 19:24:23
We can take cumsum directly on the logical vector produced by diff.
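library(dplyr)
data %>%
  group_by(UniqueID) %>%
  # TRUE where a month does not directly follow the previous one
  mutate(i1 = c(FALSE, diff(Month) > 1)) %>%
  # start a new sub-group at the first row and at every gap
  # (`.add` replaces the deprecated `add` argument in dplyr >= 1.0)
  group_by(grp = cumsum(c(TRUE, i1[-1])), .add = TRUE) %>%
  # number rows within each sub-group; the first run (no gap) gets 0
  mutate(Count = row_number() * any(i1)) %>%
  ungroup() %>%
  select(-i1, -grp)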
# A tibble: 17 x 3
# UniqueID Month Count
# <chr> <int> <int>
# 1 ABC123 1 0
# 2 ABC123 2 0
# 3 ABC123 3 0
# 4 ABC123 4 0
# 5 ABC123 6 1
# 6 ABC123 7 2
# 7 DEF456 3 0
# 8 DEF456 4 0
# 9 DEF456 10 1
#10 DEF456 11 2
#11 DEF456 12 3
#12 DEF456 14 1
#13 GHI789 2 0
#14 GHI789 3 0
#15 JKL012 12 0
#16 JKL012 13 0
#17 JKL012 14 0
Data
data <- structure(list(UniqueID = c("ABC123", "ABC123", "ABC123", "ABC123",
"ABC123", "ABC123", "DEF456", "DEF456", "DEF456", "DEF456", "DEF456",
"DEF456", "GHI789", "GHI789", "JKL012", "JKL012", "JKL012"),
Month = c(1L, 2L, 3L, 4L, 6L, 7L, 3L, 4L, 10L, 11L, 12L,
14L, 2L, 3L, 12L, 13L, 14L)), class = "data.frame", row.names = c(NA,
-17L))
https://stackoverflow.com/questions/59955651
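Both answers compute gaps from row order (`lag()` or `diff()`), so they assume the rows are already sorted by Month within each UniqueID. Below is a minimal dplyr sketch that sorts first and keeps the two kinds of skip in separate helper columns before summing them. The column names `Gap` and `EndedEarly` are illustrative, and counting a last month that both follows a gap and is not 14 as two skips is an assumption about the desired behavior.

```r
library(dplyr)

data <- data.frame(
  UniqueID = rep(c("ABC123", "DEF456", "GHI789", "JKL012"), times = c(6, 6, 2, 3)),
  Month = c(1L, 2L, 3L, 4L, 6L, 7L, 3L, 4L, 10L, 11L, 12L, 14L, 2L, 3L, 12L, 13L, 14L)
)

result <- data %>%
  # lag() depends on row order, so sort months within each ID first
  arrange(UniqueID, Month) %>%
  group_by(UniqueID) %>%
  mutate(
    # 1 when this month does not directly follow the previous one;
    # the first row has no previous month, so coalesce() treats it as consecutive
    Gap = as.integer(coalesce(Month - lag(Month), 1L) > 1),
    # 1 on the last row when the ID stops before month 14 (assumed final month)
    EndedEarly = as.integer(row_number() == n() & Month != 14),
    CountSkip = cumsum(Gap + EndedEarly)
  ) %>%
  ungroup() %>%
  select(-Gap, -EndedEarly)

result
```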