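My data is in long format, with an id, a day, and a recorded measurement. I want a new variable that holds the median within each 3-day interval (not a rolling median, but days 1-3, 4-6, 7-9, and so on).
So far I have used dplyr to get the overall median per id, but I'm not sure how to compute it for every 3-day interval within each id:
test %>% group_by(id) %>% mutate(m = median(o2))
Here is some data: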
test <- structure(list(id = c("1A", "1A", "1A", "1A", "1A", "1A", "1A",
"1A", "1A", "1A"), day = 1:10, o2 = c(40L, 70L, 100L, 100L, 30L,
35L, 30L, 30L, 40L, 40L)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"), spec = structure(list(cols = list(id = structure(list(), class = c("collector_character",
"collector")), day = structure(list(), class = c("collector_integer",
"collector")), o2 = structure(list(), class = c("collector_integer",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector"))), class = "col_spec"))

Answered 2019-08-12 11:55:17
Group by id and by 3-day interval, then compute the median.
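To see why integer division gives non-overlapping 3-day blocks, here is a minimal base R sketch. The vector `day` is only an illustrative stand-in for the column of the same name in the question's data.

```r
# Illustrative stand-in for the `day` column (days 1 to 10).
day <- 1:10

# Shift to zero-based days, then integer-divide by the block width of 3.
interval_id <- (day - 1) %/% 3

# Days 1-3 land in block 0, days 4-6 in block 1,
# days 7-9 in block 2, and day 10 in block 3.
print(interval_id)
```

Because the key is computed from the value of `day` rather than from the row position, a missing day should not push later rows into the wrong block.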
library(dplyr)
test %>%
  # (day - 1) %/% 3 maps days 1-3 to 0, days 4-6 to 1, and so on
  group_by(id, interval_id = (day - 1) %/% 3) %>%
  mutate(m = median(o2))
#   id      day    o2 interval_id     m
#   <chr> <int> <int>       <dbl> <int>
#   1A        1    40           0    70
#   1A        2    70           0    70
#   1A        3   100           0    70
#   1A        4   100           1    35
#   1A        5    30           1    35
#   1A        6    35           1    35
#   1A        7    30           2    30
#   1A        8    30           2    30
#   1A        9    40           2    30
#   1A       10    40           3    40

Answered 2019-08-12 11:28:35
We can use gl to create groups of 3 days each and then calculate the median of every group.
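As a quick illustration of what gl produces, here is a minimal base R sketch. The name `n_rows` and the use of `ceiling()` to size the number of levels are assumptions made for this example, not part of the answer's code.

```r
# gl(n, k, length) builds a factor with n levels, each repeated k times,
# cut off (or recycled) to `length` elements.
n_rows <- 10  # illustrative row count, matching the sample data

# ceiling() rounds up so that a trailing partial block gets its own level.
block <- gl(ceiling(n_rows / 3), 3, length = n_rows)

# Rows 1-3 get label 1, rows 4-6 label 2, rows 7-9 label 3, row 10 label 4.
print(as.integer(block))
```

Note that gl labels rows by position, so this grouping assumes each id has exactly one row per day, sorted by day, with no gaps.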
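library(dplyr)
test %>%
  group_by(id) %>%
  # gl() labels consecutive rows in blocks of 3; ceiling() and length = n()
  # keep this valid when the row count is not a multiple of 3.
  # cumsum() then turns each change of label into a group number starting at 0.
  mutate(group = gl(ceiling(n() / 3), 3, length = n()),
         group = cumsum(group != lag(group, default = first(group)))) %>%
  group_by(id, group) %>%
  summarise(med = median(o2))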
#   id    group   med
#   <chr> <int> <int>
# 1 1A        0    70
# 2 1A        1    35
# 3 1A        2    30
# 4 1A        3    40

Answered 2019-08-12 12:00:12
Since this is a nice use case for data.table::rleid, here is a data.table answer:
library(data.table)
# dd holds the data from the question (named test there)
setDT(dd)[, grp := gl(.N, 3, length = .N), by = id][, .(med = median(o2)), .(id, rleid(grp))]
#    id rleid med
# 1: 1A     1  70
# 2: 1A     2  35
# 3: 1A     3  30
# 4: 1A     4  40

https://stackoverflow.com/questions/57460271