I'm new to R, so this may look simple, but I can't figure it out. My data looks like Df and needs to end up looking like Df2:
Df <- data.frame(country = c("A", "A", "A", "A", "A", "B","B", "B", "B"),
year = c("1950", "1951", "1952", "1953", "1954", "1950", "1951", "1952", "1953"),
start_year = c("NA", "1951", "1951", "NA", "1954", "1950", "NA", "1951", "1951"),
end_year= c("NA", "NA", "1952", "NA", "1954", "1950", "NA", "NA", "NA"),
status = c(0, 1, 1, 0, 1, 1, 0, 1, 1),
treatment = c(10, "NA", 20, 5, "NA", "NA", 30, 100, 10))
Df2 <- data.frame(country = c("A", "A", "A", "A", "B","B", "B"),
time1 = c("1950", "1951", "1953", "1954", "1950", "1951", "1952"),
time2 = c("1951", "1953", "1954", "1955", "1951", "1952", "1954"),
status = c(0, 1, 0, 1, 1, 0, 1),
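treatment = c(10, 20, 0, "NA", "NA", 30, 110))
The goal is to get the data into a structure suitable for a PWP recurrent-event analysis. The treatment in Df2 should be the sum of the treatment values over the interval from time1 to time2.
Any idea how I can get there? Thanks!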
Answered on 2021-11-06 13:34:12
You could use
library(dplyr)
Df %>%
mutate(across(where(is.character), ~na_if(.x, "NA")),
time1 = as.numeric(coalesce(start_year, year)),
treatment = as.numeric(treatment)) %>%
group_by(country, time1, status) %>%
summarise(treatment = sum(treatment, na.rm = TRUE), .groups = "drop") %>%
group_by(country) %>%
mutate(time2 = lead(time1, default = last(time1) + 1)) %>%
select(country, time1, time2, status, treatment) %>%
ungroup()
to get
# A tibble: 7 x 5
country time1 time2 status treatment
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 1950 1951 0 10
2 A 1951 1953 1 20
3 A 1953 1954 0 5
4 A 1954 1955 1 0
5 B 1950 1951 1 0
6 B 1951 1951 0 30
7 B 1951 1952 1 110
This isn't exactly the output you want (see my comment), but it's a start toward solving your problem.
Answered on 2021-11-06 15:37:00
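library(tidyverse)
# Number consecutive runs of identical status values as episodes
Df2 <- Df %>% mutate(episode = data.table::rleid(status))
# time1: first year of each episode
Df2 <- Df2 %>%
arrange(country, year) %>%
group_by(country, episode) %>%
mutate(time1 = min(year))
# time2: the year after the last year of each episode
Df2 <- Df2 %>%
arrange(country, year) %>%
group_by(country, episode) %>%
mutate(time2 = (max(as.numeric(year) + 1)))
I've created an episode identifier and managed to work out time1 and time2 for each episode. Now I still need to combine the rows grouped by episode, so that each episode has a single row showing the sum of treatment. Any idea how to do that?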
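One possible way to collapse each episode into a single row is to replace the mutate() calls with a single summarise(). The following is only a minimal sketch, not a definitive solution, and it rests on a few assumptions that are not stated in the question: the literal "NA" strings are meant as missing values, an episode whose treatment values are all missing should sum to 0 (that is what na.rm = TRUE gives), and the episode counter is built per country with cumsum() and lag() instead of data.table::rleid(). The name result is just an illustrative choice.

```r
library(dplyr)

# Example data from the question
Df <- data.frame(country = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
                 year = c("1950", "1951", "1952", "1953", "1954", "1950", "1951", "1952", "1953"),
                 start_year = c("NA", "1951", "1951", "NA", "1954", "1950", "NA", "1951", "1951"),
                 end_year = c("NA", "NA", "1952", "NA", "1954", "1950", "NA", "NA", "NA"),
                 status = c(0, 1, 1, 0, 1, 1, 0, 1, 1),
                 treatment = c(10, "NA", 20, 5, "NA", "NA", 30, 100, 10))

result <- Df %>%
  # Assumption: the string "NA" stands for a missing value
  mutate(year = as.numeric(year),
         treatment = as.numeric(na_if(treatment, "NA"))) %>%
  arrange(country, year) %>%
  group_by(country) %>%
  # A new episode starts whenever status differs from the previous row
  mutate(episode = cumsum(status != lag(status, default = first(status))) + 1) %>%
  group_by(country, episode) %>%
  # One row per episode: first year, year after the last year, summed treatment
  summarise(time1 = min(year),
            time2 = max(year) + 1,
            status = first(status),
            treatment = sum(treatment, na.rm = TRUE),
            .groups = "drop")

print(result)
```

On the example data this should give seven rows whose time1 and time2 match the desired Df2. The treatment column differs from Df2 in places, for instance all-missing episodes come out as 0 rather than NA, so the na.rm handling may need adjusting to whatever rule is actually intended.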
https://stackoverflow.com/questions/69862733