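I'm a beginner and would be grateful for any help. Basically, I want to create an output CSV file that contains, for each outbreak, the number of cases, the first onset date, the last onset date, and the total duration.
I have a dataset that looks like this:
df <- data.frame(
  outbreak_name = c("A","A","A","A","B","B","C","C","C"),
  onset = as.Date(c("2021-1-11","2021-2-2","2021-2-3","2021-3-3","2021-5-5","2021-7-5","2021-4-5","2021-2-3","2021-12-4"))
)
I have been able to create columns with the dates as follows: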
library(dplyr)
summary_ob <- df %>%
  group_by(outbreak_name) %>%
  mutate(first_onset = min(onset)) %>%
  mutate(last_onset = max(onset)) %>%
  mutate(duration = last_onset - first_onset)
I can also create a frequency table with a simple count:
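summary_freq <- df %>%
  group_by(outbreak_name) %>%
  summarize(cases = n())
What I can't figure out is how to combine the two, so that the result shows that outbreak A has 4 cases, the first onset was xx, the last onset was xx, and the outbreak has lasted xx days. I then want to write this out with write.csv.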
Posted on 2021-09-23 01:20:36
library(dplyr)
df %>%
  group_by(outbreak_name) %>%
  summarize(
    cases = n(),
    first_onset = min(onset),
    last_onset = max(onset)
  ) %>%
  mutate(duration = last_onset - first_onset)
# A tibble: 3 x 5
  outbreak_name cases first_onset last_onset duration
  <chr>         <int> <date>      <date>     <drtn>
1 A                 4 2021-01-11  2021-03-03 51 days
2 B                 2 2021-05-05  2021-07-05 61 days
3 C                 3 2021-02-03  2021-12-04 304 days
After that, you can export the result with write_csv (from the readr package) or with base R's write.csv.
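As a rough sketch of the export step, the summary can be stored and written with base R's write.csv. The file name outbreak_summary.csv and the column name duration_days are placeholders chosen for illustration, and converting the duration to a plain number of days is an assumption about what is wanted in the CSV, not something the original answer specifies.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  outbreak_name = c("A", "A", "A", "A", "B", "B", "C", "C", "C"),
  onset = as.Date(c("2021-01-11", "2021-02-02", "2021-02-03", "2021-03-03",
                    "2021-05-05", "2021-07-05", "2021-04-05", "2021-02-03",
                    "2021-12-04"))
)

# One row per outbreak: case count, first and last onset
summary_ob <- df %>%
  group_by(outbreak_name) %>%
  summarize(
    cases = n(),
    first_onset = min(onset),
    last_onset = max(onset)
  ) %>%
  # Assumption: store the duration as a plain number of days
  # so the CSV column is numeric rather than a difftime
  mutate(duration_days = as.numeric(last_onset - first_onset, units = "days"))

# "outbreak_summary.csv" is a placeholder file name
write.csv(summary_ob, "outbreak_summary.csv", row.names = FALSE)
```

Setting row.names = FALSE keeps R's row numbers out of the file, so the CSV has exactly one column per summary field.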
Posted on 2021-09-23 01:32:37
We can do this by applying diff to the range of onset:
library(dplyr)
df %>%
  group_by(outbreak_name) %>%
  summarise(cases = n(), duration = diff(range(onset)))
Output:
# A tibble: 3 x 3
  outbreak_name cases duration
  <chr>         <int> <drtn>
1 A                 4 51 days
2 B                 2 61 days
3 C                 3 304 days
https://stackoverflow.com/questions/69292909