I want to add a new variable to a data frame, where the condition differs from group to group. My data looks like this:
library(dplyr)

test <- data.frame(country = rep(letters[1:5], each = 10),
                   time = seq(from = as.Date('2020-01-01'), to = as.Date('2020-02-19'), by = 'day')) %>%
  mutate(time = as.Date(time))
lockdown_time <- data.frame(country = letters[1:4],
                            start_time = c('2020-01-06', '2020-01-16', '2020-01-26', '2020-02-05'),
                            end_time = c('2020-01-08', '2020-01-18', '2020-01-28', '2020-02-07'))
I will use country == 'a' as an example:
# use country a as an example
test_a <- test %>% filter(country == 'a')
start_time_a <- lockdown_time[1,2] %>% as.Date()
end_time_a <- lockdown_time[1,3] %>% as.Date()
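test_a %>% mutate(lockdown = case_when(between(time, start_time_a, end_time_a) ~ 1, TRUE ~ 0))
I know how to add the new variable lockdown one country at a time, but I don't know whether there is an efficient way to do this for all countries at once. Note that lockdown_time has no row for country == 'e', so the lockdown variable created for country == 'e' should be all NA.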
Posted on 2020-06-12 19:03:38
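You can join the lockdown dates onto the data and then use >= and <= to check whether each date falls within its country's range. Countries with no match in lockdown_time get NA dates from the join, so their lockdown value is NA as well.
library(dplyr)
test %>%
  left_join(lockdown_time, by = "country") %>%
  mutate(start_time = as.Date(start_time), end_time = as.Date(end_time),
         # unary + converts the logical result to integer 1/0 (NA stays NA)
         lockdown = +(time >= start_time & time <= end_time)) %>%
  select(-ends_with("_time"))
Or use between() together with rowwise():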
test %>%
  left_join(lockdown_time, by = "country") %>%
  mutate(start_time = as.Date(start_time), end_time = as.Date(end_time)) %>%
  rowwise() %>%
  mutate(lockdown = +between(time, start_time, end_time)) %>%
  select(-ends_with("_time")) %>%
  ungroup()
Output:
# A tibble: 50 x 3
country time lockdown
<chr> <date> <int>
1 a 2020-01-01 0
2 a 2020-01-02 0
3 a 2020-01-03 0
4 a 2020-01-04 0
5 a 2020-01-05 0
6 a 2020-01-06 1
7 a 2020-01-07 1
8 a 2020-01-08 1
9 a 2020-01-09 0
10 a 2020-01-10 0
11 b 2020-01-11 0
12 b 2020-01-12 0
13 b 2020-01-13 0
14 b 2020-01-14 0
15 b 2020-01-15 0
16 b 2020-01-16 1
17 b 2020-01-17 1
18 b 2020-01-18 1
19 b 2020-01-19 0
20 b 2020-01-20 0
⠇
46 e 2020-02-15 NA
47 e 2020-02-16 NA
48 e 2020-02-17 NA
49 e 2020-02-18 NA
50 e 2020-02-19 NA
Posted on 2020-06-12 17:58:59
You need a left_join. I am also using the lubridate package, which makes it easier to test whether a date falls within an interval.
library(tidyverse)
library(lubridate)
test <- data.frame(
  country = rep(letters[1:5], each = 10),
  time = seq(from = as.Date('2020-01-01'), to = as.Date('2020-02-19'), by = 'day'),
  stringsAsFactors = FALSE
) %>%
  mutate(time = lubridate::as_date(time))
lockdown_time <- data.frame(
  country = letters[1:4],
  start_time = c('2020-01-06', '2020-01-16', '2020-01-26', '2020-02-05'),
  end_time = c('2020-01-08', '2020-01-18', '2020-01-28', '2020-02-07'),
  stringsAsFactors = FALSE
) %>%
  mutate(
    start_time = as_date(start_time),
    end_time = as_date(end_time))
test %>%
  left_join(lockdown_time, by = "country") %>%
  mutate(lockdown = as.integer(time %within% interval(start_time, end_time)))
https://stackoverflow.com/questions/62349704
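For readers who would rather not depend on dplyr or lubridate, the same look-up-then-compare idea can be sketched in base R. This is only an illustration built on the question's sample data and is not part of either answer: match() stands in for left_join(), and the names idx and in_range are made up for the example.

```r
# Base-R sketch (no packages), using the sample data from the question.
test <- data.frame(
  country = rep(letters[1:5], each = 10),
  time = seq(as.Date("2020-01-01"), as.Date("2020-02-19"), by = "day")
)
lockdown_time <- data.frame(
  country = letters[1:4],
  start_time = as.Date(c("2020-01-06", "2020-01-16", "2020-01-26", "2020-02-05")),
  end_time = as.Date(c("2020-01-08", "2020-01-18", "2020-01-28", "2020-02-07"))
)

# match() finds the lockdown row for each row of test;
# countries with no lockdown record (here 'e') get NA.
idx <- match(test$country, lockdown_time$country)

# Indexing with NA yields NA dates, so the comparison is NA for those rows.
in_range <- test$time >= lockdown_time$start_time[idx] &
  test$time <= lockdown_time$end_time[idx]

# as.integer() maps TRUE/FALSE to 1/0 and leaves NA as NA.
test$lockdown <- as.integer(in_range)

# Quick cross-tabulation of the result by country.
table(test$country, test$lockdown, useNA = "ifany")
```

Because the missing country propagates as NA through the date comparison, no special case is needed for country == 'e'.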