我有两个时间序列数据。一份包括药品名称、开始使用日期、停止使用日期、药物剂量和第二次数据,包括访问日期和评分,
data1<- data.frame("Drug Name" = c("Drug1","Drug1","Drug2","Drug1","Drug3","Drug2",
"Drug4","Drug5","Drug1"),
"Start Date" = c("7/1/2016","1/1/2016", "8/6/2015","2/1/2015","6/14/2017",
"6/21/2017","1/24/2018","7/30/2018","7/30/2018"),
"Stop Date "=c("1/14/2017","1/14/2017", "1/14/2017","1/14/2017"
,"1/24/2018","6/29/2018","6/29/2018","Ongoing","Ongoing"),
"Dose"=c(12,20,32,3,5,6,6,8,9))
data2<-data.frame("visitdate"=c("8/24/2016","8/24/2016", "10/19/2016","12/7/2016","12/21/2016",
"3/22/2017","5/10/2017", "6/14/2017", "7/12/2017","9/27/2017",
"11/29/2017", "1/24/2018","3/21/2018","5/30/2018","8/15/2018",
"10/3/2018", "11/28/2018"),
"Score"=c(1,2,3,34,6,7,9,5,6,8,5,5,7,9,1,2,5))我想合并这两个数据的方式,例如,在访问日期,8/24/2016告诉我,病人服用了多少药物,其剂量加上临床评分。
发布于 2020-11-17 00:04:08
可能会有一些数据的预处理,以便尽早考虑。
首先,上面的例子有带有空格的列名,最好避免。我为这个例子编辑并删除了空格。
此外,您还有“正在进行的”约会。建议使用as.Date将日期转换为日期。但是,在转换那些具有“正在进行”的内容后,将被包含为NA。这些可以设置为Inf (无穷大),这将工作。
例如:
data1$StartDate <- as.Date(data1$StartDate, format = "%m/%d/%Y")
data1$StopDate <- as.Date(data1$StopDate, format = "%m/%d/%Y")
data2$VisitDate <- as.Date(data2$VisitDate, format = "%m/%d/%Y")
data1$StopDate[8:9] <- Inf根据数据源的不同,有许多其他方法可以解决这一问题。
之后,您可以使用tidyverse和fuzzyjoin执行以下操作。使用fuzzy_left_join,您可以将两个数据框架连接在一起,其中仅包括日期范围内的日期。
您可以考虑将结果保持较长的形式。但是,如果需要宽格式,则可以使用pivot_wider。最后一个select按数字顺序排列列,如您的示例所示。
library(tidyverse)
library(fuzzyjoin)
fuzzy_left_join(data2,
data1,
by = c("VisitDate" = "StartDate",
"VisitDate" = "StopDate"),
match_fun = list(`>=`, `<=`)) %>%
select(-StartDate, -StopDate) %>%
group_by(VisitDate, Score) %>%
mutate(rn = row_number(),
NumDrugs = ifelse(all(is.na(DrugName)) == TRUE, 0, n())) %>%
pivot_wider(id_cols = c(VisitDate, Score, NumDrugs), names_from = rn, values_from = c(DrugName, Dose)) %>%
select(VisitDate, Score, NumDrugs, names(.)[-c(1:3)][order(parse_number(names(.)[-c(1:3)]))])输出
VisitDate Score NumDrugs DrugName_1 Dose_1 DrugName_2 Dose_2 DrugName_3 Dose_3 DrugName_4 Dose_4
<date> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 2016-08-24 1 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
2 2016-08-24 2 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
3 2016-10-19 3 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
4 2016-12-07 34 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
5 2016-12-21 6 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
6 2017-03-22 7 0 NA NA NA NA NA NA NA NA
7 2017-05-10 9 0 NA NA NA NA NA NA NA NA
8 2017-06-14 5 1 Drug3 5 NA NA NA NA NA NA
9 2017-07-12 6 2 Drug3 5 Drug2 6 NA NA NA NA
10 2017-09-27 8 2 Drug3 5 Drug2 6 NA NA NA NA
11 2017-11-29 5 2 Drug3 5 Drug2 6 NA NA NA NA
12 2018-01-24 5 3 Drug3 5 Drug2 6 Drug4 6 NA NA
13 2018-03-21 7 2 Drug2 6 Drug4 6 NA NA NA NA
14 2018-05-30 9 2 Drug2 6 Drug4 6 NA NA NA NA
15 2018-08-15 1 2 Drug5 8 Drug1 9 NA NA NA NA
16 2018-10-03 2 2 Drug5 8 Drug1 9 NA NA NA NA
17 2018-11-28 5 2 Drug5 8 Drug1 9 NA NA NA NA数据
(在转换日期之前)
data1 <- structure(list(DrugName = c("Drug1", "Drug1", "Drug2", "Drug1",
"Drug3", "Drug2", "Drug4", "Drug5", "Drug1"), StartDate = c("7/1/2016",
"1/1/2016", "8/6/2015", "2/1/2015", "6/14/2017", "6/21/2017",
"1/24/2018", "7/30/2018", "7/30/2018"), StopDate = c("1/14/2017",
"1/14/2017", "1/14/2017", "1/14/2017", "1/24/2018", "6/29/2018",
"6/29/2018", NA, NA), Dose = c(12, 20, 32, 3, 5, 6, 6, 8, 9)), class = "data.frame", row.names = c(NA,
-9L))
data2 <- structure(list(VisitDate = c("8/24/2016", "8/24/2016", "10/19/2016",
"12/7/2016", "12/21/2016", "3/22/2017", "5/10/2017", "6/14/2017",
"7/12/2017", "9/27/2017", "11/29/2017", "1/24/2018", "3/21/2018",
"5/30/2018", "8/15/2018", "10/3/2018", "11/28/2018"), Score = c(1,
2, 3, 34, 6, 7, 9, 5, 6, 8, 5, 5, 7, 9, 1, 2, 5)), class = "data.frame", row.names = c(NA,
-17L))https://stackoverflow.com/questions/64865407
复制相似问题