This is a bit tricky to explain, so please bear with me, and ask questions if I'm not making sense.
Here is my data:
mydata <- data.frame(id = c(1,1,1,1,1,1,1,1),
                     drug = c("let", "per", "pac", "tra","chem", "tem", "cap", "nem"),
                     type = c("type1", "type2", "type1","type1","type1", "type2", "type1", "type2"),
                     startdate = c("2016-05-12","2016-05-30","2016-05-31","2016-05-31", "2018-01-18","2018-04-01", "2020-11-05","2020-11-04"),
                     enddate = c("2016-05-12", "2018-04-05","2017-11-08", "2018-04-05", "2018-01-18", "2020-11-06", "2021-08-18", "2021-08-11"))
My goal is to group together the drugs whose date ranges overlap. However, even when the dates of two drugs overlap, if the drug type switches to type2 I want that to trigger a new row with its own start date and end date.
I was able to group the overlapping dates with the following code:
mydata <- mydata %>%
  arrange(id, startdate, drug) %>%
  group_by(id) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(startdate)) >
                              cummax(as.numeric(enddate)))[-n()])) %>%
  group_by(id, indx) %>%
  mutate(drugs = paste0(drug, collapse = ", ")) %>%
  summarise(startDate = min(startdate), endDate = max(enddate), drugs = drugs) %>% distinct()
But as you can see, every row after the drug "let" gets lumped together. Instead, I want a new row for "tem" and "nem", because they are type2 drugs.
This is the output I am hoping to get:
mydata1 <- data.frame(id = c(1,1,1,1),
                      drugs = c("let", "per,pac,tra,chem", "tem", "cap, nem"),
                      startdate = c("2016-05-12","2016-05-30","2018-04-01","2020-11-04"),
                      enddate = c("2016-05-12","2018-01-18", "2020-11-06","2021-08-11"))
Any help is much appreciated!
Posted on 2022-06-03 23:40:16
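I split the data into separate data frames, one per drug type, and then ran your existing code on each of them. After that, I recombined the two new data frames into a single data frame.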
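The split, apply, and recombine steps described above can also be sketched as a single pipeline that groups by both id and type. This is a minimal sketch rather than the code from this answer: it assumes dplyr is installed, rebuilds the sample data with Date columns up front, and collapses the drug names inside summarise() with `.groups = "drop"` instead of calling distinct() afterwards.

```r
library(dplyr)

# Sample data from the question, with the dates stored as Date objects
mydata <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 1, 1),
  drug = c("let", "per", "pac", "tra", "chem", "tem", "cap", "nem"),
  type = c("type1", "type2", "type1", "type1", "type1", "type2", "type1", "type2"),
  startdate = as.Date(c("2016-05-12", "2016-05-30", "2016-05-31", "2016-05-31",
                        "2018-01-18", "2018-04-01", "2020-11-05", "2020-11-04")),
  enddate = as.Date(c("2016-05-12", "2018-04-05", "2017-11-08", "2018-04-05",
                      "2018-01-18", "2020-11-06", "2021-08-18", "2021-08-11"))
)

grouped <- mydata %>%
  arrange(id, type, startdate, drug) %>%
  # Overlaps are only looked for within the same id and drug type
  group_by(id, type) %>%
  # Start a new index whenever the next start date is later than
  # every end date seen so far in this id/type group
  mutate(indx = c(0, cumsum(as.numeric(lead(startdate)) >
                              cummax(as.numeric(enddate)))[-n()])) %>%
  group_by(id, type, indx) %>%
  summarise(drugs = paste0(drug, collapse = ", "),
            startDate = min(startdate),
            endDate = max(enddate),
            .groups = "drop") %>%
  select(-indx)

print(grouped)
```

Because the overlap check runs strictly within each type, drugs of the same type whose date ranges chain together still end up in one row. On the sample data, "per", "tem", and "nem" are merged for that reason.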
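I was also getting NA when the dates were converted to numbers, so I converted the date columns to the Date class first (the code loads lubridate, although as.Date() itself is base R).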
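A small illustration of why the conversion matters, using only base R. The variable names here are made up for the example. Calling as.numeric() on a date stored as a character string yields NA, whereas a Date object converts to the number of days since 1970-01-01, which is what the comparison inside the indx calculation relies on.

```r
# Dates stored as character strings, as in the question's data frame
dates_chr <- c("2016-05-12", "2016-05-30")

# Coercing the strings directly gives NA (with a coercion warning)
chr_as_num <- suppressWarnings(as.numeric(dates_chr))
print(chr_as_num)

# Converting to Date first gives day counts that can be compared
dates <- as.Date(dates_chr)
print(as.numeric(dates))
print(diff(as.numeric(dates)))  # days between the two dates
```

With lubridate loaded, ymd() would parse the same strings into Date objects as well.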
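# convert the date columns from character to the Date class
library(dplyr)      # provides %>%, filter(), mutate(), summarise(), ...
library(lubridate)  # loaded in the original answer; as.Date() below is base R
mydata$startdate <- as.Date(mydata$startdate)
mydata$enddate <- as.Date(mydata$enddate)
# create two separate data frames, one for each drug type
type1 <- mydata %>%
  filter(type == "type1")
type2 <- mydata %>%
  filter(type == "type2")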
# use your code on both data frames
type1_grouped <- type1 %>%
  arrange(id, startdate, drug) %>%
  group_by(id) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(startdate)) > cummax(as.numeric(enddate)))[-n()])) %>%
  group_by(id, indx) %>%
  mutate(drugs = paste0(drug, collapse = ", ")) %>%
  summarise(startDate = min(startdate), endDate = max(enddate), drugs = drugs) %>%
  distinct()
type2_grouped <- type2 %>%
  arrange(id, startdate, drug) %>%
  group_by(id) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(startdate)) > cummax(as.numeric(enddate)))[-n()])) %>%
  group_by(id, indx) %>%
  mutate(drugs = paste0(drug, collapse = ", ")) %>%
  summarise(startDate = min(startdate), endDate = max(enddate), drugs = drugs) %>%
  distinct()
# put the two dataframes back together
mydata2 <- rbind(type1_grouped,type2_grouped)
# Change the format to match mydata1
mydata2 %>% relocate(drugs, .before = startDate) %>% ungroup() %>% select(-indx)
https://stackoverflow.com/questions/72494400