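I am new to R. I have a set of patient IDs with a disease status, and I want to delete every row that comes after the first occurrence of disease status 1 within each ID. My dataset looks like this: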
ID Date Disease
123 02-03-2012 0
123 03-03-2013 1
123 04-03-2014 0
321 03-03-2015 1
423 06-06-2016 1
423 07-06-2017 1
543 08-05-2018 1
543 09-06-2019 0
645 08-09-2019 0
645 10-10-2018 0
645 11-10-2012 0
Expected output:
ID Date Disease
123 02-03-2012 0
123 03-03-2013 1
321 03-03-2015 1
423 06-06-2016 1
543 08-05-2018 1
645 08-09-2019 0
645 10-10-2018 0
645 11-10-2012 0
Please suggest code that returns the expected output. Thanks in advance!
Posted on 2020-08-26 08:51:04
One way to do this with dplyr: within each ID, keep all rows if Disease == 1 never occurs; otherwise keep only the rows up to and including the first 1.
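The keep/drop rule can be checked on a single ID's Disease vector in base R before applying it per group. This is only an illustrative sketch: `keep_until_first_one` is a hypothetical helper name that does not appear in the answer, and it assumes the rows of each ID are already in the intended order.

```r
# Hypothetical helper: for one ID's Disease vector, return TRUE for rows to keep.
keep_until_first_one <- function(disease) {
  first_one <- match(1, disease)       # position of the first 1, or NA if there is none
  if (is.na(first_one)) {
    rep(TRUE, length(disease))         # no 1 in this group: keep every row
  } else {
    seq_along(disease) <= first_one    # keep rows up to and including the first 1
  }
}

keep_until_first_one(c(0, 1, 0))  # ID 123 -> TRUE TRUE FALSE
keep_until_first_one(c(1, 1))     # ID 423 -> TRUE FALSE
keep_until_first_one(c(0, 0, 0))  # ID 645 -> TRUE TRUE TRUE
```

The grouped dplyr version applies the same test inside each ID, with `row_number()` playing the role of `seq_along()`.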
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(if (any(Disease == 1)) row_number() <= match(1, Disease) else TRUE)
# ID Date Disease
# <int> <chr> <int>
#1 123 02-03-2012 0
#2 123 03-03-2013 1
#3 321 03-03-2015 1
#4 423 06-06-2016 1
#5 543 08-05-2018 1
#6 645 08-09-2019 0
#7 645 10-10-2018 0
#8 645 11-10-2012 0
Data
df <- structure(list(ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L,
543L, 645L, 645L, 645L), Date = c("02-03-2012", "03-03-2013",
"04-03-2014", "03-03-2015", "06-06-2016", "07-06-2017", "08-05-2018",
"09-06-2019", "08-09-2019", "10-10-2018", "11-10-2012"), Disease = c(0L,
1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-11L))
Posted on 2020-08-26 08:57:55
This should do it.
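library(dplyr)
set.seed(1012)
# Example data: 3 IDs with 3 observations each, sorted by time within ID
datas <- tibble(ids = rep(1:3, each = 3),
                times = runif(9, 0, 100),
                event = rep(c(0, 1, 0), 3)) %>%
  arrange(ids, times)
# Keep a row only if no event occurred on any earlier row of the same ID
datas %>%
  group_by(ids) %>%
  filter(lag(cumsum(event), default = 0) == 0) %>%
  ungroup()
Posted on 2020-08-26 20:50:07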
We can use cumsum to create a logical vector for subsetting.
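To see why applying cumsum twice gives the right mask, it may help to trace one ID by hand. The sketch below assumes Disease is coded strictly as 0/1 and that rows are in the intended order; `double_cumsum_mask` is a hypothetical name used only for illustration.

```r
# Hypothetical helper wrapping the double-cumsum test for one ID's Disease vector.
# Assumes values are 0 or 1 only.
double_cumsum_mask <- function(disease) {
  first_pass  <- cumsum(disease)      # reaches 1 at the first event and never drops below it
  second_pass <- cumsum(first_pass)   # equals 1 at the first event, exceeds 1 on every later row
  second_pass <= 1                    # TRUE up to and including the first event
}

disease <- c(0, 1, 0)          # ID 123 from the question
cumsum(disease)                # 0 1 1
cumsum(cumsum(disease))        # 0 1 2
double_cumsum_mask(disease)    # TRUE TRUE FALSE
double_cumsum_mask(c(1, 1))    # TRUE FALSE
double_cumsum_mask(c(0, 0, 0)) # TRUE TRUE TRUE
```

A single cumsum would not be enough here, because `cumsum(disease) <= 1` stays TRUE for trailing zeros after the first event (as in ID 123).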
library(data.table)
setDT(df)[df[, .I[cumsum(cumsum(Disease)) <= 1], ID]$V1]
# ID Date Disease
#1: 123 02-03-2012 0
#2: 123 03-03-2013 1
#3: 321 03-03-2015 1
#4: 423 06-06-2016 1
#5: 543 08-05-2018 1
#6: 645 08-09-2019 0
#7: 645 10-10-2018 0
#8: 645 11-10-2012 0
Or using dplyr:
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(cumsum(cumsum(Disease)) <= 1)
Data
df <- structure(list(ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L,
543L, 645L, 645L, 645L), Date = c("02-03-2012", "03-03-2013",
"04-03-2014", "03-03-2015", "06-06-2016", "07-06-2017", "08-05-2018",
"09-06-2019", "08-09-2019", "10-10-2018", "11-10-2012"), Disease = c(0L,
1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L)), class = "data.frame",
row.names = c(NA,
-11L))
https://stackoverflow.com/questions/63593926