I have a dataset in R that looks like this:
id species date
obs01 FALSE 28/12/2009
obs01 FALSE 14/11/2010
obs01 FALSE 31/12/2010
obs01 TRUE 17/11/2011
obs01 FALSE 10/12/2011
obs01 FALSE 30/12/2011
obs01 FALSE 16/12/2012
obs01 FALSE 17/12/2012
obs01 FALSE 2/11/2013
obs01 FALSE 10/11/2013
obs01 TRUE 11/11/2013
obs01 FALSE 20/11/2013

As output I need the dataset starting from the first TRUE. Something like this (starting from 17/11/2011):
id species date
obs01 TRUE 17/11/2011
obs01 FALSE 10/12/2011
obs01 FALSE 30/12/2011
obs01 FALSE 16/12/2012
obs01 FALSE 17/12/2012
obs01 FALSE 2/11/2013
obs01 FALSE 10/11/2013
obs01 TRUE 11/11/2013
obs01 FALSE 20/11/2013

Do you know how to do this? Thanks!

Posted on 2019-10-23 01:49:50
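One option is to build the filter with cumsum: the running sum of the logical column stays at 0 until the first TRUE and is positive from that row onward.
library(dplyr)
df1 %>%
    group_by(id) %>%
    filter(cumsum(species) > 0)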
# A tibble: 9 x 3
# Groups: id [1]
# id species date
# <chr> <lgl> <chr>
#1 obs01 TRUE 17/11/2011
#2 obs01 FALSE 10/12/2011
#3 obs01 FALSE 30/12/2011
#4 obs01 FALSE 16/12/2012
#5 obs01 FALSE 17/12/2012
#6 obs01 FALSE 2/11/2013
#7 obs01 FALSE 10/11/2013
#8 obs01 TRUE 11/11/2013
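#9 obs01 FALSE 20/11/2013

Alternatively, as @r2evans mentioned, cumany can be used:
df1 %>%
    group_by(id) %>%
    filter(cumany(species))

Note: it is not clear whether the original data contains more than one 'id' and therefore needs grouping. If it does not, drop the group_by(id) step.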
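
To illustrate the note about grouping, here is a minimal sketch on made-up data with two ids. The data frame df2 and its values are hypothetical and not part of the question; the point is only to show how the result changes when group_by(id) is dropped.

```r
library(dplyr)

# Hypothetical data: obs02 reaches its first TRUE later than obs01
df2 <- data.frame(
  id      = c("obs01", "obs01", "obs01", "obs02", "obs02", "obs02"),
  species = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  date    = c("28/12/2009", "17/11/2011", "10/12/2011",
              "14/11/2010", "31/12/2010", "11/11/2013")
)

# Grouped: each id is cut at its own first TRUE
grouped <- df2 %>%
  group_by(id) %>%
  filter(cumany(species)) %>%
  ungroup()

# Ungrouped: every row from the first TRUE in the whole table is kept,
# so the leading FALSE rows of obs02 are kept as well
ungrouped <- df2 %>%
  filter(cumany(species))

nrow(grouped)
nrow(ungrouped)
```

With a single id, as in the question's sample, both versions return the same rows.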

Data
df1 <- structure(list(id = c("obs01", "obs01", "obs01", "obs01", "obs01",
"obs01", "obs01", "obs01", "obs01", "obs01", "obs01", "obs01"
), species = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, FALSE), date = c("28/12/2009", "14/11/2010",
"31/12/2010", "17/11/2011", "10/12/2011", "30/12/2011", "16/12/2012",
"17/12/2012", "2/11/2013", "10/11/2013", "11/11/2013", "20/11/2013"
)), class = "data.frame", row.names = c(NA, -12L))

Posted on 2019-10-23 02:10:36
You can also try (here df is the question's data frame, called df1 in the answer above):
df[as.logical(cummax(df$species)), ]
id species date
4 obs01 TRUE 17/11/2011
5 obs01 FALSE 10/12/2011
6 obs01 FALSE 30/12/2011
7 obs01 FALSE 16/12/2012
8 obs01 FALSE 17/12/2012
9 obs01 FALSE 2/11/2013
10 obs01 FALSE 10/11/2013
11 obs01 TRUE 11/11/2013
12 obs01 FALSE 20/11/2013

https://stackoverflow.com/questions/58509789
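
The cummax one-liner treats the whole table as one series. If the data held several ids, a per-id version in base R might look like the sketch below. It uses ave() to run cummax within each id; df2 and its values are hypothetical, and as.integer() is an assumption made so that ave() works on a numeric vector.

```r
# Hypothetical data with two ids (not from the question)
df2 <- data.frame(
  id      = c("obs01", "obs01", "obs01", "obs02", "obs02", "obs02"),
  species = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  date    = c("28/12/2009", "17/11/2011", "10/12/2011",
              "14/11/2010", "31/12/2010", "11/11/2013")
)

# Running maximum of species within each id: 0 before the first TRUE, 1 after
keep <- ave(as.integer(df2$species), df2$id, FUN = cummax) > 0

# Keep only the rows at or after each id's first TRUE
result <- df2[keep, ]
result
```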