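I have a dataset containing the following information:
I want to subset the data by checking the health status within each group: if the health status in the last row of a group is "not healthy", keep all of that group's rows. The subset should give the desired output: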
Posted on 2016-07-08 17:52:24
With packages, you can use dplyr or data.table here:
library(dplyr)
DF %>% group_by(group) %>% filter(health[n()] == "N")
group health
(fctr) (fctr)
1 a H
2 a H
3 a N
4 c H
5 c H
6 c N
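To see why this works: inside a grouped dplyr verb, n() is the number of rows in the current group, so health[n()] is the health value in that group's last row. Here is a minimal sketch that lists this value per group. It assumes the sample DF from the Data section at the end, rebuilt with data.frame() so the snippet runs on its own; size and last_health are illustrative column names, not part of the original answer.

```r
library(dplyr)

# Sample data equivalent to the DF in the Data section (rebuilt for illustration)
DF <- data.frame(
  group  = factor(rep(c("a", "b", "c"), each = 3)),
  health = factor(c("H", "H", "N", "H", "H", "H", "H", "H", "N"))
)

# One row per group: the group size and the health value found in its last row
last_rows <- DF %>%
  group_by(group) %>%
  summarise(size = n(), last_health = as.character(health[n()]))
```

Groups a and c end in "N", so the filter() call keeps all of their rows and drops group b.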
library(data.table)
setDT(DF)
DF[, if (health[.N] == "N") .SD, by=group]
group health
1: a H
2: a H
3: a N
4: c H
5: c H
6: c N
As @docendodiscimus pointed out, you can use last(health) instead of health[n()] or health[.N]. Both packages have a last function for this.
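A minimal sketch of the last() variant with dplyr. It assumes the same sample DF as in the Data section at the end, rebuilt here so the snippet is self-contained; res is an illustrative name.

```r
library(dplyr)

# Sample data equivalent to the DF in the Data section (rebuilt for illustration)
DF <- data.frame(
  group  = factor(rep(c("a", "b", "c"), each = 3)),
  health = factor(c("H", "H", "N", "H", "H", "H", "H", "H", "N"))
)

# Keep every row of a group whose last health value is "N"
res <- DF %>%
  group_by(group) %>%
  filter(last(health) == "N") %>%
  ungroup()
```

The condition yields a single TRUE or FALSE per group, which filter() applies to every row of that group, so each group is kept or dropped as a whole.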
In base R, @docendo offers:
subset(DF, ave(health == "N", group, FUN = function(x) tail(x, 1)))
From @akrun:
subset(DF, group %in% group[health == "N" & !duplicated(group, fromLast=TRUE)])
Data. I did not use the OP's data exactly, because it was a pain to copy. Instead:
group health
1 a H
2 a H
3 a N
4 b H
5 b H
6 b H
7 c H
8 c H
9 c N
DF = structure(list(group = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L), .Label = c("a", "b", "c"), class = "factor"), health = structure(c(1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("H", "N"), class = "factor")), .Names = c("group",
"health"), row.names = c(NA, -9L), class = "data.frame")
https://stackoverflow.com/questions/38272608