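I want to filter out entire groups based on the value in one particular row of each group.
In the data below, I want to drop all rows of an ID depending on its Metric value in the row where Hour == '2'. (Note that I am not trying to filter on two conditions here; I am trying to filter on one condition, but applied to a specific row.)
Sample data: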
ID <- c('A','A','A','A','A','B','B','B','B','C','C')
Hour <- c('0','2','5','6','9','0','2','5','6','0','2')
Metric <- c(3,4,1,6,7,8,8,3,6,1,1)
x <- data.frame(ID, Hour, Metric)
ID Hour Metric
1 A 0 3
2 A 2 4
3 A 5 1
4 A 6 6
5 A 9 7
6 B 0 8
7 B 2 8
8 B 5 3
9 B 6 6
10 C 0 1
11 C 2 1

For each ID, I want to filter on whether Metric > 5 in its Hour == '2' row. The result should look like this (all rows of ID B removed):
ID Hour Metric
1 A 0 3
2 A 2 4
3 A 5 1
4 A 6 6
5 A 9 7
10 C 0 1
11 C 2 1

A dplyr-based solution would be preferred, but any help is much appreciated.
Posted on 2017-10-26 22:42:54
A "not in" filter, i.e. !(ID %in% ...), should work. Try this:
library(dplyr)
filter(x, Metric > 5 & Hour == '2')$ID # gives B
subset(x, !(ID %in% filter(x, Metric > 5 & Hour == '2')$ID))

Posted on 2017-10-26 23:51:05
Adapting the answer to "How to filter (with dplyr) for all values of a group if variable limit is reached?", we get:
x %>%
group_by(ID) %>%
filter(any(Metric[Hour == '2'] <= 5))
# # A tibble: 7 x 3
# # Groups: ID [2]
# ID Hour Metric
# <fctr> <fctr> <dbl>
# 1 A 0 3
# 2 A 2 4
# 3 A 5 1
# 4 A 6 6
# 5 A 9 7
# 6 C 0 1
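# 7 C 2 1

Problems of this kind can also be solved by first creating an intermediate by-group variable that flags whether a group's rows should be kept or dropped.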
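One subtlety of the any(Metric[Hour == '2'] <= 5) condition: it keeps a group only if an Hour == '2' row exists and passes. For an ID with no Hour == '2' row, Metric[Hour == '2'] is empty and any() of an empty vector is FALSE, so that ID is dropped. Phrasing the rule the other way round (drop a group when its Hour == '2' row has Metric > 5) keeps such IDs, which is also how the !(ID %in% ...) and anti_join() approaches in this thread behave. The minimal sketch below assumes a hypothetical extra ID "D" that is not in the question's data.

```r
library(dplyr)

# Hypothetical data: ID "D" has no Hour == '2' row at all
x2 <- data.frame(
  ID     = c('A', 'A', 'B', 'B', 'D', 'D'),
  Hour   = c('0', '2', '0', '2', '0', '5'),
  Metric = c(3, 4, 8, 8, 2, 9),
  stringsAsFactors = FALSE
)

# "Keep" phrasing: any() over an empty vector is FALSE, so "D" is dropped
kept_by_any <- x2 %>%
  group_by(ID) %>%
  filter(any(Metric[Hour == '2'] <= 5)) %>%
  ungroup()

# "Drop" phrasing: "D" never has an Hour == '2' row with Metric > 5, so it is kept
kept_by_drop <- x2 %>%
  group_by(ID) %>%
  filter(!any(Metric[Hour == '2'] > 5)) %>%
  ungroup()

unique(kept_by_any$ID)   # "A"
unique(kept_by_drop$ID)  # "A" "D"
```

Which phrasing is right depends on whether a group without an Hour == '2' row should count as passing; the question's data does not contain such a group, so all answers agree on it.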
Method 1:
x %>%
group_by(ID) %>%
mutate(keep_group = (any(Metric[Hour == '2'] <= 5))) %>%
ungroup %>%
filter(keep_group) %>%
select(-keep_group)

Method 2:
groups_to_keep <-
x %>%
filter(Hour == '2', Metric <= 5) %>%
select(ID) %>%
distinct() # N.B. this sorts groups_to_keep by ID which may not be desired
# ID
# 1 A
# 2 C
x %>%
inner_join(groups_to_keep, by = 'ID')
# ID Hour Metric
# 1 A 0 3
# 2 A 2 4
# 3 A 5 1
# 4 A 6 6
# 5 A 9 7
# 6 C 0 1
# 7 C 2 1

Method 3, as suggested by @thelatemail (safe with respect to duplicate IDs):
groups_not_to_keep <-
x %>%
filter(Hour == '2', Metric > 5) %>%
select(ID)
x %>%
anti_join(groups_not_to_keep, by = 'ID')

https://stackoverflow.com/questions/46964794
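To check that the answers in this thread agree, the sketch below rebuilds the question's sample data and runs the first answer's "not in" filter (rewritten with dplyr::filter instead of subset), the grouped filter, Method 2 and Method 3 side by side. It assumes character rather than factor columns (stringsAsFactors = FALSE); the printed outputs above show factors because data.frame() defaulted to factors before R 4.0.0. On this data each version should keep the 7 rows belonging to IDs A and C.

```r
library(dplyr)

# The question's sample data, built with character (not factor) columns
x <- data.frame(
  ID     = c('A','A','A','A','A','B','B','B','B','C','C'),
  Hour   = c('0','2','5','6','9','0','2','5','6','0','2'),
  Metric = c(3, 4, 1, 6, 7, 8, 8, 3, 6, 1, 1),
  stringsAsFactors = FALSE
)

# First answer's "not in" filter, written with dplyr::filter instead of subset
bad_ids   <- filter(x, Metric > 5 & Hour == '2')$ID
res_notin <- filter(x, !(ID %in% bad_ids))

# Grouped filter: keep a group if its Hour == '2' row has Metric <= 5
res_group <- x %>%
  group_by(ID) %>%
  filter(any(Metric[Hour == '2'] <= 5)) %>%
  ungroup()

# Method 2: inner_join against the IDs to keep
groups_to_keep <- x %>%
  filter(Hour == '2', Metric <= 5) %>%
  select(ID) %>%
  distinct()
res_inner <- inner_join(x, groups_to_keep, by = 'ID')

# Method 3: anti_join against the IDs to drop
groups_not_to_keep <- x %>%
  filter(Hour == '2', Metric > 5) %>%
  select(ID)
res_anti <- anti_join(x, groups_not_to_keep, by = 'ID')

# Compare the kept IDs and row counts of the four results
results <- list(notin = res_notin, group = res_group,
                inner = res_inner, anti = res_anti)
sapply(results, function(d) paste(unique(d$ID), collapse = ","))  # all "A,C"
sapply(results, nrow)                                              # all 7
```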