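I want to subset or filter grouped data in dplyr so that only the groups containing both levels of a categorical variable are kept. My data looks like this: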

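I want my output to include only the health_facility values that have both "malaria" and "non-malaria" in their season column.
I tried:
multi_hf %>%
  group_by(health_facility) %>%
  filter(season == "malaria" & season == "non-malaria")
However, all I get are NA values.
Any help is much appreciated! Data: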
structure(list(season = c("malaria", "malaria", "malaria", "malaria",
"malaria", "malaria", "malaria", "malaria", "malaria", "malaria",
"malaria", "malaria", "malaria", "malaria", "malaria", "malaria",
"malaria", "malaria", "malaria", "malaria", "malaria", "malaria",
"malaria", "malaria", "malaria", "malaria", "non-malaria", "non-malaria",
"non-malaria", "non-malaria", "non-malaria", "non-malaria", "non-malaria",
"non-malaria", "non-malaria", "non-malaria", "non-malaria", "non-malaria",
"non-malaria", "non-malaria", "non-malaria", "non-malaria", "non-malaria",
"non-malaria", "non-malaria", "non-malaria", "non-malaria", "non-malaria",
"non-malaria", "non-malaria", "non-malaria", "non-malaria", "non-malaria",
"non-malaria", "non-malaria", "non-malaria", "non-malaria"),
health_facility = c("Hospital Agostinho Neto", "Hospital Baptista de Sousa",
"Health Delegation São Miguel", "Health Center Chã de Alecrim",
"Health Center Fonte Inês", "Health Delegation Maio", "Health Delegation Sao Vincente",
"Health Delegation Sao Vincente", "Hospital Ribeira Grande",
"Health Delegation Ribeira Brava", "Health Delegation Santa Cruz",
"Health Delegation Paul", "Center Delegation Santa Catarina",
"Regional Hospital Fogo e Brava", "Health Delegation São Filipe",
"Health Center Cidade Velha", "Health Delegation Tarrafal Santiago",
"Health Delegation Tarrafal Santiago", "Health Delegation Tarrafal Santiago",
"Health Center Sao Salvador do Mundo – Picos", "Health Delegation Tarrafal Santiago",
"Health Delegation São Lourenço dos Orgaos", "Health Delegation Ribeira Grande",
"Health Delegation of Praia", "Center Delegation Santa Catarina",
"Regional Hospital Santiago Norte", "Health Delegation Ribeira Brava",
"Health Delegation Ribeira Brava", "Hospital Baptista de Sousa",
"Health Delegation Paul", "Health Delegation Ribeira Brava",
"Health Center Sao Salvador do Mundo – Picos", "Health Delegation Sao Vincente",
"Health Delegation São Miguel", "Health Delegation Tarrafal Santiago",
"Regional Hospital Santiago Norte", "Regional Hospital Santiago Norte",
"Regional Hospital Santiago Norte", "Regional Hospital Santiago Norte",
"Health Delegation Sao Vincente", "Regional Hospital Fogo e Brava",
"Center Delegation Santa Catarina", "Health Center Chã de Alecrim",
"Hospital Agostinho Neto", "Hospital Ribeira Grande", "Health Delegation São Lourenço dos Orgaos",
"Health Delegation São Lourenço dos Orgaos", "Health Delegation São Filipe",
"Health Center Fonte Inês", "Hospital Agostinho Neto", "Regional Hospital Fogo e Brava",
"Health Delegation of Praia", "Health Delegation Maio", "Health Delegation Ribeira Grande",
"Health Delegation São Lourenço dos Orgaos", "Health Delegation Santa Cruz",
"Health Center Cidade Velha")), class = c("data.table", "data.frame"
), row.names = c(NA, -57L))

Posted on 2022-05-17 10:46:15
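filter(season == "malaria" & season == "non-malaria") keeps only rows where season is both "malaria" and "non-malaria" at the same time, which is impossible because a row holds a single value. That is why it returns 0 rows for the example data you shared. The output for the example data contains no NA rows, but only because the example data has no NA values. If season did contain NA, a comparison with == would return NA for those rows; using %in% should help there.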
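To make the point above concrete, here is a minimal base R sketch. The vector `season` below is a hypothetical toy example (not the asker's data) with one NA added on purpose, so that the difference between == and %in% is visible.

```r
# Hypothetical toy vector, not the data from the question
season <- c("malaria", "non-malaria", NA)

# Row-wise test: one value can never equal both strings,
# so this is FALSE for real values and NA for the missing one
both_rowwise <- season == "malaria" & season == "non-malaria"
print(both_rowwise)

# %in% never returns NA: a missing value simply does not match
is_malaria <- season %in% "malaria"
print(is_malaria)

# Group-level test: are both labels present somewhere in the vector?
has_both <- all(c("malaria", "non-malaria") %in% season)
print(has_both)
```

The last expression is the kind of test that makes sense per group: it asks whether both labels occur anywhere in the group, rather than whether a single row carries both.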
So you probably want to keep each health_facility whose group contains both values, like this:
library(dplyr)
multi_hf %>%
  arrange(health_facility) %>%
  group_by(health_facility) %>%
  filter(all(c("malaria", "non-malaria") %in% season)) %>%
  ungroup()

Posted on 2022-05-17 10:48:17
Personally, I prefer a cleaner solution. n_distinct fits well here:
multi_hf %>%
  group_by(health_facility) %>%
  filter(n_distinct(season) == 2) %>%
  ungroup()

https://stackoverflow.com/questions/72272571
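A note on the two answers above: on the data in the question they should select the same facilities, since season only takes the two values "malaria" and "non-malaria". They can differ if a third label ever appears. The sketch below uses a hypothetical toy data frame (`toy`, with an invented "unknown" label) purely to illustrate that difference.

```r
library(dplyr)

# Hypothetical toy data: facility "C" has "malaria" plus a third label
toy <- data.frame(
  health_facility = c("A", "A", "B", "C", "C"),
  season = c("malaria", "non-malaria", "malaria", "malaria", "unknown"),
  stringsAsFactors = FALSE
)

# Membership test: keep groups that contain both named labels
by_membership <- toy %>%
  group_by(health_facility) %>%
  filter(all(c("malaria", "non-malaria") %in% season)) %>%
  ungroup()

# Count test: keep groups with exactly two distinct labels, whatever they are
by_count <- toy %>%
  group_by(health_facility) %>%
  filter(n_distinct(season) == 2) %>%
  ungroup()

print(unique(by_membership$health_facility))  # "A"
print(unique(by_count$health_facility))       # "A" "C"
```

If season is guaranteed to hold only the two labels, the n_distinct version is shorter; if other labels or NA values might show up, the membership test states the intent more directly.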