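I am trying to write code that compares two columns in the same data frame, using summarise to create a new column that states whether each ID was registered before a review took place.
This is my data frame: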
tt <- structure(list(ID = c("P40", "P40", "P40", "P42", "P42", "P43", "P43",
"P44", "P44"),Type = c("Pre-Initial", "Review", "Review", "Initial", "Review", "Initial", "Review", "Pre-Initial", "Review"),
Registered = c("Yes", "", "", "No", "", "Yes", "", "No", "")),
class = "data.frame", row.names = c(NA, -9L))
The result I want to achieve:
ID Outcome
P40 Yes
P42 No
P43 Yes
P44 No
This is the code I tried, but it shows only No for every ID:
tt %>% group_by(ID) %>%
summarise(outcome = c("No", "Yes")[all(Registered == "Yes" & Type == "Review") + 1])
Answered on 2019-07-16 18:20:56
You can try:
tt %>%
group_by(ID) %>%
summarise(
Outcome = c("No", "Yes")[any(Type == "Review" & cumsum(Registered == "Yes") == 1) + 1]
)
Output:
# A tibble: 4 x 2
ID Outcome
<chr> <chr>
1 P40 Yes
2 P42 No
3 P43 Yes
4 P44 No
Note that this assumes "Yes" appears only once in Registered within each ID. Otherwise, just replace cumsum(Registered == "Yes") == 1 with cumsum(Registered == "Yes") >= 1.
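As a rough illustration of that note, here is a minimal sketch comparing the two conditions. The data frame tt2, its IDs (P50, P51) and the column names Strict and Relaxed are made up for this example and are not part of the question's data.

```r
library(dplyr)

# Hypothetical data: P50 has "Yes" twice before its review,
# P51 is never registered.
tt2 <- data.frame(
  ID = c("P50", "P50", "P50", "P51", "P51"),
  Type = c("Initial", "Pre-Initial", "Review", "Initial", "Review"),
  Registered = c("Yes", "Yes", "", "No", ""),
  stringsAsFactors = FALSE
)

res <- tt2 %>%
  group_by(ID) %>%
  summarise(
    # == 1 only matches while exactly one "Yes" has been seen so far
    Strict = c("No", "Yes")[any(Type == "Review" & cumsum(Registered == "Yes") == 1) + 1],
    # >= 1 matches once at least one "Yes" has been seen
    Relaxed = c("No", "Yes")[any(Type == "Review" & cumsum(Registered == "Yes") >= 1) + 1]
  )

print(res)
```

For P50 the running count has already reached 2 by the Review row, so only the >= 1 form reports "Yes" there. Both forms report "No" for P51.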
Answered on 2019-07-16 18:24:01
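Another dplyr variant: if no value in Registered is "Yes", it returns "No"; otherwise it compares the index of the first "Yes" with the index of the first "Review" and assigns the value accordingly.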
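A small base R sketch of why the any() check matters: on a logical vector that is entirely FALSE, which.max() returns 1, so an unguarded index comparison can wrongly report "Yes". The vectors registered and type are made-up stand-ins for a single group.

```r
# Hypothetical single group in which nobody is registered
registered <- c("No", "", "")
type <- c("Initial", "Review", "Review")

# which.max() gives the position of the first maximum; for an
# all-FALSE logical vector that is position 1
first_yes <- which.max(registered == "Yes")
first_review <- which.max(type == "Review")

# Without the guard, the comparison 1 < 2 is TRUE and picks "Yes"
unguarded <- c("No", "Yes")[(first_yes < first_review) + 1]

# With the guard, the missing "Yes" is caught first
guarded <- if (any(registered == "Yes")) unguarded else "No"

print(c(unguarded = unguarded, guarded = guarded))
```

The code below applies the same guard inside summarise().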
library(dplyr)
tt %>%
group_by(ID) %>%
summarise(Outcome = if (any(Registered == "Yes"))
c("No", "Yes")[(which.max(Registered == "Yes") <
which.max(Type == "Review"))+1] else "No")
# ID Outcome
# <chr> <chr>
#1 P40 Yes
#2 P42 No
#3 P43 Yes
#4 P44 No
Answered on 2019-07-16 18:21:25
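I'm not entirely sure what your expected result is, but from your description it sounds as if the Type == 'Review' rows are completely irrelevant: you can simply remove them, then drop that column (and rename the Registered column):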
tt %>%
filter(Type != 'Review') %>%
select(-Type, Outcome = Registered)
https://stackoverflow.com/questions/57054990