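I'm looking for help counting "YES" values across columns in R, preferably with a "tidy" solution.
I have a dataset, df_help, and need to create a new variable that looks at the columns named in the object dim_1 and counts how many of them are "YES" in each row. The expected result is shown as the dim_1 column in df_help_reprex.
Is there a dplyr solution, or is an apply function the better approach?
Thanks!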
> df_help_reprex <- df_help %>%
+ mutate(dim_1 = c(1, 0, 2, 0, 0, 0, 0, 1, 2, 0))
> df_help
# A tibble: 10 x 8
symp_ams symp_nvd symp_pain symp_fever vitals_gcs vitals_rr_10_24 vitals_temp_38 vitals_hr_100
<fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct>
1 NO YES NO NO NO NO NO YES
2 NO NO NO NO NO NO NO NO
3 YES NO NO NO YES NO UNK YES
4 NO NO NO NO NO NO UNK YES
5 NO NO NO YES YES NO YES NO
6 NO NO NO NO NO NO NO NO
7 NO NO NO YES NO NO NO NO
8 NO YES NO NO NO NO NO NO
9 YES NO NO NO YES NO NO YES
10 NO NO NO YES NO YES YES YES
> dim_1
[1] "symp_ams" "symp_nvd" "symp_pain" "vitals_gcs"
> df_help_reprex
# A tibble: 10 x 9
symp_ams symp_nvd symp_pain symp_fever vitals_gcs vitals_rr_10_24 vitals_temp_38 vitals_hr_100 dim_1
<fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <dbl>
1 NO YES NO NO NO NO NO YES 1
2 NO NO NO NO NO NO NO NO 0
3 YES NO NO NO YES NO UNK YES 2
4 NO NO NO NO NO NO UNK YES 0
5 NO NO NO YES YES NO YES NO 0
6 NO NO NO NO NO NO NO NO 0
7 NO NO NO YES NO NO NO NO 0
8 NO YES NO NO NO NO NO NO 1
9 YES NO NO NO YES NO NO YES 2
10 NO NO NO YES NO YES YES YES 0
Posted on 2020-08-29 07:42:37
I suggest a tidyverse approach: reshape the data, then count the matching values. Here is the code:
library(tidyverse)
#Data
df_help <- structure(list(symp_ams = c("NO", "NO", "YES", "NO", "NO", "NO",
"NO", "NO", "YES", "NO"), symp_nvd = c("YES", "NO", "NO", "NO",
"NO", "NO", "NO", "YES", "NO", "NO"), symp_pain = c("NO", "NO",
"NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO"), symp_fever = c("NO",
"NO", "NO", "NO", "YES", "NO", "YES", "NO", "NO", "YES"), vitals_gcs = c("NO",
"NO", "YES", "NO", "YES", "NO", "NO", "NO", "YES", "NO"), vitals_rr_10_24 = c("NO",
"NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO", "YES"), vitals_temp_38 = c("NO",
"NO", "UNK", "UNK", "YES", "NO", "NO", "NO", "NO", "YES"), vitals_hr_100 = c("YES",
"NO", "YES", "YES", "NO", "NO", "NO", "NO", "YES", "YES")), row.names = c(NA,
-10L), class = "data.frame")
#Vector for match
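dim_1 <- c("symp_ams","symp_nvd","symp_pain","vitals_gcs")
Next comes the solution using tidyverse functions. We add an id for each row and reshape the data to long format. Then we check the condition (the column is in dim_1 and the value is "YES"), sum the matches per row, and finally bind the result to the original data frame: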
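#Reshape: add a row id, pivot to long format, flag "YES" values in the dim_1 columns, then sum the flags per row
df_help %>% bind_cols(df_help %>% mutate(id=1:n()) %>%
  pivot_longer(cols = -id) %>%
  mutate(Num=ifelse(name %in% dim_1 & value=='YES',1,0)) %>%
  group_by(id) %>% summarise(Dim1=sum(Num)) %>% select(-id))
Output: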
symp_ams symp_nvd symp_pain symp_fever vitals_gcs vitals_rr_10_24 vitals_temp_38 vitals_hr_100 Dim1
1 NO YES NO NO NO NO NO YES 1
2 NO NO NO NO NO NO NO NO 0
3 YES NO NO NO YES NO UNK YES 2
4 NO NO NO NO NO NO UNK YES 0
5 NO NO NO YES YES NO YES NO 1
6 NO NO NO NO NO NO NO NO 0
7 NO NO NO YES NO NO NO NO 0
8 NO YES NO NO NO NO NO NO 1
9 YES NO NO NO YES NO NO YES 2
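10 NO NO NO YES NO YES YES YES 0
By the way, row 5 of your expected output looks like a typo: vitals_gcs is YES in that row and is one of the columns in dim_1, so the count should be 1, not 0.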
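As an alternative sketch (not part of the original answer), the count can also be computed without reshaping: compare only the dim_1 columns with "YES" and add up the TRUE values row by row. This assumes dplyr 1.0 or later (for across() and all_of()); the names res and dim1_apply are just illustrative. A base R apply() version is included for comparison with the question's "dplyr or apply" choice. The comparison .x == "YES" works the same whether the columns are character (as in the data above) or factor (as in the question's tibble).

```r
library(dplyr)

# Same data as in the answer, rebuilt so this block runs on its own
df_help <- data.frame(
  symp_ams        = c("NO", "NO", "YES", "NO", "NO", "NO", "NO", "NO", "YES", "NO"),
  symp_nvd        = c("YES", "NO", "NO", "NO", "NO", "NO", "NO", "YES", "NO", "NO"),
  symp_pain       = rep("NO", 10),
  symp_fever      = c("NO", "NO", "NO", "NO", "YES", "NO", "YES", "NO", "NO", "YES"),
  vitals_gcs      = c("NO", "NO", "YES", "NO", "YES", "NO", "NO", "NO", "YES", "NO"),
  vitals_rr_10_24 = c("NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO", "YES"),
  vitals_temp_38  = c("NO", "NO", "UNK", "UNK", "YES", "NO", "NO", "NO", "NO", "YES"),
  vitals_hr_100   = c("YES", "NO", "YES", "YES", "NO", "NO", "NO", "NO", "YES", "YES")
)

# Columns whose "YES" values should be counted
dim_1 <- c("symp_ams", "symp_nvd", "symp_pain", "vitals_gcs")

# dplyr: turn the dim_1 columns into TRUE/FALSE, then rowSums() counts the TRUEs per row
res <- df_help %>%
  mutate(Dim1 = rowSums(across(all_of(dim_1), ~ .x == "YES")))

# Base R: the comparison gives a logical matrix; apply() sums each row (MARGIN = 1)
dim1_apply <- apply(df_help[dim_1] == "YES", 1, sum)

# Both give 1 0 2 0 1 0 0 1 2 0 (row 5 is 1 because vitals_gcs is YES)
res$Dim1
dim1_apply
```

rowSums() is vectorized over the whole logical matrix, so it is usually simpler and faster than calling apply() row by row; both should agree with the pivot_longer() result above.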
https://stackoverflow.com/questions/63641423