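My data has a variable named "Cardiac Comorbidity Types" that contains either NAs or a comma-separated list of cardiac comorbidity types. How can I create one column for each possible comorbidity and fill it with 1/0 for every observation, where 1 means the comorbidity is present and 0 means it is absent?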
dput(head(et1$`Cardiac Comorbidity Types`,20))
c("MI,", NA, "CAD, Previous CABG or PTCA, MI, Pacemaker,", "Arrhythmia,",
"CAD, Previous CABG or PTCA, MI, Arrhythmia,", NA, "CAD, Previous CABG or PTCA, MI,",
"CAD, Previous CABG or PTCA, CHF, Pacemaker,", "CAD, Previous CABG or PTCA,",
"CAD, Previous CABG or PTCA, Arrhythmia,", "CAD, Previous CABG or PTCA,",
"CAD, Previous CABG or PTCA, MI,", "CAD, Previous CABG or PTCA, CHF, Arrhythmia,",
"CAD, Previous CABG or PTCA, Pacemaker,", "CAD, Previous CABG or PTCA, MI, CHF,",
"CAD, Previous CABG or PTCA, MI, CHF,", NA, "CAD, Previous CABG or PTCA, PVD, Pacemaker,",
"PVD,", "CAD, Previous CABG or PTCA,")
Also, how would I do this if the data were semicolon-separated?
Answered 2020-04-06 21:47:40
We can use a combination of unnest and pivot_wider from tidyr.
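library(dplyr)
library(tidyr)
library(stringr)

# Add a row ID so the wide result can be joined back to the original rows
data <- data %>% mutate(ID = row_number())

data %>%
  # Split each string into a list of comorbidity labels, one row per label
  mutate(Cardiac.Comorbidity.Types = str_split(Cardiac.Comorbidity.Types, ", ?")) %>%
  unnest(Cardiac.Comorbidity.Types) %>%
  # Drop the empty string left by the trailing comma (NA rows are dropped here too)
  filter(Cardiac.Comorbidity.Types != "") %>%
  pivot_wider(id_cols = "ID", names_from = Cardiac.Comorbidity.Types, values_from = Cardiac.Comorbidity.Types) %>%
  # Bring back the original column and the rows that were NA
  right_join(data, by = "ID") %>%
  # Turn present/missing into 1/0
  mutate_at(vars(-ID, -Cardiac.Comorbidity.Types), ~ as.integer(!is.na(.x))) %>%
  select(-ID)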
# A tibble: 20 x 8
#      MI   CAD `Previous CABG or PTCA` Pacemaker Arrhythmia   CHF   PVD Cardiac.Comorbidity.Types
#   <int> <int>                   <int>     <int>      <int> <int> <int> <fct>
# 1     1     0                       0         0          0     0     0 MI,
# 2     0     0                       0         0          0     0     0 NA
# 3     1     1                       1         1          0     0     0 CAD, Previous CABG or PTCA, MI, Pacemaker,
# 4     0     0                       0         0          1     0     0 Arrhythmia,
# 5     1     1                       1         0          1     0     0 CAD, Previous CABG or PTCA, MI, Arrhythmia,
...

Data
data <- c("MI,", NA, "CAD, Previous CABG or PTCA, MI, Pacemaker,", "Arrhythmia,",
"CAD, Previous CABG or PTCA, MI, Arrhythmia,", NA, "CAD, Previous CABG or PTCA, MI,",
"CAD, Previous CABG or PTCA, CHF, Pacemaker,", "CAD, Previous CABG or PTCA,",
"CAD, Previous CABG or PTCA, Arrhythmia,", "CAD, Previous CABG or PTCA,",
"CAD, Previous CABG or PTCA, MI,", "CAD, Previous CABG or PTCA, CHF, Arrhythmia,",
"CAD, Previous CABG or PTCA, Pacemaker,", "CAD, Previous CABG or PTCA, MI, CHF,",
"CAD, Previous CABG or PTCA, MI, CHF,", NA, "CAD, Previous CABG or PTCA, PVD, Pacemaker,",
"PVD,", "CAD, Previous CABG or PTCA,")
data <- data.frame(Cardiac.Comorbidity.Types = data)

Answered 2020-04-07 03:44:39
We can use cSplit_e from splitstackshape to convert the column into binary columns.
splitstackshape::cSplit_e(et1, "Cardiac.Comorbidity.Types",
                          type = "character", fill = 0)
The default sep argument in cSplit_e is ",". If your data is semicolon-separated, you can specify the separator explicitly.
splitstackshape::cSplit_e(et1, "Cardiac.Comorbidity.Types", sep = ";",
                          type = "character", fill = 0)

https://stackoverflow.com/questions/61068924
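
For readers who want to see what the split-into-indicators step does without extra packages, here is a minimal base R sketch. It is not taken from either answer: the helper name split_to_indicators and the toy vector x are made up for illustration, and it assumes the separator is a single literal string and that labels should be trimmed of surrounding spaces.

```r
# Hypothetical helper (not from the answers above): build 0/1 indicator
# columns from a delimited character vector using base R only.
split_to_indicators <- function(x, sep = ",") {
  # Split each entry on the literal separator and trim surrounding spaces.
  parts <- lapply(strsplit(as.character(x), sep, fixed = TRUE), trimws)
  # Collect the distinct labels, dropping NA and empty strings.
  labels <- unique(unlist(parts))
  labels <- labels[!is.na(labels) & nzchar(labels)]
  # One integer vector per label: 1 if the label occurs in that row, else 0.
  out <- lapply(labels, function(lab) {
    as.integer(vapply(parts, function(p) lab %in% p, logical(1)))
  })
  names(out) <- labels
  # check.names = FALSE keeps labels with spaces as column names.
  data.frame(out, check.names = FALSE)
}

# Toy semicolon-separated input (assumed for this example).
x <- c("MI;", NA, "CAD; Previous CABG or PTCA; MI; Pacemaker;", "Arrhythmia;")
ind <- split_to_indicators(x, sep = ";")
print(ind)
```

Rows that are NA end up with 0 in every column, which matches the behaviour asked for in the question. The result can be attached to the original data with cbind() if the source column should be kept alongside the indicators.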