
Create binary categorical variables based on the contents of a comma-separated list

Stack Overflow user
Asked 2020-04-06 21:02:53
Answers: 2, Views: 61, Followers: 0, Votes: 1

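My data has a variable named "Cardiac Comorbidity Types" that contains either NA or a comma-separated list of cardiac comorbidity types. How can I create one column for each possible comorbidity and fill it with 1/0 for every observation, where 1 indicates the comorbidity is present and 0 indicates it is absent?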

dput(head(et1$`Cardiac Comorbidity Types`,20))
c("MI,", NA, "CAD, Previous CABG or PTCA, MI, Pacemaker,", "Arrhythmia,", 
"CAD, Previous CABG or PTCA, MI, Arrhythmia,", NA, "CAD, Previous CABG or PTCA, MI,", 
"CAD, Previous CABG or PTCA, CHF, Pacemaker,", "CAD, Previous CABG or PTCA,", 
"CAD, Previous CABG or PTCA, Arrhythmia,", "CAD, Previous CABG or PTCA,", 
"CAD, Previous CABG or PTCA, MI,", "CAD, Previous CABG or PTCA, CHF, Arrhythmia,", 
"CAD, Previous CABG or PTCA, Pacemaker,", "CAD, Previous CABG or PTCA, MI, CHF,", 
"CAD, Previous CABG or PTCA, MI, CHF,", NA, "CAD, Previous CABG or PTCA, PVD, Pacemaker,", 
"PVD,", "CAD, Previous CABG or PTCA,")

Also, how would I do this if the data were semicolon-separated?


2 Answers

Stack Overflow user

Accepted answer

Posted 2020-04-06 21:47:40

We can use a combination of unnest and pivot_wider from tidyr.

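library(dplyr)
library(tidyr)
library(stringr)
# Add a row ID so each observation can be matched back after reshaping
data <- data %>% mutate(ID = 1:nrow(data))

data %>% 
  # Split each entry on a comma (optionally followed by a space) into a list column
  mutate(Cardiac.Comorbidity.Types = str_split(Cardiac.Comorbidity.Types, ", ?")) %>%
  # One row per (ID, comorbidity) pair
  unnest(Cardiac.Comorbidity.Types) %>%
  # Drop the empty string left by the trailing comma (NA rows are dropped here too)
  filter(Cardiac.Comorbidity.Types != "") %>%
  # One column per comorbidity
  pivot_wider(id_cols = "ID", names_from = Cardiac.Comorbidity.Types, values_from = Cardiac.Comorbidity.Types) %>%
  # Bring back the original column and the rows that had no comorbidities
  right_join(data, by="ID") %>%
  # Non-missing cell becomes 1, missing cell becomes 0
  mutate_at(vars(-ID,-Cardiac.Comorbidity.Types), ~ as.integer(!is.na(.x))) %>% select(-ID)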
# A tibble: 20 x 8
#      MI   CAD `Previous CABG or PTCA` Pacemaker Arrhythmia   CHF   PVD Cardiac.Comorbidity.Types                   
#   <int> <int>                   <int>     <int>      <int> <int> <int> <fct>                                       
# 1     1     0                       0         0          0     0     0 MI,                                         
# 2     0     0                       0         0          0     0     0 NA                                          
# 3     1     1                       1         1          0     0     0 CAD, Previous CABG or PTCA, MI, Pacemaker,  
# 4     0     0                       0         0          1     0     0 Arrhythmia,                                 
# 5     1     1                       1         0          1     0     0 CAD, Previous CABG or PTCA, MI, Arrhythmia, 
...

Data

data <- c("MI,", NA, "CAD, Previous CABG or PTCA, MI, Pacemaker,", "Arrhythmia,", 
"CAD, Previous CABG or PTCA, MI, Arrhythmia,", NA, "CAD, Previous CABG or PTCA, MI,", 
"CAD, Previous CABG or PTCA, CHF, Pacemaker,", "CAD, Previous CABG or PTCA,", 
"CAD, Previous CABG or PTCA, Arrhythmia,", "CAD, Previous CABG or PTCA,", 
"CAD, Previous CABG or PTCA, MI,", "CAD, Previous CABG or PTCA, CHF, Arrhythmia,", 
"CAD, Previous CABG or PTCA, Pacemaker,", "CAD, Previous CABG or PTCA, MI, CHF,", 
"CAD, Previous CABG or PTCA, MI, CHF,", NA, "CAD, Previous CABG or PTCA, PVD, Pacemaker,", 
"PVD,", "CAD, Previous CABG or PTCA,")
data <- data.frame(Cardiac.Comorbidity.Types = data)
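
For comparison, the same split-then-indicate steps can be sketched in base R without any packages. This is only an illustrative sketch, not the answerer's method: the four-row `df` below is a hypothetical subset of the data above, and the names `parts`, `types`, `indicators`, and `result` are introduced here for illustration.

```r
# Hypothetical four-row subset of the data above (assumption for illustration).
df <- data.frame(
  Cardiac.Comorbidity.Types = c("MI,", NA,
                                "CAD, Previous CABG or PTCA, MI, Pacemaker,",
                                "Arrhythmia,"),
  stringsAsFactors = FALSE
)

# Split every entry on commas; an NA entry stays a single NA element.
parts <- strsplit(df$Cardiac.Comorbidity.Types, ",", fixed = TRUE)

# Trim surrounding whitespace and drop NA or empty pieces.
parts <- lapply(parts, function(p) {
  p <- trimws(p)
  p[!is.na(p) & nzchar(p)]
})

# Every distinct comorbidity, in order of first appearance.
types <- unique(unlist(parts))

# One row per observation, one 0/1 column per comorbidity.
indicators <- matrix(
  unlist(lapply(parts, function(p) as.integer(types %in% p))),
  ncol = length(types), byrow = TRUE,
  dimnames = list(NULL, types)
)

# Attach the indicator columns to the original data frame.
result <- cbind(df, indicators)
print(result)
```

Rows whose original value is NA end up with 0 in every indicator column, which matches the behaviour of the pipeline above.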
Votes: 1

Stack Overflow user

Posted 2020-04-07 03:44:39

We can use cSplit_e from splitstackshape to convert the column into binary columns.

splitstackshape::cSplit_e(et1, "Cardiac.Comorbidity.Types", 
                          type = "character", fill = 0)

The default sep argument in cSplit_e is ","; if your data is semicolon-separated, you can specify that explicitly.

splitstackshape::cSplit_e(et1, "Cardiac.Comorbidity.Types", sep = ";", 
                          type = "character", fill = 0)
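
If installing splitstackshape is not an option, the semicolon case can also be sketched in base R with `strsplit` and `table`. This is a rough illustration under stated assumptions rather than an equivalent of cSplit_e: the vector `x` is hypothetical sample data, and `items`, `row_id`, `counts`, and `indicators` are names introduced here.

```r
# Hypothetical semicolon-separated sample data (assumption for illustration).
x <- c("MI;", NA, "CAD; MI; Pacemaker;", "PVD;")

# Split on a semicolon followed by optional whitespace.
items <- strsplit(x, ";\\s*")

# Drop NA and empty pieces so that rows without comorbidities become empty.
items <- lapply(items, function(v) v[!is.na(v) & nzchar(v)])

# Row index of every item; explicit levels keep rows that have no items.
row_id <- factor(rep(seq_along(items), lengths(items)),
                 levels = seq_along(items))

# Cross-tabulate row index against comorbidity name.
counts <- table(row_id, unlist(items))

# Convert counts to 0/1 so a comorbidity repeated within a row counts once.
indicators <- matrix(as.integer(counts > 0), nrow = nrow(counts),
                     dimnames = list(NULL, colnames(counts)))
print(indicators)
```

The columns come out in the sorted order that `table` assigns, which depends on the locale, so it is safer to select columns by name than by position.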
Votes: 1
Page content originally provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/61068924
