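I need to do some recoding on a database. It is a medical administrative database, so it is large. I need to recode the diagnoses, which are coded in ICD-10. Here is an example of the database as an illustration.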
ID<-(1:15)
Diag<-c("A001","A002","A003","A004","B001","B002","B003",
"C001","C002","C003","C004","C005","C006","C007","C008")
Age<-round(rnorm(15,25,10))
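DATA<-data.frame(ID,Diag,Age)
So, I would like:
All codes in "Diag" that begin with "A" or "B" to be coded as "Disease 1".
Codes C001 through C004 to be coded as "Disease 2".
Codes C005 through C008 to be coded as "Disease 3".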
Posted on 2020-04-14 03:28:09
We can use case_when
library(dplyr)
library(stringr)
DATA %>%
  mutate(new = case_when(str_sub(Diag, 1, 1) %in% c('A', 'B') ~ 'Disease 1',
                         Diag %in% str_c('C00', 1:4) ~ 'Disease 2',
                         TRUE ~ 'Disease 3'))
# ID Diag Age new
#1 1 A001 9 Disease 1
#2 2 A002 37 Disease 1
#3 3 A003 27 Disease 1
#4 4 A004 31 Disease 1
#5 5 B001 22 Disease 1
#6 6 B002 23 Disease 1
#7 7 B003 30 Disease 1
#8 8 C001 38 Disease 2
#9 9 C002 24 Disease 2
#10 10 C003 25 Disease 2
#11 11 C004 33 Disease 2
#12 12 C005 26 Disease 3
#13 13 C006 45 Disease 3
#14 14 C007 20 Disease 3
#15 15 C008 22 Disease 3
https://stackoverflow.com/questions/61195229
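One thing to watch on a large database: the `TRUE ~ 'Disease 3'` fallback labels every code that did not match an earlier rule as "Disease 3", not only C005 to C008. A minimal sketch of a stricter variant is below, assuming that unmatched codes should become `NA`. The two extra rows (`C009`, `D001`) and the name `recoded` are hypothetical and added only to show the fallback; `Age` is left out for brevity.

```r
library(dplyr)
library(stringr)

# Example data in the same shape as the question, plus two
# hypothetical codes (C009, D001) that match none of the rules.
DATA <- data.frame(
  ID = 1:17,
  Diag = c("A001", "A002", "A003", "A004", "B001", "B002", "B003",
           "C001", "C002", "C003", "C004", "C005", "C006", "C007", "C008",
           "C009", "D001"),
  stringsAsFactors = FALSE
)

recoded <- DATA %>%
  mutate(new = case_when(
    str_detect(Diag, "^[AB]")   ~ "Disease 1",   # codes starting with A or B
    Diag %in% str_c("C00", 1:4) ~ "Disease 2",   # C001 to C004
    Diag %in% str_c("C00", 5:8) ~ "Disease 3",   # C005 to C008, listed explicitly
    TRUE                        ~ NA_character_  # assumption: anything else stays unclassified
  ))

# Codes that no rule covered, worth reviewing before analysis
filter(recoded, is.na(new))
```

Listing the third range explicitly makes the rules mirror the three requirements one to one, and the `NA` rows show which diagnoses still need a rule.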