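I have the following table in R, which lists a person's race, gender, age, and cholesterol test result. Age and cholesterol test are stored as dummy variables. Age can be low, medium, or high, while the cholesterol test can be low or high. I want to convert the age and cholesterol columns into single columns, where low is coded as 1, medium as 2, and high as 3. The cholesterol test can be low or high, or it should be N/A in the expected output if the person never took the test. I would like the solution to be dynamic, so that the code still works if I have more columns in this format (for example, there may be new tests that are also stored as low, medium, or high dummy variables).
How can I do this in R?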
Input:
  race  gender age.low_tm1 age.medium_tm1 age.high_tm1 chol_test.low_tm1 chol_test.high_tm1
  <chr>  <int>       <int>          <int>        <int>             <int>              <int>
1 white      0           1              0            0                 0                  0
2 white      0           1              0            0                 0                  0
3 white      1           1              0            0                 0                  0
4 black      1           0              1            0                 0                  0
5 white      0           0              0            1                 0                  1
6 black      0           0              1            0                 1                  0

Expected output:
  race  gender age chol_test
1 white      0   1       n/a
2 white      0   1       n/a
3 white      1   1       n/a
4 black      1   2       n/a
5 white      0   3         3
6 black      0   2         1

Posted on 2022-01-03 17:42:16
Maybe this helps:
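library(dplyr)
library(tidyr)
library(stringr)

df1 %>%
  # multiply each dummy by the code of the level in its column name (low = 1, medium = 2, high = 3)
  mutate(across(contains("_"), ~
    . * setNames(1:3, c("low", "medium", "high"))[
      str_extract(cur_column(), "low|medium|high")])) %>%
  # drop the "_tm1" suffix so the names become "<variable>.<level>"
  rename_with(~ str_remove(., "_tm1")) %>%
  # one row per person and level, one column per variable
  pivot_longer(cols = -c(race, gender),
    names_to = c(".value", "categ"), names_sep = "\\.") %>%
  # keep rows where at least one variable has a non-zero code
  filter(age > 0 | chol_test > 0) %>%
  select(-categ) %>%
  mutate(chol_test = na_if(chol_test, 0))

-output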
# A tibble: 7 × 4
  race  gender   age chol_test
  <chr>  <int> <int>     <int>
1 white      0     1        NA
2 white      0     1        NA
3 white      1     1        NA
4 black      1     2        NA
5 white      0     3         3
6 black      0     0         1
7 black      0     2        NA

data
df1 <- structure(list(race = c("white", "white", "white", "black", "white",
"black"), gender = c(0L, 0L, 1L, 1L, 0L, 0L), age.low_tm1 = c(1L,
1L, 1L, 0L, 0L, 0L), age.medium_tm1 = c(0L, 0L, 0L, 1L, 0L, 1L
), age.high_tm1 = c(0L, 0L, 0L, 0L, 1L, 0L), chol_test.low_tm1 = c(0L,
0L, 0L, 0L, 0L, 1L), chol_test.high_tm1 = c(0L, 0L, 0L, 0L, 1L,
0L)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6"))

https://stackoverflow.com/questions/70569460
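One thing to watch in the output above: it has seven rows for six people. The sixth person has the age dummy set under "medium" and the cholesterol dummy set under "low", so two pivoted rows pass the filter. If one row per person is needed, as in the expected output, a minimal base-R sketch might look like the one below. It assumes every dummy column is named `<variable>.<level>_tm1` with the level being low, medium, or high, and that at most one dummy per variable is set in each row. `collapse_dummies` and `level_codes` are illustrative names, not part of the original answer.

```r
# Same data as df1 in the "data" section above
df1 <- data.frame(
  race = c("white", "white", "white", "black", "white", "black"),
  gender = c(0L, 0L, 1L, 1L, 0L, 0L),
  age.low_tm1 = c(1L, 1L, 1L, 0L, 0L, 0L),
  age.medium_tm1 = c(0L, 0L, 0L, 1L, 0L, 1L),
  age.high_tm1 = c(0L, 0L, 0L, 0L, 1L, 0L),
  chol_test.low_tm1 = c(0L, 0L, 0L, 0L, 0L, 1L),
  chol_test.high_tm1 = c(0L, 0L, 0L, 0L, 1L, 0L)
)

# Assumed coding scheme from the question: low = 1, medium = 2, high = 3
level_codes <- c(low = 1L, medium = 2L, high = 3L)

# Hypothetical helper: collapse each group of "<variable>.<level>_tm1"
# dummy columns into a single integer column named "<variable>"
collapse_dummies <- function(df, codes = level_codes) {
  pattern <- paste0("^(.*)\\.(", paste(names(codes), collapse = "|"), ")_tm1$")
  is_dummy <- grepl(pattern, names(df))
  dummy_names <- names(df)[is_dummy]
  prefixes <- sub(pattern, "\\1", dummy_names)      # e.g. "age", "chol_test"
  levels_found <- sub(pattern, "\\2", dummy_names)  # e.g. "low", "high"

  out <- df[, !is_dummy, drop = FALSE]
  for (p in unique(prefixes)) {
    cols <- dummy_names[prefixes == p]
    w <- codes[levels_found[prefixes == p]]
    m <- as.matrix(df[, cols, drop = FALSE])
    # With at most one dummy set per row, the weighted sum is that level's code
    val <- as.integer(m %*% w)
    # No dummy set at all means the value is missing
    val[rowSums(m) == 0] <- NA_integer_
    out[[p]] <- val
  }
  out
}

result <- collapse_dummies(df1)
print(result)
```

Because the variable names are read from the column names, a new test stored as, say, `bp_test.low_tm1` and `bp_test.high_tm1` (a made-up example) would be picked up without changing the function. For `df1`, `result$age` is 1, 1, 1, 2, 3, 2 and `result$chol_test` is NA for the first four people, then 3 and 1.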