I have a dataset in which two variables are dichotomized as "Yes"/"No".
> df[1:20,]
# A tibble: 20 × 2
black white
<fct> <fct>
1 No Yes
2 No Yes
3 No Yes
4 No Yes
5 No Yes
6 No Yes
7 No Yes
8 No Yes
9 No Yes
10 No Yes
11 No Yes
12 No Yes
13 No Yes
14 No Yes
15 No Yes
16 Yes No
17 No Yes
18 No Yes
19 No Yes
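20 Yes No
This creates a lot of variables (my actual data has more options to choose from than these) and looks untidy, because it means carrying many unnecessary variables. I want to create a single new variable (e.g. race) in which the current individual variables black, white, etc. become its levels, as in this example: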
> df2[1:20,]
# A tibble: 20 × 1
race
<fct>
1 White
2 White
3 White
4 White
5 White
6 White
7 White
8 White
9 White
10 White
11 White
12 White
13 White
14 White
15 White
16 Black
17 White
18 White
19 White
20 Black
How can I do this?
Answered on 2022-04-04 11:03:26
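To allow for people with more than one race, use apply over the rows (MARGIN = 1) and collapse the names of the columns that contain "Yes" into one string with toString.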
df <- structure(list(asian = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes"), black = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes", "No", "No", "No", "Yes"), white = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))
data.frame(race = apply(df == "Yes", 1, \(x) toString(colnames(df)[which(x)])))
race
1 white
2 white
3 white
4 white
5 white
6 white
7 white
8 white
9 white
10 white
11 white
12 white
13 white
14 white
15 white
16 black
17 white
18 white
19 white
20 asian, black
Using max.col (only works when each person has exactly one "Yes"; by default max.col breaks ties at random):
data.frame(race = colnames(df)[max.col(df == "Yes")])
Answered on 2022-04-04 11:11:16
Using dplyr (this assumes that in your dataset a person can only be one race):
library(dplyr)
dat <- data.frame(id = 1:2,
black = c("No", "Yes"),
white = c("Yes", "No"))
dat |> mutate(
race = case_when(black == "Yes" ~ "black",
white == "Yes" ~ "white")
)
Output:
#> id black white race
#> 1 1 No Yes white
#> 2 2 Yes No black
https://stackoverflow.com/questions/71736105
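Neither answer above returns quite what the question asks for, which is a factor with capitalized levels. The following is a minimal sketch of one way to get there, assuming each row has at most one "Yes". The names indicators, race_cols, is_yes, idx and result are hypothetical, and ties.method = "first" is swapped in for max.col's default random tie-breaking so the result is reproducible.

```r
# Hypothetical example data: one "Yes" per row, stored as factors as in the question.
indicators <- data.frame(
  black = factor(c("No", "No", "Yes")),
  white = factor(c("Yes", "Yes", "No"))
)

# Indicator columns to collapse into a single variable.
race_cols <- c("black", "white")

# Logical matrix: TRUE where a column holds "Yes".
is_yes <- indicators[race_cols] == "Yes"

# Column index of the first TRUE in each row; ties.method = "first"
# replaces max.col()'s default random tie-breaking.
idx <- max.col(is_yes, ties.method = "first")

# Build a factor whose levels follow the column order, with capitalized labels.
result <- data.frame(
  race = factor(race_cols[idx], levels = race_cols,
                labels = c("Black", "White"))
)

# A row with no "Yes" at all would otherwise be assigned the first column,
# so mark such rows as missing instead.
result$race[rowSums(is_yes) == 0] <- NA

print(result)
```

If a person can have several "Yes" values, this sketch keeps only the first matching column, so the apply/toString approach from the first answer is the safer choice in that case.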