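I have a data frame with 19 variables, 17 of which are factors. Some of these factors contain missing values, coded as NA. I want to use forcats::fct_explicit_na() on every factor in the data frame to recode the missing values as a separate factor level, "to_impute".
A small example with two factor variables: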
df <- structure(list(loc_len = structure(c(NA, NA, NA, NA, NA, NA,
1L, 1L, 3L, 1L), .Label = c("No", "< 5 sec", "5 sec - < 1 min",
"1 - 5 min", "> 5 min", "Unknown duration"), class = "factor"),
AMS = structure(c(1L, 2L, NA, 1L, 1L, NA, NA, NA, NA, NA), .Label = c("No",
"Yes"), class = "factor")), .Names = c("loc_len", "AMS"), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
table(df$loc_len, useNA = "always")
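              No          < 5 sec  5 sec - < 1 min        1 - 5 min          > 5 min Unknown duration             <NA>
               3                0                1                0                0                0                6

The code below does this for two variables. I would like to do it for all factor variables in the data frame, whose names are stored in 'f_names'. Is there a way to "vectorize" fct_explicit_na()?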
f_names <- names(Filter(is.factor, df))
f_names
[1] "loc_len" "AMS"

The code below does what I want, but it handles each factor separately:
df_new <- df %>%
mutate(loc_len = fct_explicit_na(loc_len, na_level = "to_impute")) %>%
mutate(AMS = fct_explicit_na(AMS, na_level = "to_impute"))

I want this type of table for all factors in the data set, whose names are in 'f_names':
lapply(df_new, function(x) data.frame(table(x, useNA = "always")))

The output is now:
$loc_len
                 x Freq
1               No    3
2          < 5 sec    0
3  5 sec - < 1 min    1
4        1 - 5 min    0
5          > 5 min    0
6 Unknown duration    0
7        to_impute    6
8             <NA>    0

$AMS
          x Freq
1        No    3
2       Yes    1
3 to_impute    6
4      <NA>    0

Posted on 2018-03-22 03:48:38
Even better, here is an elegant and idiomatic solution, provided at:
https://github.com/tidyverse/forcats/issues/122
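library(dplyr)
library(forcats)  # provides fct_explicit_na()
# Apply fct_explicit_na() to every factor column; other columns are left untouched
df <- df %>% mutate_if(is.factor,
                       fct_explicit_na,
                       na_level = "to_impute")

Posted on 2018-03-20 02:54:20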
After some trial and error, the code below does what I want.
library(tidyverse)
df[, f_names] <- lapply(df[, f_names], function(x) fct_explicit_na(x, na_level = "to_impute")) %>% as.data.frame

https://stackoverflow.com/questions/49338118
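
For readers on newer package versions, the mutate_if() approach can also be sketched with across(). This is a minimal sketch, not part of the original answers: it assumes dplyr >= 1.0.0 (for across() and where()) and forcats >= 1.0.0, where fct_explicit_na() is deprecated in favour of fct_na_value_to_level(). The data frame `toy` is a hypothetical stand-in, not the asker's data.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: one factor column with a missing value, one integer column
toy <- data.frame(
  AMS = factor(c("No", "Yes", NA, "No")),
  id  = 1:4
)

# Turn NA into an explicit "to_impute" level in every factor column;
# non-factor columns such as `id` are not selected by where(is.factor)
toy_new <- toy %>%
  mutate(across(where(is.factor),
                ~ fct_na_value_to_level(.x, level = "to_impute")))

# Inspect the result for each factor column
lapply(Filter(is.factor, toy_new), function(x) table(x, useNA = "always"))
```

The selection is done by column type, so the separate 'f_names' vector is not needed here; if only a named subset should be recoded, across(all_of(f_names), ...) would be the analogous call.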