I have a function that scales/normalizes/z-score-transforms some variables using mutate_at. The function comes from the page linked here: all.html.
scale <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)
If the original variable contains NAs, using this function turns every value into NA, as the example below shows:
library(dplyr) # provides %>% and mutate_at
# make df
set.seed(123)
df <- data.frame(
col_A = c(5, NA,2,4, 4,5,8,3,7,9),
col_B = as.numeric(sample(20:90, size = 10)),
col_C = as.numeric(sample(1000:2000, size = 10))
)
df
I tried changing the default to na.rm = TRUE, which seems to give me what I'm after.
scale_narm_true <- function(x, na.rm = TRUE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)
vars <- c("col_A", "col_B")
df_z_score <- df %>%
mutate_at(vars, list(scaled_var = scale)) %>% # introduces NAs in the resulting variables
mutate_at(vars, list(scaled_narm_true_var = scale_narm_true)) # works as expected and desired
However, what I really want is to specify na.rm = TRUE in the mutate_at call itself, like this:
df_z_score_attempt <- df %>%
mutate_at(vars, list(scaled_var = scale, na.rm=T)) # this doesn't work!
Any help would be appreciated, especially since the documentation at all.html suggests this should be possible:
starwars %>% mutate_at(c("height", "mass"), scale2, na.rm = TRUE)
Posted 2019-08-14 05:19:54
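The documentation example quoted in the question passes na.rm = TRUE as a separate argument to mutate_at, outside the list of functions. Below is a minimal sketch of that pattern applied to the question's data. It assumes dplyr is installed and that mutate_at forwards its extra arguments to each function in the list. The helper is named scale2, as in the quoted example, so that it does not mask base::scale. The col_B values are copied from the output shown in this thread rather than re-sampled.

```r
library(dplyr)

# Same z-score helper as in the question, named scale2 as in the docs example
scale2 <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)

df <- data.frame(
  col_A = c(5, NA, 2, 4, 4, 5, 8, 3, 7, 9),
  col_B = c(50, 70, 33, 86, 61, 69, 62, 56, 71, 88)
)
vars <- c("col_A", "col_B")

# na.rm = TRUE sits outside list(), so it is handed on to scale2
df_z <- df %>%
  mutate_at(vars, list(scaled_var = scale2), na.rm = TRUE)

df_z
```

If the assumption holds, only the row that was NA in col_A stays NA in col_A_scaled_var. The difference from the failing attempt is that na.rm is no longer an element of the list.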
One option is to use ~ to specify an anonymous function call and then pass the column in as the dot placeholder (.)
library(dplyr)
df %>%
mutate_at(vars, list(scaled_var = ~scale(., na.rm=TRUE)) )
#   col_A col_B col_C col_A_scaled_var col_B_scaled_var
#1      5    50  1373       -0.0952381       -0.8939893
#2     NA    70  1664               NA        0.3306536
#3      2    33  1601       -1.3809524       -1.9349357
#4      4    86  1602       -0.5238095        1.3103678
#5      4    61  1767       -0.5238095       -0.2204357
#6      5    69  1708       -0.0952381        0.2694214
#7      8    62  1090        1.1904762       -0.1592036
#8      3    56  1952       -0.9523810       -0.5265964
#9      7    71  1347        0.7619048        0.3918857
#10     9    88  1648        1.6190476        1.4328321
If we use the default option (na.rm = FALSE), the column that contains an NA comes out entirely NA:
df %>%
mutate_at(vars, list(scaled_var = scale) )
#   col_A col_B col_C col_A_scaled_var col_B_scaled_var
#1      5    50  1373               NA       -0.8939893
#2     NA    70  1664               NA        0.3306536
#3      2    33  1601               NA       -1.9349357
#4      4    86  1602               NA        1.3103678
#5      4    61  1767               NA       -0.2204357
#6      5    69  1708               NA        0.2694214
#7      8    62  1090               NA       -0.1592036
#8      3    56  1952               NA       -0.5265964
#9      7    71  1347               NA        0.3918857
#10     9    88  1648               NA        1.4328321
https://stackoverflow.com/questions/57488513
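The all-NA column follows from how mean() and sd() handle missing values. With na.rm = FALSE both return NA as soon as the input contains one, and arithmetic with NA then gives NA for every element. Below is a minimal base-R sketch using col_A from the question. The helper is named scale2 here, an illustrative name, so that it does not mask base::scale.

```r
x <- c(5, NA, 2, 4, 4, 5, 8, 3, 7, 9)  # col_A from the question

mean(x)                # NA: a single missing value makes the mean NA
mean(x, na.rm = TRUE)  # mean of the nine observed values

scale2 <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)

all(is.na(scale2(x)))                # TRUE: every element becomes NA
sum(is.na(scale2(x, na.rm = TRUE)))  # 1: only the original NA remains
```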