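I have a data.frame with 100 variables. I want the sum of just three of them, using mutate (not summarise).
Even if some of these three variables contain NA, I still want the sum. To do this with mutate, I replaced all the NA values with 0 using ifelse and then took the sum.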
library(dplyr)
df %>% mutate(mod_var1 = ifelse(is.na(var1), 0, var1),
              mod_var2 = ifelse(is.na(var2), 0, var2),
              mod_var3 = ifelse(is.na(var3), 0, var3),
              sum = (mod_var1 + mod_var2 + mod_var3))

Is there a better (shorter) way to do this?
Data
df <- read.table(text = c("
var1 var2 var3
4 5 NA
2 NA 3
1 2 4
NA 3 5
3 NA 2
1 1 5"), header = TRUE)

Answered 2017-01-08 06:19:37

We can use Reduce with +:
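# Replace NA with 0 in var1:var3, then add the columns element-wise.
# Note: Reduce(`+`, .) adds every column of the piped data frame, so this
# gives the intended result here because df holds only var1, var2 and var3.
df %>%
  mutate_each(funs(replace(., is.na(.), 0)), var1:var3) %>%
  mutate(Sum = Reduce(`+`, .))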
#  var1 var2 var3 Sum
#1    4    5    0   9
#2    2    0    3   5
#3    1    2    4   7
#4    0    3    5   8
#5    3    0    2   5
#6    1    1    5   7

Or use rowSums:
df %>%
  mutate(Sum = rowSums(.[names(.)[1:3]], na.rm = TRUE))
#  var1 var2 var3 Sum
#1    4    5   NA   9
#2    2   NA    3   5
#3    1    2    4   7
#4   NA    3    5   8
#5    3   NA    2   5
#6    1    1    5   7

Benchmarks
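library(tidyr)  # gather() and spread() in the second timing come from tidyr
set.seed(24)
# One million rows, three columns, values drawn from NA and 1:5
df1 <- as.data.frame(matrix(sample(c(NA, 1:5), 1e6 * 3, replace = TRUE),
                            dimnames = list(NULL, paste0("var", 1:3)), ncol = 3))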
# rowwise() with sum()
system.time({
  df1 %>% rowwise() %>% mutate(Sum = sum(c(var1, var2, var3), na.rm = TRUE))
})
#   user  system elapsed
#  21.50    0.03   21.66
# gather() / group_by() / spread()
system.time({
  df1 %>%
    mutate(rn = row_number()) %>%
    gather(var, varNum, var1:var3) %>%
    group_by(rn) %>%
    mutate(sum = sum(varNum, na.rm = TRUE)) %>%
    spread(var, varNum)
})
#   user  system elapsed
#   5.96    0.39    6.37
# replace() on the whole data frame, then +
system.time({
  replace(df1, is.na(df1), 0) %>% mutate(sum = var1 + var2 + var3)
})
#   user  system elapsed
#   0.17    0.01    0.19
# mutate_each() with Reduce()
system.time({
  df1 %>%
    mutate_each(funs(replace(., is.na(.), 0)), var1:var3) %>%
    mutate(Sum = Reduce(`+`, .))
})
#   user  system elapsed
#   0.10    0.02    0.11
# rowSums()
system.time({
  df1 %>%
    mutate(Sum = rowSums(.[names(.)[1:3]], na.rm = TRUE))
})
#   user  system elapsed
#   0.04    0.00    0.03

Answered 2017-01-08 06:58:18
rowwise() is my go-to. It is similar to group_by(), but it treats each row as an individual group.
df %>% rowwise() %>% mutate(Sum = sum(c(var1, var2, var3), na.rm = TRUE))

Answered 2017-01-08 07:11:13
Using tidyr:
df %>%
  mutate(rn = row_number()) %>%
  gather(var, varNum, var1:var3) %>%
  group_by(rn) %>%
  mutate(sum = sum(varNum, na.rm = TRUE)) %>%
  spread(var, varNum)

In case your data set is going to grow...
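
A note for readers on recent dplyr releases: mutate_each() and funs(), used in an earlier answer, have since been deprecated. The sketch below is one possible way to write the rowSums idea with columns selected by name rather than by position, which matters when the data frame has many other variables. It assumes dplyr 1.1.0 or later, where pick() is available; the extra column `other` is hypothetical and only stands in for the remaining variables.

```r
library(dplyr)

# Toy data from the question
df <- read.table(text = "
var1 var2 var3
4 5 NA
2 NA 3
1 2 4
NA 3 5
3 NA 2
1 1 5", header = TRUE)

# Hypothetical extra column, standing in for the other 97 variables
df$other <- 100

# pick() returns only the selected columns as a data frame (assumes dplyr >= 1.1.0);
# rowSums() then sums them row by row, skipping NA values.
res <- df %>%
  mutate(Sum = rowSums(pick(var1:var3), na.rm = TRUE))

res$Sum
# 9 5 7 8 5 7
```

Because only var1:var3 are passed to rowSums(), the `other` column does not affect the result, and the NA values in the original columns are left untouched.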
https://stackoverflow.com/questions/41530007