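I am trying to write a function that replaces the NAs in a numeric data.frame column with the mean of the available data for that variable, computed by group. I realize this is a form of imputation and that there are packages for it, but I would prefer to do it myself, and this is only an example: the real use will involve a more complex function. I have tried to build an MWE (minimal working example), but I am stuck at the last step. Wherever possible, I am trying to stick to the tidyverse approach.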
library(tidyverse)
## First create a little dataset for a minimum working example for questions
## three vectors
id <- c(rep("boh1", 6), rep("boh2", 6), rep("boh3", 6), rep("boh4", 6))
operator <- rep(c("op1", "op2"), each = 12)
nummos <- c(1, 4, 4, 3, 1, NA, 4, 2, 2, 3, 4, 4, NA, 1, 1, 5,
            5, 4, 5, 3, 2, NA, 3, 3)
## combine vectors into df
dat1 <- data.frame(id, operator, nummos)
## group by two variables and get mean of variable by group
dat2 <- dat1 %>%
  group_by(id, operator) %>%
  summarize(mean = mean(nummos, na.rm = TRUE))
## now stuck, how to replace NA by mean value appropriate for that group?

Posted on 2019-10-28 14:01:43
Use mutate and dplyr::case_when instead of summarise:
dat1 %>%
  group_by(id, operator) %>%
  mutate(nummos2 = case_when(is.na(nummos) ~ mean(nummos, na.rm = TRUE),
                             TRUE ~ as.numeric(nummos)))

Posted on 2019-10-28 14:11:03
You can simply define your own function using replace(). Try:
dat1 %>%
  group_by(id, operator) %>%
  mutate_at("nummos", function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))
# output
# A tibble: 24 x 3
# Groups:   id, operator [4]
   id    operator nummos
   <fct> <fct>     <dbl>
 1 boh1  op1         1
 2 boh1  op1         4
 3 boh1  op1         4
 4 boh1  op1         3
 5 boh1  op1         1
 6 boh1  op1         2.6
 7 boh2  op1         4
 8 boh2  op1         2
 9 boh2  op1         2
10 boh2  op1         3
# ... with 14 more rows

Posted on 2019-10-28 14:03:02
I am not very familiar with the tidyverse, so here is a data.table solution:
library(data.table) # load package
setDT(dat1) # convert data.frame to data.table

Now I will create a data.table holding the mean of nummos by c(id, operator) and join it to dat1, filling the NAs with the computed values:
dat1[dat1[, mean(nummos, na.rm = TRUE), by = .(id, operator)],
     nummos := ifelse(is.na(nummos), i.V1, nummos),
     on = .(id, operator)]

dat1[, mean(nummos, na.rm = TRUE), by = .(id, operator)] is a small data.table holding the mean for each group.
The nummos := ifelse(...) part replaces the value only where nummos is NA.
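The two pieces described above can also be run as separate steps, which may make the join easier to inspect. This is only a minimal sketch, assuming the data.table package is installed; grp_means and grp_mean are names introduced here for illustration (the answer itself relies on the default V1 column), and the data is rebuilt from the question so the block runs on its own:

```r
library(data.table)

# Rebuild the example data from the question as a data.table
dat1 <- data.table(
  id       = rep(c("boh1", "boh2", "boh3", "boh4"), each = 6),
  operator = rep(c("op1", "op2"), each = 12),
  nummos   = c(1, 4, 4, 3, 1, NA, 4, 2, 2, 3, 4, 4,
               NA, 1, 1, 5, 5, 4, 5, 3, 2, NA, 3, 3)
)

# Step 1: one row per group with the mean of the non-missing values
# (grp_mean is an illustrative column name instead of the default V1)
grp_means <- dat1[, .(grp_mean = mean(nummos, na.rm = TRUE)),
                  by = .(id, operator)]

# Step 2: update join; i.grp_mean refers to the column coming from grp_means,
# and only rows where nummos is NA end up with a different value
dat1[grp_means,
     nummos := ifelse(is.na(nummos), i.grp_mean, nummos),
     on = .(id, operator)]
```

Note that a group whose values are all NA would receive NaN rather than a number, since the mean of an empty vector is NaN.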
dat1
      id operator nummos
 1: boh1      op1    1.0
 2: boh1      op1    4.0
 3: boh1      op1    4.0
 4: boh1      op1    3.0
 5: boh1      op1    1.0
 6: boh1      op1    2.6
 7: boh2      op1    4.0
 8: boh2      op1    2.0
 9: boh2      op1    2.0
10: boh2      op1    3.0
11: boh2      op1    4.0
12: boh2      op1    4.0
13: boh3      op2    3.2
14: boh3      op2    1.0
15: boh3      op2    1.0
16: boh3      op2    5.0
17: boh3      op2    5.0
18: boh3      op2    4.0
19: boh4      op2    5.0
20: boh4      op2    3.0
21: boh4      op2    2.0
22: boh4      op2    3.2
23: boh4      op2    3.0
24: boh4      op2    3.0
      id operator nummos

https://stackoverflow.com/questions/58592092
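Since the question asks for a function that can later be swapped for something more complex, the replace() idea from the answers can be wrapped in a small helper and applied per group. What follows is a rough sketch rather than anything taken from the thread: it assumes dplyr 1.0.0 or later (for across(), which supersedes mutate_at()), and impute_with and its fun argument are hypothetical names.

```r
library(dplyr)

# Hypothetical helper: fill the NAs in a numeric vector with a summary
# (by default the mean) of its non-missing values
impute_with <- function(x, fun = mean) {
  replace(x, is.na(x), fun(x[!is.na(x)]))
}

# Example data from the question
dat1 <- data.frame(
  id       = rep(c("boh1", "boh2", "boh3", "boh4"), each = 6),
  operator = rep(c("op1", "op2"), each = 12),
  nummos   = c(1, 4, 4, 3, 1, NA, 4, 2, 2, 3, 4, 4,
               NA, 1, 1, 5, 5, 4, 5, 3, 2, NA, 3, 3)
)

# Group-wise mean fill
filled_mean <- dat1 %>%
  group_by(id, operator) %>%
  mutate(across(nummos, impute_with)) %>%
  ungroup()

# Same pipeline with a different summary function, here the median
filled_median <- dat1 %>%
  group_by(id, operator) %>%
  mutate(across(nummos, ~ impute_with(.x, fun = median))) %>%
  ungroup()
```

Keeping the fill logic in one function means the grouping pipeline stays the same when the summary changes; only the fun argument (or the body of the helper) needs to be replaced.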