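I've looked in several places, but I can't work out how to do this. The approach seems to have changed a few times, which makes it even more confusing.
I want to summarise NumOfBx by Endoscopist as part of a function. I have the following data frame: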
vv <- structure(list(Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ",
"John Boy ", "John Boy ", "John Boy ", "Mar Gret ", "John Boy ",
"Mar Gret ", "Phil Ip ", "Phil Ip "), NumbOfBx = c(2, 4, NA,
2, 12, 12, NA, NA, NA, 3, NA)), row.names = 100:110, .Names = c("Endoscopist",
"NumbOfBx"), class = "data.frame")
My function is:
NumBx <- function(x, y, z) {
x <- data.frame(x)
x <- x[!is.na(x[,y]), ]
NumBxPlot <- x %>% group_by_(z) %>% summarise(avg = mean(y, na.rm = T))
}
I call it like this:
NumBx(vv,"Endoscopist","NumOfBx")
This gives me an error:
Warning messages:
1: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
2: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
3: In mean.default(y, na.rm = T) :
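argument is not numeric or logical: returning NA
I changed the function to use summarise_ but got the same thing. Then I realised that summarise_ in particular (and not just group_by_) needs standard evaluation, so I tried this (from this Stack Overflow example):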
library(lazyeval)
NumBx <- function(x, y, z) {
x <- data.frame(x)
x <- x[!is.na(x[,y]), ]
NumBxPlot <- x %>% group_by_(z) %>%
summarise_(sum_val = interp(~mean(y, na.rm = TRUE), var = as.name(y)))
But I still get the same error:
Warning messages:
1: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
2: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
3: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
My expected output is:
Endoscopist Avg
Jupi Ter 4
John Boy 28
Phil Ip 3

Answered 2017-08-29 23:40:53
Using rlang (the replacement for lazyeval), you can do this:
library(dplyr)
vv <- structure(list(Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", "John Boy ", "John Boy ", "John Boy ", "Mar Gret ", "John Boy ", "Mar Gret ", "Phil Ip ", "Phil Ip "),
NumbOfBx = c(2, 4, NA, 2, 12, 12, NA, NA, NA, 3, NA)),
row.names = 100:110, .Names = c("Endoscopist", "NumbOfBx"), class = "data.frame")
num_bx <- function(.data, group, variable) {
  group <- enquo(group)
  variable <- enquo(variable)
  .data %>%
    tidyr::drop_na(!!variable) %>%
    group_by(!!group) %>%
    summarise(avg = mean(!!variable))
}
vv %>% num_bx(Endoscopist, NumbOfBx)
#> # A tibble: 3 x 2
#> Endoscopist avg
#> <chr> <dbl>
#> 1 John Boy 7
#> 2 Jupi Ter 4
#> 3 Phil Ip 3
Or, if you would rather pass the columns as strings instead of unquoted names:
num_bx <- function(.data, group, variable) {
  group <- rlang::sym(group)
  variable <- rlang::sym(variable)
  .data %>%
    tidyr::drop_na(!!variable) %>%
    group_by(!!group) %>%
    summarise(avg = mean(!!variable))
}
vv %>% num_bx("Endoscopist", "NumbOfBx")
#> # A tibble: 3 x 2
#> Endoscopist avg
#> <chr> <dbl>
#> 1 John Boy 7
#> 2 Jupi Ter 4
#> 3 Phil Ip 3

Answered 2017-08-29 23:40:40
Define the function following the dplyr programming vignette, like so:
NumBx <- function( x, y, z )
{
  yy <- enquo( y )
  zz <- enquo( z )
  data.frame(x) %>% filter( !is.na(!!yy) ) %>% group_by( !!zz ) %>%
    summarize( avg = mean(!!yy) )
}
You can now call it like this:
NumBx( vv, NumbOfBx, Endoscopist )
# Endoscopist avg
# <chr> <dbl>
# 1 John Boy 7
# 2 Jupi Ter 4
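# 3 Phil Ip 3
A few notes:
- Your function groups by z, but you are passing NumbOfBx as the z argument.
- na.rm=TRUE is redundant, because you have already filtered out the rows where the y variable is NA.
- The average for John Boy should be 7, not 28 (the value stated in your expected output).

Source: https://stackoverflow.com/questions/45942801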
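As a quick cross-check of the last note, the per-endoscopist means can be recomputed with base R alone, without any tidy evaluation. This is only a minimal sketch on the data frame from the question; the names `complete` and `avg` are illustrative and do not appear in the original code.

```r
# Same data as in the question (trailing spaces in the names are kept as-is).
vv <- data.frame(
  Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", "John Boy ",
                  "John Boy ", "John Boy ", "Mar Gret ", "John Boy ",
                  "Mar Gret ", "Phil Ip ", "Phil Ip "),
  NumbOfBx = c(2, 4, NA, 2, 12, 12, NA, NA, NA, 3, NA),
  stringsAsFactors = FALSE
)

# Drop rows with a missing biopsy count first, as the answers above do.
complete <- vv[!is.na(vv$NumbOfBx), ]

# Mean biopsy count per endoscopist, using base R only.
avg <- tapply(complete$NumbOfBx, complete$Endoscopist, mean)
print(avg)
```

John Boy has four recorded values (2, 2, 12, 12), so the mean is 28 / 4 = 7; the 28 in the expected output is their sum. Mar Gret does not appear because all of her values are NA.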