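I want to create a groupwise summarise() operation in dplyr, but a different function has to be used when an edge case is met.
I have count data like the following. Concentration and standard deviation are calculated as follows: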

    library(dplyr)
    # tibble() replaces the deprecated data_frame()
    testdata <- tibble(sample = sort(rep(1:3, 4)),
                       volume = rep(c(1e-1, 1e-1, 1e-2, 1e-2), 3),
                       count = c(400, 400, 40, 40, 0, 0, 0, 0, 400, 400, 400, 400))
    testdata %>%
      group_by(sample) %>%
      summarise(concentration = sum(count) / sum(volume),
                sd = sqrt(sum(count)))

However, only counts between 25 and 250 should enter the calculation. I can achieve that with:

    testdata %>%
      group_by(sample) %>%
      filter((count >= 25) & (count <= 250)) %>%
      summarise(concentration = sum(count) / sum(volume),
                sd = sqrt(sum(count)))

But then samples 2 and 3 have no concentration at all.
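The groups disappear because filter() runs before summarise(): once every row of a sample has been dropped, there is no group left to summarise. A minimal sketch that makes this visible, assuming the same testdata as above (rebuilt here so the snippet runs on its own; `kept`, `n_rows` and `n_in_range` are names introduced only for illustration):

```r
library(dplyr)

# Same data as in the question, rebuilt so the snippet is self-contained
testdata <- tibble(sample = sort(rep(1:3, 4)),
                   volume = rep(c(1e-1, 1e-1, 1e-2, 1e-2), 3),
                   count = c(400, 400, 40, 40, 0, 0, 0, 0, 400, 400, 400, 400))

# For each sample, count all rows and the rows that would survive
# the 25-250 filter (illustrative column names, not from the question)
kept <- testdata %>%
  group_by(sample) %>%
  summarise(n_rows = n(),
            n_in_range = sum(count >= 25 & count <= 250))

print(kept)
```

Samples 2 and 3 have no rows inside the range, so after the filter they contribute nothing for summarise() to work on.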
The edge cases for each group could be calculated along these lines:

    if (all(count <= 25)) {
      summarise(concentration = 25 / min(volume),
                sd = NA)
    } else if (all(count >= 250)) {
      summarise(concentration = 250 / max(volume),
                sd = NA)
    }

Can edge cases like these be integrated into the summarise() function?
Ideally, I would also like a flag that indicates an edge case and returns result = "OK" for every case that is not an edge case:

    if (all(count <= 25)) {
      summarise(concentration = 25 / min(volume),
                sd = NA,
                result = "LOW")
    } else if (all(count >= 250)) {
      summarise(concentration = 250 / max(volume),
                sd = NA,
                result = "HIGH")
    }

Posted 2016-09-19 22:03:00
One approach is to encode the logic with ifelse inside summarise.

    library(dplyr)
    result <- testdata %>% group_by(sample) %>%
      summarise(concentration = ifelse(all(count <= 25),
                                       25 / min(volume),
                                       ifelse(all(count >= 250),
                                              250 / max(volume),
                                              sum(count) / sum(volume))),
                sd = ifelse(all(count <= 25),
                            NA,
                            ifelse(all(count >= 250),
                                   NA,
                                   sqrt(sum(count)))),
                result = ifelse(all(count <= 25),
                                "LOW",
                                ifelse(all(count >= 250),
                                       "HIGH",
                                       "OK")))
    print(result)
    ## # A tibble: 3 x 4
    ##   sample concentration       sd result
    ##    <int>         <dbl>    <dbl>  <chr>
    ## 1      1          4000 29.66479     OK
    ## 2      2          2500       NA    LOW
    ## 3      3          2500       NA   HIGH

Updated approach
Another approach (hopefully closer to what the OP asked for) is to define a function:

    summarise.func <- function(count, volume) {
      if (all(count <= 25)) {
        concentration <- 25 / min(volume)
        sd <- NA
        result <- "LOW"
      } else if (all(count >= 250)) {
        concentration <- 250 / max(volume)
        sd <- NA
        result <- "HIGH"
      } else {
        concentration <- sum(count) / sum(volume)
        sd <- sqrt(sum(count))
        result <- "OK"
      }
      data.frame(concentration = concentration, sd = sd, result = result,
                 stringsAsFactors = FALSE)
    }

It handles both the regular case and the edge cases. The key point is that the function returns a data.frame holding the summary results. summarise then creates a list column containing these data frames, which can be expanded with tidyr::unnest:

    library(dplyr)
    library(tidyr)
    # Call the function by the name it was defined under (summarise.func)
    result <- testdata %>% group_by(sample) %>%
      summarise(csr = list(summarise.func(count, volume))) %>%
      unnest(csr)
    print(result)
    ## # A tibble: 3 x 4
    ##   sample concentration       sd result
    ##    <int>         <dbl>    <dbl>  <chr>
    ## 1      1          4000 29.66479     OK
    ## 2      2          2500       NA    LOW
    ## 3      3          2500       NA   HIGH

https://stackoverflow.com/questions/39581671
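Both approaches in the answer use every count in the "OK" branch, whereas the question asks that only counts between 25 and 250 be included. The following is a minimal sketch of how that restriction might be folded into the helper-function approach. `summarise_in_range`, its `lower`/`upper` parameters and the final fallback branch are assumptions made for illustration and are not part of the original answer; the question does not say what should happen for a mixed group with no in-range counts.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt so the snippet is self-contained
testdata <- tibble(sample = sort(rep(1:3, 4)),
                   volume = rep(c(1e-1, 1e-1, 1e-2, 1e-2), 3),
                   count = c(400, 400, 40, 40, 0, 0, 0, 0, 400, 400, 400, 400))

# Hypothetical helper: same edge-case rules as the question, but the
# regular branch only uses rows whose count lies in [lower, upper]
summarise_in_range <- function(count, volume, lower = 25, upper = 250) {
  in_range <- count >= lower & count <= upper
  if (all(count <= lower)) {
    tibble(concentration = lower / min(volume), sd = NA_real_, result = "LOW")
  } else if (all(count >= upper)) {
    tibble(concentration = upper / max(volume), sd = NA_real_, result = "HIGH")
  } else if (any(in_range)) {
    tibble(concentration = sum(count[in_range]) / sum(volume[in_range]),
           sd = sqrt(sum(count[in_range])),
           result = "OK")
  } else {
    # Assumption: nothing in range but not uniformly low or high;
    # return missing values because the question leaves this undefined
    tibble(concentration = NA_real_, sd = NA_real_, result = NA_character_)
  }
}

result <- testdata %>%
  group_by(sample) %>%
  summarise(csr = list(summarise_in_range(count, volume))) %>%
  unnest(csr)

print(result)
```

For this data the concentration of sample 1 stays at 4000 (80 / 0.02 instead of 880 / 0.22), but its sd becomes sqrt(80), roughly 8.94, rather than sqrt(880). With dplyr 1.0.0 or later, an unnamed data-frame result is unpacked into columns, so `summarise(summarise_in_range(count, volume))` should give the same table without the list column and the unnest() step.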