I have three variables, A, B and C, in the following format:
A B C
Cat1 1 NA
Cat1 2 NA
Cat1 1 NA
Cat1 2 NA
Cat1 NA 4
Cat1 NA 1
Cat1 NA 6
Cat1 NA 4
Cat1 7 NA
Cat1 9 NA
Cat1 3 NA
Cat1 2 NA
Cat1 NA 2
Cat1 NA 4
Cat1 NA 5
Cat1 NA 9
. . .
. . .
. . .
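. . .
Let's assume that in variable C, every run of numeric values between the NAs should be treated as a group, and I have to find the difference between the maximum and minimum within each group. Can someone please help?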
Expected output:
OK. The required output is something like:
A Trips Value
Cat 1 Trip1 xx (diff of max & min of that trip)
Posted on 2018-12-11 11:29:28
A solution using dplyr and tidyr.
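library(dplyr)
library(tidyr)
dat2 <- dat %>%
  # Running count of NAs in C: constant within each run of non-NA values
  mutate(trip = cumsum(is.na(C))) %>%
  # Keep only the rows where C has a value
  drop_na(C) %>%
  # Renumber the remaining groups as 1, 2, ...
  mutate(trip = group_indices(., trip)) %>%
  group_by(trip) %>%
  # Difference between the max and min of C within each trip
  summarize(Diff = max(C) - min(C)) %>%
  ungroup()
dat2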
# # A tibble: 2 x 2
# trip Diff
# <int> <dbl>
# 1 1 5
# 2 2 7
Data
dat <- read.table(text = "A B C
Cat1 1 NA
Cat1 2 NA
Cat1 1 NA
Cat1 2 NA
Cat1 NA 4
Cat1 NA 1
Cat1 NA 6
Cat1 NA 4
Cat1 7 NA
Cat1 9 NA
Cat1 3 NA
Cat1 2 NA
Cat1 NA 2
Cat1 NA 4
Cat1 NA 5
Cat1 NA 9",
header = TRUE, stringsAsFactors = FALSE)
Posted on 2018-12-11 11:25:43
If I understand correctly, you could do the following:
library(data.table)
dt <- fread(text)
dt[, .(C = diff(range(C))), by = .(grp = rleid(is.na(C)))]
# grp C
#1: 1 NA
#2: 2 5
#3: 3 NA
#4: 4 7
For B and C at the same time, use
dt[, lapply(.SD, function(x) diff(range(x))), by = .(grp = rleid(is.na(C))), .SDcols = c('B', 'C')]
# grp B C
#1: 1 1 NA
#2: 2 NA 5
#3: 3 7 NA
#4: 4 NA 7
Another option, which removes the NAs:
cols <- c('B', 'C')
out <- dt[, lapply(.SD, function(x) diff(range(x))), by = rleid(is.na(C)), .SDcols = cols
][, lapply(.SD, na.omit), .SDcols = cols
][, grp := rleid(B)]
out
# B C grp
#1: 1 5 1
#2: 7 7 2
Note that the second and third solutions assume that B is NA whenever C is not, and vice versa.
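The assumption in the note above can be checked before relying on those solutions. The following is a minimal base R sketch, not part of the original answer: it uses only the first eight rows of the sample data, and the names `exactly_one_na`, `violations` and `assumption_holds` are hypothetical, introduced here for illustration.

```r
# Hypothetical check, not from the original answer: verify that exactly one
# of B and C is NA in every row (first eight rows of the sample data only).
dat <- read.table(text = "A B C
Cat1 1 NA
Cat1 2 NA
Cat1 1 NA
Cat1 2 NA
Cat1 NA 4
Cat1 NA 1
Cat1 NA 6
Cat1 NA 4", header = TRUE, stringsAsFactors = FALSE)

# TRUE for rows where exactly one of B and C is missing
exactly_one_na <- xor(is.na(dat$B), is.na(dat$C))

# Row numbers that break the assumption, if any
violations <- which(!exactly_one_na)

# TRUE when every row satisfies the assumption
assumption_holds <- length(violations) == 0
assumption_holds
```

If `violations` is non-empty, the listed rows have both columns missing or both present, and the combined B and C results would likely need to be checked by hand.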
Data
text <- "A B C
Cat1 1 NA
Cat1 2 NA
Cat1 1 NA
Cat1 2 NA
Cat1 NA 4
Cat1 NA 1
Cat1 NA 6
Cat1 NA 4
Cat1 7 NA
Cat1 9 NA
Cat1 3 NA
Cat1 2 NA
Cat1 NA 2
Cat1 NA 4
Cat1 NA 5
Cat1 NA 9"
https://stackoverflow.com/questions/53722643