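Here I would like to average periods 1-3 and periods 4-6 for each treatment and id, for every variable, and get the result in a new data frame. Does anyone know how I can do this?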
set.seed(1)
id <- rep(1:2,each=6)
trt <- c("A","A","A","B", "B", "B","A","A","A","B", "B", "B")
period <- rep(1:6,2)
pointA <- sample(1:10,12, replace=TRUE)
pointB <- sample(1:10,12, replace=TRUE)
pointC <- sample(1:10,12, replace=TRUE)
df <- data.frame(id,trt,period,pointA, pointB,pointC)
head(df, 12)
   id trt period pointA pointB pointC
1   1   A      1      3      7      3
2   1   A      2      4      4      4
3   1   A      3      6      8      1
4   1   B      4     10      5      4
5   1   B      5      3      8      9
6   1   B      6      9     10      4
7   2   A      1     10      4      5
8   2   A      2      7      8      6
9   2   A      3      7     10      5
10  2   B      4      1      3      2
11  2   B      5      3      7      9
12  2   B      6      2      2      7
I would like it to look like this:
  id trt Period pointA pointB pointC
1  1   A    123     13     19      8
2  1   B    456     22     23     17
3  2   A    123     24     22     16
4  2   B    456      6     12     18

Posted on 2020-02-28 16:30:48
With dplyr, you can create a new variable that holds the appropriate period group and then use it in group_by. For example:
library(dplyr)
df %>%
  mutate(period_class = case_when(
    period %in% c(1, 2, 3) ~ "123",
    period %in% c(4, 5, 6) ~ "456")
  ) %>%
  select(-period) %>%
  group_by(id, trt, period_class) %>%
  summarize_all(mean) # though you seem to have used `sum` in your example

Posted on 2020-02-28 16:41:58
Using data.table. I have attached two solutions, one for your example (sum) and one for your request (mean). I am not sure why the two differ in your question.
Code
library(data.table); setDT(df)
# logical vector marking the columns whose names contain "point"
point_var = colnames(df) %like% 'point'
# (i) for the sum (as per your example):
dtsum = df[, lapply(.SD, sum), .SDcols = point_var, by = .(id, trt, pCat = ifelse(period > 3, 456, 123))]
# (ii) for the mean (as per your request)
dtmean = df[, lapply(.SD, mean), .SDcols = point_var, by = .(id, trt, pCat = ifelse(period > 3, 456, 123))]

Output (i), sum:
> dtsum
   id trt pCat pointA pointB pointC
1:  1   A  123     13     19      8
2:  1   B  456     22     23     17
3:  2   A  123     24     22     16
4:  2   B  456      6     12     18

Output (ii), mean:
> dtmean
   id trt pCat   pointA   pointB   pointC
1:  1   A  123 4.333333 6.333333 2.666667
2:  1   B  456 7.333333 7.666667 5.666667
3:  2   A  123 8.000000 7.333333 5.333333
4:  2   B  456 2.000000 4.000000 6.000000

Posted on 2020-02-28 16:55:21
You can do this in base R using aggregate and ave.
pstClps <- function(x) paste(x, collapse="") # pre-define FUN
aggregate(. ~ id + trt + period, transform(df, period=ave(period, id, trt, FUN=pstClps)), sum)
#   id trt period pointA pointB pointC
# 1  1   A    123     13     19      8
# 2  2   A    123     24     22     16
# 3  1   B    456     22     23     17
# 4  2   B    456      6     12     18

Data:
df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L), trt = c("A", "A", "A", "B", "B", "B", "A", "A", "A",
"B", "B", "B"), period = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
4L, 5L, 6L), pointA = c(3L, 4L, 6L, 10L, 3L, 9L, 10L, 7L, 7L,
1L, 3L, 2L), pointB = c(7L, 4L, 8L, 5L, 8L, 10L, 4L, 8L, 10L,
3L, 7L, 2L), pointC = c(3L, 4L, 1L, 4L, 9L, 4L, 5L, 6L, 5L, 2L,
9L, 7L)), row.names = c(NA, -12L), class = "data.frame")

https://stackoverflow.com/questions/60455714
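For completeness: the question asks for means, while the base R answer above returns sums. Below is a minimal base R sketch of the mean version. It rebuilds the example data so it runs on its own, and it assumes the split is always "periods 1-3" versus "periods 4-6". The column name period_class and the object name res_mean are illustrative choices, not part of the original answers.

```r
# Rebuild the example data so the sketch runs on its own
df <- data.frame(
  id     = rep(1:2, each = 6),
  trt    = rep(rep(c("A", "B"), each = 3), 2),
  period = rep(1:6, 2),
  pointA = c(3, 4, 6, 10, 3, 9, 10, 7, 7, 1, 3, 2),
  pointB = c(7, 4, 8, 5, 8, 10, 4, 8, 10, 3, 7, 2),
  pointC = c(3, 4, 1, 4, 9, 4, 5, 6, 5, 2, 9, 7)
)

# Label periods 1-3 as "123" and periods 4-6 as "456"
# (hypothetical helper column, named after the dplyr answer's variable)
df$period_class <- ifelse(df$period <= 3, "123", "456")

# Average every point* column within each id / trt / period_class group
res_mean <- aggregate(
  cbind(pointA, pointB, pointC) ~ id + trt + period_class,
  data = df,
  FUN  = mean
)
res_mean
```

Swapping FUN = mean for FUN = sum should give the same sums as the aggregate/ave answer. The explicit ifelse label is used here instead of pasting the period values together, so the grouping does not depend on which periods happen to be present for each id and trt.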