我需要在每一行中聚合以前的5年的N_C变量。
例如:2017年- Sum_Five_Years = 10( 2017 )+21(2015)+14(2014)+16(2013) = 61
数据:
library(dplyr)
DF<-data.frame(company = c("DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM"),
year= c("2017","2015","2015","2015","2013","2012","2012","2012","2010","2010","2015","2014","2014","2013","2013","2012"),
N_C= c("0","7","5","4","3","24","52","99","43","37","5","7","7","4","9","20"), Sum_Year = c("0","21","21","21","16","195","195","195","80","80","21","14","14","16","16","195"))
DF <- DF %>% arrange(year)
company year N_C Sum_Year
1 DEL MAR PHARM 2010 43 80
2 DEL MAR PHARM 2010 37 80
3 DEL MAR PHARM 2012 24 195
4 DEL MAR PHARM 2012 52 195
5 DEL MAR PHARM 2012 99 195
6 DEL MAR PHARM 2012 20 195
7 DEL MAR PHARM 2013 3 16
8 DEL MAR PHARM 2013 4 16
9 DEL MAR PHARM 2013 9 16
10 DEL MAR PHARM 2014 7 14
11 DEL MAR PHARM 2014 7 14
12 DEL MAR PHARM 2015 7 21
13 DEL MAR PHARM 2015 5 21
14 DEL MAR PHARM 2015 4 21
15 DEL MAR PHARM 2015 5 21
16 DEL MAR PHARM 2017 10 10预期结果
DF$Sum_Five_Year <- cbind(c("80","80","275","275","275","275","291","291","291","305","305","246","246","246","246","61"))
> DF
company year N_C Sum_Year Sum_Five_Year
1 DEL MAR PHARM 2010 43 80 80
2 DEL MAR PHARM 2010 37 80 80
3 DEL MAR PHARM 2012 24 195 275
4 DEL MAR PHARM 2012 52 195 275
5 DEL MAR PHARM 2012 99 195 275
6 DEL MAR PHARM 2012 20 195 275
7 DEL MAR PHARM 2013 3 16 291
8 DEL MAR PHARM 2013 4 16 291
9 DEL MAR PHARM 2013 9 16 291
10 DEL MAR PHARM 2014 7 14 305
11 DEL MAR PHARM 2014 7 14 305
12 DEL MAR PHARM 2015 7 21 246
13 DEL MAR PHARM 2015 5 21 246
14 DEL MAR PHARM 2015 4 21 246
15 DEL MAR PHARM 2015 5 21 246
16 DEL MAR PHARM 2017 10 10 61我尝试过以下代码,但它不起作用:
library(data.table)
setDT(DF)
DF[, `:=` (Sum_Five_Year= sum(N_C)), by= list(company,cut(year, breaks = c(5), right = F))]如有任何建议,将不胜感激:)
发布于 2019-01-25 13:48:20
没有额外的包,您可以使用sapply。
下面的代码假设已经创建了Sum_Year。您可以将以下内容直接应用于您的示例:
distinct(DF, company, year, Sum_Year) %>%
group_by(company) %>%
mutate(
year = as.integer(as.character(year)),
Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
) %>%
left_join(DF %>% select(-Sum_Year), by = c("company", "year"))输出:
# A tibble: 16 x 5
# Groups: company [?]
company year Sum_Year Sum_Five_Year N_C
<chr> <int> <int> <int> <int>
1 DELMARPHARM 2010 80 80 43
2 DELMARPHARM 2010 80 80 37
3 DELMARPHARM 2012 195 275 24
4 DELMARPHARM 2012 195 275 52
5 DELMARPHARM 2012 195 275 99
6 DELMARPHARM 2012 195 275 20
7 DELMARPHARM 2013 16 291 3
8 DELMARPHARM 2013 16 291 4
9 DELMARPHARM 2013 16 291 9
10 DELMARPHARM 2014 14 305 7
11 DELMARPHARM 2014 14 305 7
12 DELMARPHARM 2015 21 246 7
13 DELMARPHARM 2015 21 246 5
14 DELMARPHARM 2015 21 246 4
15 DELMARPHARM 2015 21 246 5
16 DELMARPHARM 2017 10 61 10否则,你可以:
DF %>%
group_by(company, year) %>%
mutate(N_C = as.numeric(as.character(N_C))) %>%
summarise(Sum_Year = sum(N_C)) %>%
mutate(
year = as.integer(as.character(year)),
Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
) %>%
left_join(DF %>% select(-Sum_Year), by = c("company", "year"))如果您想摆脱重复格式,只需在结尾处省略连接:
DF %>%
group_by(company, year) %>%
mutate(N_C = as.numeric(as.character(N_C))) %>%
summarise(Sum_Year = sum(N_C)) %>%
mutate(
year = as.integer(as.character(year)),
Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
)输出:
# A tibble: 6 x 4
# Groups: company [1]
company year Sum_Year Sum_Five_Year
<chr> <int> <dbl> <dbl>
1 DELMARPHARM 2010 80 80
2 DELMARPHARM 2012 195 275
3 DELMARPHARM 2013 16 291
4 DELMARPHARM 2014 14 305
5 DELMARPHARM 2015 21 246
6 DELMARPHARM 2017 10 61https://stackoverflow.com/questions/54366228
复制相似问题