首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何在变量中对日期范围内的每一行求和

如何在变量中对日期范围内的每一行求和
EN

Stack Overflow用户
提问于 2020-04-29 11:51:10
回答 3查看 47关注 0票数 0

df1:

代码语言:javascript
复制
library(tidyverse)
library(lubridate)
ex1 <- tibble(date = seq.Date(from = ymd('20200101'), length.out = 100, by = 'day'),
          a = rnorm(100, mean = 1, sd = 2),
          b = runif(100, min = 1, max = 2),
          c = rnorm(100, mean = 3, sd = 1),
          d = runif(100, min = 50, max = 60))

df2:

代码语言:javascript
复制
cal_c <- tibble(variable = c('a', 'b', 'c','d'),
                    start = c(ymd('20200101', '20200103', '20200203', '20200103')),
                    end = c(ymd('20200204', '20200405', '20200301', '20200401')),
                    total = c('NA', 'NA', 'NA', 'NA'))

我想根据df1计算开始和结束日期范围内df2中的每一行,比如a$toal介于'2020-1-1‘到'2020-2-4’之间,b$toal介于'2020-1-3‘到'2020-4-5’之间,任何帮助,非常感谢。

EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2020-04-29 11:57:31

我们可以为cal_c数据创建一系列startend日期,获取长格式的ex1并进行连接。然后,我们可以对每个variable执行sum value

代码语言:javascript
复制
library(tidyverse)

cal_c %>%
  mutate(date = map2(start, end, seq, by = 'day')) %>%
  unnest(date) %>%
  left_join(ex1 %>% pivot_longer(cols = -date, names_to = 'variable'),
                    by = c('variable', 'date')) %>%
   group_by(variable, start, end) %>%
   summarise(value = sum(value, na.rm = TRUE))

#  variable start      end         value
#  <chr>    <date>     <date>      <dbl>
#1 a        2020-01-01 2020-02-04   34.3
#2 b        2020-01-03 2020-04-05  136. 
#3 c        2020-02-03 2020-03-01   79.5
#4 d        2020-01-03 2020-04-01 4909. 
票数 1
EN

Stack Overflow用户

发布于 2020-04-29 12:21:12

Base R解决方案:

代码语言:javascript
复制
cal_c$total <- sapply(split(cal_c, rownames(cal_c)), function(x){
  sum(ex1[((ex1$date  >= x$start) & (ex1$date <= x$end)), match(x$variable, names(ex1))])})
票数 0
EN

Stack Overflow用户

发布于 2020-04-29 12:22:56

使用data.table的选项

代码语言:javascript
复制
cal_c[, total :=
    ex1[cal_c, on=.(date>=start, date<=end), by=.EACHI,
        sum(.SD[[variable]])]$V1
    ]

输出:

代码语言:javascript
复制
   variable      start        end      total
1:        a 2020-01-01 2020-02-04   34.04780
2:        b 2020-01-03 2020-04-05  135.40290
3:        c 2020-02-03 2020-03-01   91.10271
4:        d 2020-01-03 2020-04-01 4978.59884

数据:

代码语言:javascript
复制
set.seed(0L)
library(data.table)
ex1 <- data.table(date = seq.Date(from = as.IDate('20200101', format="%Y%m%d"), length.out = 100, by = 'day'),
    a = rnorm(100, mean = 1, sd = 2),
    b = runif(100, min = 1, max = 2),
    c = rnorm(100, mean = 3, sd = 1),
    d = runif(100, min = 50, max = 60))

cal_c <- data.table(variable = c('a', 'b', 'c','d'),
    start = as.IDate(c('20200101', '20200103', '20200203', '20200103'), format="%Y%m%d"),
    end = as.IDate(c('20200204', '20200405', '20200301', '20200401'), format="%Y%m%d"))
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/61493921

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档