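I want to calculate the investments associated with a list of companies as of specific dates. I have the list of companies and the investment dates.
Here is my data:
Reference dates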
d1 <- as.Date(paste0("201001","01"), "%Y%m%d")
d2 <- as.Date(paste0("201201","01"), "%Y%m%d")
dat <- seq(d1,d2,by="month")
Investment data
> head(df)
        company_name funding_round_type funding_round_code  funded_at raised_amount_usd  yearMonth
1            0-6.com            venture                  A 2008-03-19           2000000 2008-03-01
2   004 Technologies            venture                    2014-07-24                NA 2014-03-01
3 01Games Technology        undisclosed                    2014-07-01             41250 2014-03-01
4             H2O.ai            venture                  B 2015-11-09          20000000 2015-03-01
5             H2O.ai               seed                    2013-05-22           3000000 2013-03-01
6             H2O.ai            venture                    2013-01-03           1700000 2013-03-01
I want to calculate how much funding each company had raised as of each date in dat.
result <- merge(dat, df$company_name) %>%
  mutate(asOf = x,
         companyName = as.character(y)) %>%
  select(-x, -y) %>%
  mutate(raised = sum(df[df$company_name == companyName &
                         df$yearMonth < asOf, c("raised_amount_usd")]))
Unfortunately, the filter does not work. If I set it to one specific company, it works. For example, this works: sum(df[df$company_name == companyName & df$yearMonth < asOf,c("raised_amount_usd")]).
I would like to get a result like this:
        asOf companyName cumulative_raised
1 2010-01-01     0-6.com                 0
2 2010-02-01     0-6.com          12000000
3 2010-03-01     0-6.com          12000000
4 2010-01-01      H2O.ai                 0
5 2010-02-01      H2O.ai           5000000
6 2010-03-01      H2O.ai           9300000
How can I make the filter work when it is inside a mutate clause?
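A note on the attempt above: mutate() evaluates its expression once on whole columns, not once per row, so companyName and asOf are full vectors inside the sum() call and every row receives the same single value. The following is a minimal base R sketch of evaluating the filter once per (date, company) pair instead. The data frame funding, the vector ref_dates, and the values in them are made up for illustration and are not the original data.

```r
# Hypothetical toy data (not the data set from the question)
funding <- data.frame(
  company_name      = c("A", "A", "B"),
  yearMonth         = as.Date(c("2010-01-01", "2010-02-01", "2010-02-01")),
  raised_amount_usd = c(100, 50, NA),
  stringsAsFactors  = FALSE
)
ref_dates <- seq(as.Date("2010-01-01"), as.Date("2010-03-01"), by = "month")

# One row per (reference date, company) pair; asOf varies fastest
grid <- expand.grid(
  asOf             = ref_dates,
  companyName      = unique(funding$company_name),
  stringsAsFactors = FALSE
)

# Apply the filter separately for each row of the grid
grid$raised <- vapply(seq_len(nrow(grid)), function(i) {
  hit <- funding$company_name == grid$companyName[i] &
    funding$yearMonth < grid$asOf[i]
  # na.rm = TRUE treats undisclosed (NA) amounts as zero
  sum(funding$raised_amount_usd[hit], na.rm = TRUE)
}, numeric(1))

print(grid)
```

Company A gets 0, 100, and 150 for the three reference dates, because the comparison is strict (yearMonth < asOf). Inside a dplyr pipeline, calling rowwise() before mutate() is one way to get the same per-row evaluation.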
Posted on 2019-03-16 09:35:50
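One way to solve this is to use complete (from tidyr) together with group_by, mutate, and summarize (from dplyr), and cumsum instead of sum (base R). Because the data you provided barely overlaps with the interval you want, I changed the interval slightly to show how it works. This is fully flexible, and you can use any interval you like: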
library(dplyr)
library(tidyr)
my.dat <- seq(as.Date("2013-03-01"), as.Date("2014-04-01"), by = "month")
new.df <- my.df %>%
  complete(company_name, yearMonth = my.dat, fill = list(raised_amount_usd = 0)) %>%
  group_by(company_name, yearMonth) %>%
  summarize(raised_amount_usd = sum(raised_amount_usd, na.rm = TRUE)) %>%
  arrange(yearMonth) %>%
  mutate(cumulative_raised = cumsum(raised_amount_usd)) %>%
  select(company_name, yearMonth, cumulative_raised)
tail(new.df, 10)
# A tibble: 10 x 3
# Groups:   company_name [4]
   company_name       yearMonth  cumulative_raised
   <chr>              <date>                 <dbl>
 1 01Games Technology 2014-02-01                 0
 2 H2O.ai             2014-02-01           4700000
 3 0-6.com            2014-03-01                 0
 4 004 Technologies   2014-03-01                 0
 5 01Games Technology 2014-03-01             41250
 6 H2O.ai             2014-03-01           4700000
 7 0-6.com            2014-04-01                 0
 8 004 Technologies   2014-04-01                 0
 9 01Games Technology 2014-04-01             41250
10 H2O.ai             2014-04-01           4700000
How does it work?
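First, complete fills in the missing dates in the yearMonth column and excludes dates outside the specified range. Next, group_by groups the data by company_name and yearMonth, and summarize adds up raised_amount_usd for each date and company (this is what combines the two amounts H2O.ai raised on 2013-03-01). The data is then sorted by yearMonth and the cumulative sum is computed. Because the data is still grouped by company_name at that point, the cumulative sum is calculated separately for each company. Finally, we select only the columns you are interested in.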
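The steps described above can be traced on a tiny example. This is a sketch on made-up data (toy and months are hypothetical names, not the original objects). It adds an explicit filter() after complete(), on the assumption that some tidyr versions keep original rows whose dates fall outside the supplied vector; with the filter, the result does not depend on that behavior. The .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: company B has one round outside the date range
toy <- data.frame(
  company_name      = c("A", "A", "B", "B"),
  yearMonth         = as.Date(c("2013-03-01", "2013-03-01",
                                "2013-04-01", "2012-01-01")),
  raised_amount_usd = c(100, 50, 10, 999),
  stringsAsFactors  = FALSE
)
months <- seq(as.Date("2013-03-01"), as.Date("2013-05-01"), by = "month")

res <- toy %>%
  # add a zero-amount row for every missing (company, month) pair
  complete(company_name, yearMonth = months,
           fill = list(raised_amount_usd = 0)) %>%
  # keep only the months of interest, regardless of tidyr version
  filter(yearMonth %in% months) %>%
  # collapse several rounds in the same month into one row
  group_by(company_name, yearMonth) %>%
  summarize(raised_amount_usd = sum(raised_amount_usd, na.rm = TRUE),
            .groups = "drop_last") %>%
  # still grouped by company_name, so cumsum() runs per company
  arrange(yearMonth, .by_group = TRUE) %>%
  mutate(cumulative_raised = cumsum(raised_amount_usd)) %>%
  ungroup()

print(res)
```

Here company A shows 150 for all three months (its two March rounds are summed first), and company B shows 0, 10, 10. As in the answer's pipeline, the 2012 round of company B is not counted, because only amounts inside the chosen range enter the cumulative sum.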
Data
my.df <-
  structure(list(company_name = c("0-6.com", "004 Technologies", "01Games Technology", "H2O.ai", "H2O.ai", "H2O.ai"),
                 funding_round_type = c("venture", "venture", "undisclosed", "venture", "seed", "venture"),
                 funding_round_code = c("A", " ", " ", "B", " ", " "),
                 funded_at = structure(c(13957, 16275, 16252, 16748, 15847, 15708), class = "Date"),
                 raised_amount_usd = c(2000000L, NA, 41250L, 20000000L, 3000000L, 1700000L),
                 yearMonth = structure(c(13939, 16130, 16130, 16495, 15765, 15765), class = "Date")),
            class = "data.frame", row.names = c(NA, -6L))
https://stackoverflow.com/questions/55194168