我有一个包含多个VC、基金、业绩、年份和基金规模的数据集。每只基金都有自己的表现。为了测试是否存在业绩持续性,我希望找到上一只/前一只基金(“PreviousPerformance”)的业绩。PreviousPerformance是同一VC的前一只基金的业绩。因此,年份决定了哪些基金是第一个启动的,第二个启动的,等等。
VC FUND Performance Year FundSize
A Partners A 0.30 2005 1
B Capital B5 0.20 2008 2
B Capital B4 0.10 2003 3
B Capital B3 0.25 2001 4
B Capital B2 0.20 2001 5
B Capital B1 0.10 2000 6 例如:基金B5的PreviousPerformance为0.1,这是B4的表现。
有时,不清楚哪只基金是最后一只基金。例如,B4之前有两只基金,B2和B3,这两只基金都是在2001年成立的。在这种情况下,我希望PreviousPerformance是FundSize最大的基金的表现(对于B4,这是B5)。如果基金没有前置基金,则PreviousPerformance = "-“
最后,数据集应该如下所示。
VC FUND Performance Year FundSize PreviousPerformance
A Partners A 0.30 2005 1 -
B Capital B5 0.20 2008 2 0.1
B Capital B4 0.10 2003 3 0.2
B Capital B3 0.25 2001 4 0.1
B Capital B2 0.20 2001 5 0.1
B Capital B1 0.10 2000 6 -我不知道该用什么函数来解决这类问题,有没有人有建议?
使用的代码例如:
FundPerformance = data.table(VC = c("A Partners", rep("B Capital",5)),
FUND = c("A","B5","B4","B3","B2","B1"),
Performance = c(0.3,0.2,0.1,0.25,0.2,0.1),
Year= c(2005,2008,2003,2001,2001,2000),
FundSize=c(1:6),
PreviousPerformance = c ("-",0.1,0.2,0.1,0.1,"-"))发布于 2017-05-27 03:14:43
我会做以下事情。但我不是data.table的常客,所以其他人可能有更干净的解决方案。
# set up data
require(data.table)
dt <- data.table(VC = c("A Partners", rep("B Capital",5)),
Fund = c("A","B5","B4","B3","B2","B1"),
Performance = c(0.3,0.2,0.1,0.25,0.2,0.1),
Year= c(2005,2008,2003,2001,2001,2000),
FundSize=c(1:6),
PreviousPerformance = c (NA,0.1,0.2,0.1,0.1,NA))
setkey(dt, VC, Year)
# find performance for largest fund within each VC each year
dt[, PreviousPerformanceII:=Performance[FundSize==max(FundSize)], keyby=key(dt)]
dtUnique <- unique(dt, keyby=key(dt))
dtUnique <- dtUnique[, list(VC, Year, PreviousPerformanceII)]
# rolling join to pick up last year's performance
dtUnique[, Year:=Year+1]
setkey(dtUnique, VC, Year)
dtNew <- dtUnique[dt, roll=TRUE]
# clean up data
dtNew$i.PreviousPerformanceII <- NULL
setkey(dtNew, VC, Year)
dtNewhttps://stackoverflow.com/questions/44202848
复制相似问题