问以两列之间的最小距离合并/连接两个数据集
EN

Stack Overflow用户

提问于 2019-03-04 21:38:49

回答 1查看 145关注 0票数 1

我正在尝试合并两个收益率数据集，我需要根据最小的到期日差异来合并它们。因为我想计算商业贷款和相应期限的国库券之间的利差。join可以工作，但我正在寻找一种更好的方法，也许使用fuzzy_join？

library(data.table)
library(zoo)
library(tidyverse)

# Commercial loan issued in 2002 Q1 with a maturity of 119 months
dt.MWE <- structure(list(Issue.Year.Quarter = structure(2002, class = "yearqtr"), Maturity.Date = structure(15385, class = "Date")
               , Issue.Months.to.Maturity = 119), row.names = c(NA,  -1L), class = c("data.table", "data.frame"))


# Treasury Yields in 2002 Q1 with different maturities
dt.Yields <- structure(list(Year.Quarter = structure(c(2002, 2002, 2002, 2002,  2002, 2002, 2002, 2002, 2002, 2002, 2002), class = "yearqtr"), 
                            Maturity = c(12, 120, 1, 24, 240, 36, 360, 3, 60, 6, 84), 
                            Avg.Treasury.Yield = c(2.32000001271566, 5.0766666730245, 1.73666663964589, 3.20333329836527, 5.74333333969116, 3.74999992052714, 
                                                   5.42499995231629, 1.75666666030884, 4.46000003814697, 1.89666664600372,  4.8799999554952))
                       , row.names = c(NA, -11L), class = c("data.table", "data.frame"))


dt.join.result <- dt.MWE %>% inner_join(x = . , y = dt.Yields
                  , by = c(Issue.Year.Quarter = "Year.Quarter")) %>% mutate(.data = ., Dist.Maturity = abs(Issue.Months.to.Maturity - Maturity))  %>% group_by(.data = .,Issue.Year.Quarter )%>% mutate(.data = ., rank.Dist.Maturity = row_number(Dist.Maturity)) %>% dplyr::filter(rank.Dist.Maturity == 1) %>% data.table(.)

>   Issue.Year.Quarter Maturity.Date Issue.Months.to.Maturity Maturity Avg.Treasury.Yield Dist.Maturity min.Dist.Maturity
1:            2002 Q1    2012-02-15                      119      120           5.076667             1                 1

dplyr

data.table

回答 1

Stack Overflow用户

回答已采纳

发布于 2019-03-04 22:55:53

使用滚动连接的解决方案

由于某些原因，在处理您的样本数据时，data.table给了我错误，所以我创建了dt1和dt2的副本来处理。这些(可能)在你这边是不需要的……

library(data.table)

#create copies of the data.tables
dt1 <- copy( dt.MWE )
dt2 <- copy( dt.Yields )

#set keys to join on.
#the last key of each dt is using in the roll-action of the join
setkeyv( dt1, c("Issue.Year.Quarter", "Issue.Months.to.Maturity"))
setkeyv( dt2, c("Year.Quarter", "Maturity"))

#perform by reference (=fast!) rolling join to get the nearest match of the last (=second) key
dt1[, c("Maturity", "Avg.Treasury.Yield") := ( dt2[dt1, list( x.Maturity, Avg.Treasury.Yield) , roll = "nearest"])]
#calculate the absolute distance
dt1[, min.Dist.Maturity := abs( Issue.Months.to.Maturity - Maturity) ][]

#    Issue.Year.Quarter Maturity.Date Issue.Months.to.Maturity Maturity Avg.Treasury.Yield min.Dist.Maturity
# 1:               2002    2012-02-15                      119      120           5.076667                 1

票数 4

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/54984478

复制

相似问题

问以两列之间的最小距离合并/连接两个数据集
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问以两列之间的最小距离合并/连接两个数据集EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问以两列之间的最小距离合并/连接两个数据集
EN