我需要合并两个数据集,但如果一个数据集的日期介于另一个数据集的两个日期之间,则行必须合并。第一个数据集data如下所示:
Date Weight diff Loc.nr
2013-01-24 1040 7 2
2013-01-31 1000 7 2
2013-02-07 1185 7 2
2013-02-14 915 7 2
2013-02-21 1090 7 2
2013-03-01 1065 9 2
2013-01-19 500 4 9
2013-01-23 1040 3 9
2013-01-28 415 5 9
2013-01-31 650 3 9
2013-02-04 725 4 9
2013-02-07 450 3 9
2013-02-11 550 4 9其他数据集matches如下所示:
Date winning
2013-01-20 1
2013-01-27 0
2013-02-03 1
2013-02-10 0
2013-02-17 1
2013-02-24 0我编写了一个代码,将获奖列从matches连接到数据集" data ":
data$winning <- NA
for(i in 1:nrow(data)) {
for(j in 1:nrow(matches)) {
if((data$Date[i]-data$diff[i]) < matches$Date[j] & data$Date[i] > matches$Date[j]) {
data$winning[i] <- matches$winning[j]
}
}
}这段代码需要3天的时间才能运行,有更快的方法吗?
我的预期产出是:
Date Weight diff Loc.nr winning
2013-01-24 1040 7 2 1
2013-01-31 1000 7 2 0
2013-02-07 1185 7 2 1
2013-02-14 915 7 2 0
2013-02-21 1090 7 2 1
2013-03-01 1065 9 2 0
2013-01-19 500 4 9 NA
2013-01-23 1040 3 9 NA
2013-01-28 415 5 9 0
2013-01-31 650 3 9 NA
2013-02-04 725 4 9 1
2013-02-07 450 3 9 NA
2013-02-11 550 4 9 0发布于 2017-11-10 15:29:23
使用Gregor建议的非equi连接,您可以尝试一些方法。
library(data.table)
setDT(data)[, winning := setDT(matches)[data[, .(upper = Date, lower = Date - diff)],
on = .(Date < upper, Date > lower)]$winning][]Date Weight diff Loc.nr winning 1: 2013-01-24 1040 7 2 1 2: 2013-01-31 1000 7 2 0 3: 2013-02-07 1185 7 2 1 4: 2013-02-14 915 7 2 0 5: 2013-02-21 1090 7 2 1 6: 2013-03-01 1065 9 2 0 7: 2013-01-19 500 4 9 NA 8: 2013-01-23 1040 3 9 NA 9: 2013-01-28 415 5 9 0 10: 2013-01-31 650 3 9 NA 11: 2013-02-04 725 4 9 1 12: 2013-02-07 450 3 9 NA 13: 2013-02-11 550 4 9 0
https://stackoverflow.com/questions/47220223
复制相似问题