首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >全外接双面辊(LOCF)

全外接双面辊(LOCF)
EN

Stack Overflow用户
提问于 2013-11-25 17:27:49
回答 4查看 667关注 0票数 3

如何有效地将两个data.table与完全外部连接合并,同时在左侧和右侧滚动上的最后观察值(LOCF)来处理缺失的值?

真实世界应用程序-有两个不一定交叉交易规则信号表,XY,持有(稀疏)信号值随着时间的推移。总体目标是定义复合信号,其中Signal.z = Signal.x和Signal.y

代码语言:javascript
复制
X <- data.table(Instrument=rep("SPX",3)
                , Date=as.IDate(c("2013-11-20","2013-11-22","2013-11-24"))
                , Signal=c(TRUE,FALSE,TRUE), key=c("Instrument", "Date"))

Y <- data.table(Instrument=rep("SPX",3)
                , Date=as.IDate(c("2013-11-21","2013-11-23","2013-11-25"))
                , Signal=c(FALSE,TRUE,FALSE), key=c("Instrument", "Date"))

期望的结果

代码语言:javascript
复制
   Instrument       Date Signal.x Signal.y Signal.z
1:        SPX 2013-11-20     TRUE       NA       NA
2:        SPX 2013-11-21     TRUE    FALSE    FALSE
3:        SPX 2013-11-22    FALSE    FALSE    FALSE
4:        SPX 2013-11-23    FALSE     TRUE    FALSE
5:        SPX 2013-11-24     TRUE     TRUE     TRUE
6:        SPX 2013-11-25     TRUE    FALSE    FALSE
EN

回答 4

Stack Overflow用户

回答已采纳

发布于 2013-11-25 17:54:02

也许是这样的:

代码语言:javascript
复制
dates = sort(c(X$Date, Y$Date))

setkey(X, Date)
setkey(Y, Date)

Z = X[J(dates), roll = T][,
      Signal.y := Y[J(dates), roll = T]$Signal][,
      Signal.z := as.logical(Signal * Signal.y)]

在这个想法的基础上,下面是一种针对大型示例数据的方法:

代码语言:javascript
复制
# assuming keys are set to Instrument, Date in both data.tables

Z = unique(setkey(rbind(setnames(X[Y, roll = T],
                                 c("Instrument", "Date", "Signal.x", "Signal.y")),
                        setnames(Y[X, roll = T],
                                 c("Instrument", "Date", "Signal.y", "Signal.x")),
                        use.names = TRUE),
                  Instrument, Date))[,
           Signal.z := as.logical(Signal.x * Signal.y)]
票数 5
EN

Stack Overflow用户

发布于 2013-11-25 20:42:39

Linked here is an excellent answer来自mnel,解释了如何在data.table包中完成一个完全的外部连接。

这里的应用程序很简单,增加了向前滚动最后一个观察(通过连接中的roll = TRUE )所带来的麻烦。

创建一个data.table,保存XY中的所有(唯一)键。

代码语言:javascript
复制
## one way to do the outer join
keys <- unique(rbind(X[,key(X),with = FALSE], Y[,key(Y), with = FALSE]))
## alternate way if you have multiple data.tables to outer join
keys <- lapply(list(X,Y), function(z) z[,key(z), with = FALSE])
keys <- rbindlist(keys)

## this setkey is mostly cosmetic - 
## determines whether the final output is sorted or not
setkeyv(keys, names(keys))

##cosmetic changing of column names to minimize confusion
setnames(X,"Signal","Signal.X")
setnames(Y,"Signal","Signal.Y")

## two joins, followed by the definition of the new column
X[Y[keys, roll = TRUE], roll = TRUE][,
    Signal.Z := as.logical(Signal.X * Signal.Y)]
## this output is returned invisibly. either assign it or force print
.Last.value
#    Instrument       Date Signal.X Signal.Y Signal.Z
# 1:        SPX 2013-11-20     TRUE       NA       NA
# 2:        SPX 2013-11-21     TRUE    FALSE    FALSE
# 3:        SPX 2013-11-22    FALSE    FALSE    FALSE
# 4:        SPX 2013-11-23    FALSE     TRUE    FALSE
# 5:        SPX 2013-11-24     TRUE     TRUE     TRUE
# 6:        SPX 2013-11-25     TRUE    FALSE    FALSE

习语as.logical(. * .)复制NA传播的&是受到Eddi's answer的启发。

票数 3
EN

Stack Overflow用户

发布于 2013-11-26 15:52:08

我将测量三种可用解决方案(Daniel.Krizian、Blue.Magister、eddi)的时间。

为此,我创建了更大的基准测试数据--大型信号表XY

基准数据:XY

代码语言:javascript
复制
nobs <- 5000 # number of observations for each instrument
nopps <- nobs * 3 # opportunities to trade in the time window studied
ninstr <- 200 # number of instruments

set.seed(2)  # set.seed(1) generates "MPM" instrument twice :)
universe <-  replicate( ninstr , paste( sample( LETTERS , 3 , repl = TRUE ), collapse = "" ) )
window <- as.Date("2013-11-26") - 1:nopps + 1
frame <- CJ(Instrument=universe, Date=rep(1:nobs))

gen.sig.tbl <- function() {
  frame[, Date:= as.IDate(sample(window, size=nobs, replace=F)), by="Instrument"]
  setkey(frame,Instrument,Date)

  rnd.sig.sparse <- function(nobs) {
    frst <- sample(c(FALSE,TRUE), 1)
    rep(c(frst,!frst), nobs/2)
  }

  frame[, Signal:=rnd.sig.sparse(nobs), by="Instrument"]
  return(copy(frame))
}
set.seed(1)
X <- gen.sig.tbl()
set.seed(2)
Y <- gen.sig.tbl()

X
             Instrument       Date Signal
      1:        AAS 1972-11-02  FALSE
      2:        AAS 1972-11-04   TRUE
      3:        AAS 1972-11-07  FALSE
      4:        AAS 1972-11-08   TRUE
      5:        AAS 1972-11-10  FALSE
     ---                             
 999996:        ZVH 2013-11-14  FALSE
 999997:        ZVH 2013-11-15   TRUE
 999998:        ZVH 2013-11-18  FALSE
 999999:        ZVH 2013-11-25   TRUE
1000000:        ZVH 2013-11-26  FALSE

Y
         Instrument       Date Signal
      1:        AAS 1972-11-13   TRUE
      2:        AAS 1972-11-17  FALSE
      3:        AAS 1972-11-20   TRUE
      4:        AAS 1972-11-21  FALSE
      5:        AAS 1972-11-23   TRUE
     ---                             
 999996:        ZVH 2013-11-16   TRUE
 999997:        ZVH 2013-11-19  FALSE
 999998:        ZVH 2013-11-23   TRUE
 999999:        ZVH 2013-11-24  FALSE
1000000:        ZVH 2013-11-25   TRUE

三种解决办法:

代码语言:javascript
复制
Daniel.Krizian <- function () {
  Z <- merge(X, Y, all=TRUE)[, c("Signal.x","Signal.y"):=list( na.locf(Signal.x, na.rm = F)
                                                               , na.locf(Signal.y, na.rm = F))
                             , by=Instrument]

  Z[, Signal.z := Signal.x & Signal.y]

  # and the last line because (FALSE & NA) == FALSE, whereas NA result is desired
  Z[, Signal.z := ifelse(is.na(Signal.x) | is.na(Signal.y), NA, Signal.z)]
  return(Z)
}



Blue.Magister <- function() {
  keys <- unique(rbind(X[,key(X),with = FALSE], Y[,key(Y), with = FALSE]))

  ## this setkey is mostly cosmetic - 
  ## determines whether the final output is sorted or not
  setkeyv(keys, names(keys))

  ##cosmetic changing of column names to minimize confusion
  setnames(X,"Signal","Signal.X")
  setnames(Y,"Signal","Signal.Y")

  ## two joins, followed by the definition of the new column
  Z <- X[Y[keys, roll = TRUE], roll = TRUE][,
                                       Signal.Z := as.logical(Signal.X * Signal.Y)]
  Z <- unique(Z)
  return(Z)
}

eddi <- function (){

  # assuming keys are set to Instrument, Date in both data.tables
  Z = unique(setkey(rbind(setnames(X[Y, roll = T],
                                   c("Instrument", "Date", "Signal.x", "Signal.y")),
                          setnames(Y[X, roll = T],
                                   c("Instrument", "Date", "Signal.y", "Signal.x")),
                          use.names = TRUE),
                    Instrument, Date))[,
                                       Signal.z := as.logical(Signal.x * Signal.y)]
  return(Z)
}

基准:

代码语言:javascript
复制
system.time(Z.DK <- Daniel.Krizian())

user  system elapsed 
2.70    0.07    3.01 

system.time(Z.eddi <- eddi())

user  system elapsed 
1.14    0.03    1.84 

system.time(Z.BM <- Blue.Magister())

user  system elapsed 
3.35    0.14    3.52

setnames(X,"Signal.X", "Signal") # reset original data back after Blue.Magister() call
setnames(Y,"Signal.Y", "Signal") # reset original data back after Blue.Magister() call
setnames(Z.BM
         , c("Signal.X", "Signal.Y", "Signal.Z")
         , c("Signal.x", "Signal.y", "Signal.z"))
identical(Z.DK, Z.BM)

TRUE

identical(Z.DK, Z.eddi)

TRUE
票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/20199374

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档