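I have:
DT = data.table(ID=rep(1:2,each = 2), Index=rep(1:2,times = 2), Close=3:6, Open=7:10)
My algorithm has previously determined that DT holds its time information in the column named Index, so the algorithm stores the following mapping:
time.col <- "Index"
Now the algorithm wants to perform a computation equivalent to: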
DT[, list(Index, Value=cumsum(Close)),by=ID]
   ID Index Value
1:  1     1     3
2:  1     2     7
3:  2     1     5
4:  2     2    11
How can I rewrite that line so that it uses the time.col variable?
Neither of the following works:
DT[, list(time.col, Value=cumsum(Close)),by=ID]
DT[, list(substitute(time.col), Value=cumsum(Close)),by=ID]
Answered 2014-05-05 03:25:19
You can build the whole expression to be evaluated in j up front:
e <- parse(text = paste0("list(", time.col,",", "Value=cumsum(Close))"))
DT[, eval(e),by=ID]
EDIT
Alternatively, if you store "Index" as a name, you can evaluate time.col in the environment of .SD:
time.col <- as.name("Index")
DT[,list(eval(time.col,envir=.SD), Value=cumsum(Close)),by=ID]
A very similar question: In R data.table, how do I pass variable parameters to an expression?
This question also helps demystify non-standard evaluation in data.table: eval and quote in data.table.
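The string pasting above can also be avoided by constructing the call object directly. The following is a minimal sketch on the toy DT from the question; using bquote() is an alternative that this answer does not itself propose, and the variable names e and res are only illustrative.

```r
library(data.table)

# Toy data from the question
DT <- data.table(ID = rep(1:2, each = 2), Index = rep(1:2, times = 2),
                 Close = 3:6, Open = 7:10)
time.col <- "Index"

# Build the j expression as a call: .(...) inside bquote() is replaced by
# the symbol created from the column name stored in time.col
e <- bquote(list(.(as.name(time.col)), Value = cumsum(Close)))

# data.table evaluates the call in j, once per group
res <- DT[, eval(e), by = ID]
res
```

Because the column name is turned into a symbol rather than pasted into source text, this variant should also cope with column names that are not syntactically valid R, such as names containing spaces.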
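Answered 2014-05-05 21:56:07
It turns out that the fastest of the eval-based solutions mentioned above is
e <- parse(text = paste0("list(", time.col,",", "Value=cumsum(Close))"))
DT[, eval(e),by=ID]
However, the := solution is even faster when only the assignment by reference is timed (fun2.() below). See also Arun's note about copying.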
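As a small illustration of the := approach on the toy table from the question, here is a sketch under a few assumptions: the temporary column name Value and the result name out are illustrative, and with = FALSE is used to select columns from a character vector.

```r
library(data.table)

DT <- data.table(ID = rep(1:2, each = 2), Index = rep(1:2, times = 2),
                 Close = 3:6, Open = 7:10)
time.col <- "Index"

# Add the cumulative sum as a new column by reference (DT itself is modified)
DT[, Value := cumsum(Close), by = ID]

# Select the wanted columns by name; this step creates a copy
out <- DT[, c("ID", time.col, "Value"), with = FALSE]

# Remove the temporary column so DT is back to its original shape
DT[, Value := NULL]
out
```

The assignment step does not copy DT, which is where the speed advantage comes from. The subsequent column selection does copy, so the full round trip costs more than the assignment alone.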
Data set
dim(DT); object.size(DT); DT
[1] 1354402       8
81291568 bytes
               Instrument       Date Open High  Low Close Volume Adjusted Close
      1:    GOOG/AMEX_ABI 1981-03-11   NA   NA 6.56  6.75 217200             NA
      2:    GOOG/AMEX_ABI 1981-03-12   NA   NA 6.66  6.88 616400             NA
      3:    GOOG/AMEX_ABI 1981-03-13   NA   NA 6.81  6.84 462000             NA
      4:    GOOG/AMEX_ABI 1981-03-16   NA   NA 6.81  7.00 306400             NA
      5:    GOOG/AMEX_ABI 1981-03-17   NA   NA 6.88  6.88 925600             NA
     ---
1354398: YAHOO/TSX_AMM_TO 2014-04-24 1.56 1.58 1.56  1.58   2700           1.58
1354399: YAHOO/TSX_AMM_TO 2014-04-25 1.60 1.62 1.59  1.62  11000           1.62
1354400: YAHOO/TSX_AMM_TO 2014-04-28 1.59 1.61 1.54  1.54   7200           1.54
1354401: YAHOO/TSX_AMM_TO 2014-04-29 1.58 1.60 1.58  1.59    500           1.59
1354402: YAHOO/TSX_AMM_TO 2014-04-30 1.55 1.55 1.50  1.52  36800           1.52
Benchmark
time.col <- "Date"
fun <- function() {
  out <- DT[, list(get(time.col), Value = cumsum(Close)), by = Instrument]
  setnames(out, "V1", time.col)
}
fun2 <- function() {
  DT[, Value := cumsum(Close), by = Instrument]
  out <- DT[, c("Instrument", ..time.col, "Value")]
  DT[, Value := NULL] # cleanup
  out
}
fun2. <- function() {
  DT[, Value := cumsum(Close), by = Instrument]
  # out <- DT[, c("Instrument", ..time.col, "Value")]
  # DT[, Value := NULL] # cleanup
  # out
}
fun3 <- function() {
  DT[, list(eval(as.name(time.col), envir = .SD), Value = cumsum(Close)), by = Instrument]
}
fun4 <- function() {
  e <- parse(text = paste0("list(", time.col, ",", "Value=cumsum(Close))"))
  DT[, eval(e), by = Instrument]
}
Results
library(rbenchmark)
benchmark(fun(),
fun2(),
fun3(),
fun4(),
replications=200)
     test replications elapsed relative user.self sys.self user.child sys.child
1   fun()          200    5.40    1.327      5.29     0.11         NA        NA
2  fun2()          200    5.18    1.273      4.72     0.45         NA        NA
3 fun2.()          200    2.70    1.000      2.70     0.00         NA        NA
3  fun3()          200    4.12    1.012      3.90     0.22         NA        NA
4  fun4()          200    4.07    1.000      3.91     0.16         NA        NA
https://stackoverflow.com/questions/23463270