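I've gone through several questions with similar titles, and I believe my situation is different. Before restarting RStudio Server, I did stop it, uninstall data.table, and reinstall data.table from source.
I have a data.table that looks like: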
wind<- structure(list(pricedate = structure(c(1538629200, 1538629200,
1538629200, 1538629200, 1538629200), class = c("POSIXct", "POSIXt"
), tzone = "America/Chicago"), hour = c(1L, 1L, 1L, 1L, 1L),
type = c("cop_hsl", "stwpf", "wgrpp", "cop_hsl", "stwpf"),
zone = c("coastal", "coastal", "coastal", "north", "north"
), as_of = structure(c(1538199804, 1538199804, 1538199804,
1538199804, 1538199804), class = c("POSIXct", "POSIXt"), tzone = "America/Chicago"),
wind = c(712, 751.5, 548.2, 843, 846), age = c("4day", "4day",
"4day", "4day", "4day"), daysold = c(4L, 4L, 4L, 4L, 4L)), row.names = c(NA,
-5L), class = c("data.table", "data.frame"))
The full table has about 20M rows and occupies 1.1GB of memory, as reported by tables().
The following commands work:
windindx<-wind[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1]
wind[windindx]
Combining them into:
wind[wind[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1]]
results in
Error: could not find function "."
If I subset the data.table, it works, as follows:
windsm<-wind[round(runif(10000000,0,20676204))]
windsm[windsm[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1]]
Here is my sessionInfo():
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] yaml_2.2.1 R.utils_2.10.1 R.oo_1.24.0 R.methodsS3_1.8.1 nanotime_0.3.3 xts_0.12.1 zoo_1.8-9 bit64_4.0.5 bit_4.0.4
[10] glue_1.4.2 magrittr_2.0.1 future_1.21.0 lubridate_1.7.10 data.table_1.14.0 ggplot2_3.3.5 DALEX_2.3.0 mlr3tuning_0.8.0 paradox_0.7.1
[19] mlr3viz_0.5.5 mlr3learners_0.4.5 mlr3_0.12.0 RPostgres_1.3.3
loaded via a namespace (and not attached):
[1] tidyselect_1.1.1 xfun_0.25 purrr_0.3.4 listenv_0.8.0 lattice_0.20-44 colorspace_2.0-2 vctrs_0.3.8 generics_0.1.0
[9] htmltools_0.5.1.1 bbotk_0.3.2 utf8_1.2.2 blob_1.2.2 rlang_0.4.11 pillar_1.6.2 withr_2.4.2 DBI_1.1.1
[17] palmerpenguins_0.1.0 uuid_0.1-4 lifecycle_1.0.0 munsell_0.5.0 gtable_0.3.0 codetools_0.2-18 evaluate_0.14 knitr_1.33
[25] parallel_4.1.1 fansi_0.5.0 Rcpp_1.0.7 scales_1.1.1 backports_1.2.1 checkmate_2.0.0 RcppCCTZ_0.2.9 parallelly_1.27.0
[33] hms_1.1.0 digest_0.6.27 dplyr_1.0.7 grid_4.1.1 tools_4.1.1 tibble_3.1.3 mlr3misc_0.9.3 crayon_1.4.1
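[41] pkgconfig_2.0.3 ellipsis_0.3.2 rmarkdown_2.10 lgr_0.4.2 R6_2.5.0 globals_0.14.0 compiler_4.1.1
Is there something about (relatively) large data.tables that stops this from working? The machine is a 64GB VM in the cloud. htop reports only about 3.5GB of memory in use, so roughly 60GB is still free. My workaround isn't very burdensome, so I'm asking mostly out of curiosity.
Edit: For the bounty, I'd like to know why eval is needed only some of the time.
Posted 2021-08-17 16:05:44
You can use eval to force the i argument to be evaluated in the calling environment, instead of among the columns of the table: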
wind[eval(wind[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1])]
pricedate hour type zone as_of wind age daysold
1: 2018-10-04 1 cop_hsl coastal 2018-09-29 00:43:24 712.0 4day 4
2: 2018-10-04 1 stwpf coastal 2018-09-29 00:43:24 751.5 4day 4
3: 2018-10-04 1 wgrpp coastal 2018-09-29 00:43:24 548.2 4day 4
4: 2018-10-04 1 cop_hsl north 2018-09-29 00:43:24 843.0 4day 4
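5: 2018-10-04 1 stwpf north 2018-09-29 00:43:24 846.0 4day 4
help("data.table") tells you:
Advanced: When i is a single variable name, it is not considered an expression of column names and is instead evaluated in calling scope.
That is why the first solution works: it passes the single variable windindx as i. The combined version is evaluated in the wrong scope and fails.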
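To make the rule concrete, here is a simplified base-R model of how i might be scoped. The helper `subset_i` is hypothetical and is not part of data.table. A plain list stands in for the table. The eval() branch rests on an assumption: that a top-level eval(...) in i has its argument evaluated in the caller's frame, which is what the workaround above relies on. The three calls show three cases. A symbol is taken from the calling scope. An expression sees the column that shares the object's name. Wrapping the expression in eval() restores the intended lookup.

```r
# Simplified, hypothetical model of scoping for i in x[i].
# subset_i() is NOT a data.table function; it only mimics the rules
# discussed above with base R and a plain list standing in for the table.
subset_i <- function(x, i) {
  isub <- substitute(i)   # capture i unevaluated
  caller <- parent.frame()
  # Assumption: a top-level eval(...) is unwrapped and its argument is
  # evaluated in the caller's frame, ignoring the columns of x.
  if (is.call(isub) && identical(isub[[1L]], as.name("eval"))) {
    isub <- eval(isub[[2L]], caller, caller)
  }
  idx <- if (is.name(isub)) {
    eval(isub, caller, caller)   # single symbol: calling scope only
  } else {
    eval(isub, x, caller)        # anything else: columns of x searched first
  }
  lapply(x, `[`, idx)            # return the selected "rows" of every column
}

tbl  <- list(wind = c(712, 751.5, 548.2), hour = 1:3)
wind <- c(3L, 1L)   # hypothetical outer object whose name clashes with a column

r_sym  <- subset_i(tbl, wind)              # symbol: outer wind, rows 3 and 1
r_expr <- subset_i(tbl, wind[1:2])         # expression: wind is the column, so NA rows
r_eval <- subset_i(tbl, eval(wind[1:2]))   # eval(): outer wind again, rows 3 and 1
```

Only the expression case goes wrong. Storing the index in a separate variable first, as windindx does in the question, works for the same reason as the symbol case.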
Posted 2021-10-15 02:20:15
Elaborating on @Waldi's answer:
> wind[browser()]
Called from: eval(.massagei(isub), x, ienv)
Browse[1]> wind
[1] 712.0 751.5 548.2 843.0 846.0
Browse[1]> wind[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1]
Error in .(pricedate, hour) : could not find function "."
Browse[2]> ls()
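[1] "age" "as_of" "daysold" "hour" "pricedate" "type" "wind" "zone"
For the syntax x[i], this is the environment we operate in when i is not a single symbol. It contains the columns of x, and a symbol is looked up among those columns first. That is why wind prints above as the numeric wind column rather than the table. wind[,.I[...], by=.(pricedate, hour)] then becomes ordinary vector subsetting, where . is not a function, hence the error.
If we pass a symbol instead, e.g. wind[wind, on=.(hour)], the symbol is looked up in the parent environment, and i is not evaluated among the columns of x. The same lookup order explains why windsm[windsm[...]] works: the table has no column named windsm, so the name falls through to the calling environment. The size of the table plays no role. What matters is the clash between the object name wind and its column wind.
I think the documentation quoted in @Waldi's answer is enough to show how to avoid the problem, but the relevant part of the code seems worth pointing out: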
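This lookup order can be reproduced with base R's eval() alone. Passing a list as envir makes eval() search the list's elements before the enclosure, which is roughly what happens for a non-symbol i. All objects below are hypothetical stand-ins: cols plays the role of the table's columns, and the strings wind and windsm play the two table names from the question.

```r
# Hypothetical stand-ins: `cols` plays the table's columns; `wind` and
# `windsm` play the two table names from the question.
cols   <- list(wind = c(712, 751.5), hour = c(1L, 1L))
wind   <- "full table"
windsm <- "subsetted table"

# Analogous to ienv = new.env(parent=parent.frame()) in data.table's i handling
ienv <- new.env(parent = environment())

# A list passed as envir is searched first, then the enclosure ienv
seen_wind   <- eval(quote(wind), cols, ienv)     # the column shadows the object
seen_windsm <- eval(quote(windsm), cols, ienv)   # no such column: found via ienv's parent
```

No column is named windsm, so that lookup reaches the caller. This is consistent with the subset case in the question working at any size.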
else if (!is.name(isub)) {
ienv = new.env(parent=parent.frame())
if (getOption("datatable.optimize")>=1L) assign("order", forder, ienv)
i = tryCatch(eval(.massagei(isub), x, ienv), error=function(e) {
if (grepl(":=.*defined for use in j.*only", e$message))
stopf("Operator := detected in i, the first argument inside DT[...], but is only valid in the second argument, j. Most often, this happens when forgetting the first comma (e.g. DT[newvar := 5] instead of DT[ , new_var := 5]). Please double-check the syntax. Run traceback(), and debugger() to get a line number.")
else
.checkTypos(e, names_x)
})
} else {
# isub is a single symbol name such as B in DT[B]
i = try(eval(isub, parent.frame(), parent.frame()), silent=TRUE)
...
}
https://stackoverflow.com/questions/68819596