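I'm new to R and am using glmer to fit several binomial models. I only need them so that I can call predict and use the resulting probabilities. However, I have a very large data set, and even a single model object becomes very large: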
> library(pryr)
> object_size(mod)
701 MB
By comparison, the model coefficients are tiny:
> object_size(coef(mod))
1.16 MB
and so are the fitted values:
> object_size(fitted(mod))
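25.6 MB
First, I don't understand why the model object is so large. It appears to contain the original data frame used to fit the model, but even that doesn't account for the size. Why is it so big?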
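Second, is it possible to strip the model down to just the parts needed to call predict? If so, how would I go about it? I found a post that does this for glm at http://blog.yhathq.com/posts/reducing-your-r-memory-footprint-by-7000x.html, but glmer models seem to be accessed differently and have different components.
Any help would be much appreciated.
Edit:
Digging into the internals of the model: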
> object_size(getME(mod, "X"))
205 MB
> object_size(getME(mod, "Z"))
36.9 MB
> object_size(getME(mod, "Zt"))
38.4 MB
> object_size(getME(mod, "Ztlist"))
41.6 MB
> object_size(getME(mod, "mmList"))
38.4 MB
> object_size(getME(mod, "y"))
3.2 MB
> object_size(getME(mod, "mu"))
3.2 MB
> object_size(getME(mod, "u"))
18.4 kB
> object_size(getME(mod, "b"))
19.5 kB
> object_size(getME(mod, "Gp"))
56 B
> object_size(getME(mod, "Tp"))
472 B
> object_size(getME(mod, "L"))
15.5 MB
> object_size(getME(mod, "Lambda"))
38.1 kB
> object_size(getME(mod, "Lambdat"))
38.1 kB
> object_size(getME(mod, "Lind"))
9.22 kB
> object_size(getME(mod, "Tlist"))
936 B
> object_size(getME(mod, "A"))
38.4 MB
> object_size(getME(mod, "RX"))
30.3 kB
> object_size(getME(mod, "RZX"))
1.05 MB
> object_size(getME(mod, "sigma"))
48 B
> object_size(getME(mod, "flist"))
4.89 MB
> object_size(getME(mod, "fixef"))
4.5 kB
> object_size(getME(mod, "beta"))
496 B
> object_size(getME(mod, "theta"))
472 B
> object_size(getME(mod, "ST"))
936 B
> object_size(getME(mod, "REML"))
48 B
> object_size(getME(mod, "is_REML"))
48 B
> object_size(getME(mod, "n_rtrms"))
48 B
> object_size(getME(mod, "n_rfacs"))
48 B
> object_size(getME(mod, "N"))
256 B
> object_size(getME(mod, "n"))
256 B
> object_size(getME(mod, "p"))
256 B
> object_size(getME(mod, "q"))
256 B
> object_size(getME(mod, "p_i"))
408 B
> object_size(getME(mod, "l_i"))
408 B
> object_size(getME(mod, "q_i"))
408 B
> object_size(getME(mod, "mod"))
48 B
> object_size(getME(mod, "m_i"))
424 B
> object_size(getME(mod, "m"))
48 B
> object_size(getME(mod, "cnms"))
624 B
> object_size(getME(mod, "devcomp"))
2.21 kB
> object_size(getME(mod, "offset"))
3.2 MB
> get_obj_size(mod@resp, "RC")
[,1]
family 673355488
initialize 673355488
initialize#lmResp 673355488
ptr 673355488
resDev 673355488
updateMu 673355488
updateWts 673355488
wrss 673355488
eta 3196024
mu 3196024
n 3196024
offset 3196024
sqrtrwt 3196024
sqrtXwt 3196024
weights 3196024
wtres 3196024
y 3196024
Ptr 40
> get_obj_size(mod@pp, "RC")
[,1]
beta 449419408
initialize 449419408
initializePtr 449419408
ldL2 449419408
ldRX2 449419408
linPred 449419408
ptr 449419408
setTheta 449419408
sqrL 449419408
u 449419408
X 204549128
V 182171288
Ut 38448168
Zt 38448168
LamtUt 38353248
Xwts 3196024
RZX 1047176
Lambdat 38136
VtV 26192
delu 18408
u0 18408
Utr 18408
Lind 9224
beta0 496
delb 496
Vtr 496
theta 72
Ptr 40
Answered 2015-07-11 20:19:28
An incomplete answer, for now:
library("lme4")
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)
library("pryr")
object_size(gm1) ## 505 kB
The following helper, based on Steve's S3/S4/Reference class dictionary, lists and extracts fields:
## List the slots (S4) or fields (Reference class) of an object,
## sorted by size in bytes, largest first
get_obj_size <- function(obj, type = "S4") {
    fields <- switch(type,
                     S4 = slotNames(obj),
                     RC = ls(obj))
    get_field <- switch(type,
                        S4 = function(x) slot(obj, x),
                        RC = function(x) obj[[x]])
    field_list <- setNames(lapply(fields, get_field), fields)
    cbind(sort(sapply(field_list, object_size), decreasing = TRUE))
}
get_obj_size(gm1)
## [,1]
## resp 356620 ## 'response module'
## pp 355420 ## 'predictor module'
## frame 6640
## optinfo 1748
## devcomp 1424
## call 1244
## flist 1232
## cnms 224
## u 152
## beta 56
## Gp 32
## lower 32
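## theta 32
It would be worth digging further into the response and predictor modules to see what is there and what is large, with the caveat that some of the information is stored in the environments of these components.
For example, I think that the components that are nominally the same size are not actually independent, but share the same environment.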
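The effect of a shared environment on reported sizes can be reproduced outside lme4. The following is a minimal sketch, not lme4 code: `make_closures` is a hypothetical helper that returns two ordinary closures sharing one enclosing environment, and it assumes (as the pryr documentation describes) that `object_size` includes the size of a closure's environment and counts shared parts only once.

```r
library("pryr")  # same package the thread uses for object_size()

## Hypothetical helper: returns two closures that share one enclosing
## environment, which also holds a large numeric vector.
make_closures <- function() {
  big <- numeric(1e6)            # about 8 MB of doubles
  list(f = function() length(big),
       g = function() sum(big))
}

fs <- make_closures()

## Measured separately, each closure is charged for the shared environment ...
size_f <- as.numeric(object_size(fs$f))
size_g <- as.numeric(object_size(fs$g))
## ... but measured together, the environment is counted only once.
size_both <- as.numeric(object_size(fs))

c(f = size_f, g = size_g, both = size_both)
```

If the same mechanism is at work in the Reference class methods listed above, their identical sizes should not be added up.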
get_obj_size(gm1@resp,"RC")
## [,1]
## initialize 356620
## initialize#lmResp 356620
## ptr 356620
## resDev 356620
## setOffset 356620
## updateMu 356620
## updateWts 356620
## wrss 356620
## family 26016
## eta 472
## mu 472
## n 472
## offset 472
## sqrtrwt 472
## sqrtXwt 472
## weights 472
## wtres 472
## y 472
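## Ptr 20
Another way to see which components are stored is to use object_size(getME(model,component)) and iterate over the components listed by eval(formals(getME)$name). This corresponds less precisely to how the information is stored internally, but it gives you an idea of how much space is needed to hold, for example, the fixed-effect or random-effect model matrices.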
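The iteration over getME components described above could be sketched as follows. This is only an illustration under a few assumptions: it uses the documented `"ALL"` shortcut of `getME` rather than `eval(formals(getME)$name)` (which may not work in lme4 versions where `getME` is a generic), and it uses base R's `object.size` instead of `pryr::object_size`, so environments are not counted. The name `comp_sizes` is made up for the example.

```r
library("lme4")

## Same example model as above (the cbpp data ship with lme4).
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)

## getME(., "ALL") returns a named list of every extractable component.
comps <- getME(gm1, "ALL")

## Size of each component in bytes, largest first. utils::object.size()
## does not follow environments, so function-valued components look small.
comp_sizes <- sort(sapply(comps, function(x) as.numeric(object.size(x))),
                   decreasing = TRUE)
head(comp_sizes, 10)
```

On a large model this should put the model matrices near the top, in line with the getME listing in the question.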
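I did some more work and have a partial solution, but there is still a lot of stored material that I can't seem to locate or trim away properly. Note that this requires the latest version of lme4 on GitHub: I had to modify the predict function slightly to weaken its dependence on internal structure.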
glmer_chop <- function(object) {
    newobj <- object
    newobj@frame <- model.frame(object)[0,]
    newobj@pp <- with(object@pp,
                      new("merPredD",
                          Lambdat=Lambdat,
                          Lind=Lind,
                          theta=theta,
                          u=u,u0=u0,
                          n=nrow(X),
                          X=matrix(1,nrow=nrow(X)),
                          Zt=Zt)) ## .sparseDiagonal(n,shape="g")))
    newobj@resp <- new("glmResp",family=binomial(),y=numeric(0))
    return(newobj)
}
library("mlmRev")  ## provides the Contraception data set
fm1 <- glmer(use ~ urban+age+livch+(1|district), Contraception, binomial)
object_size(Contraception) ## 133 kB
object_size(fm1) ## 1.05 MB
object_size(fm2 <- glmer_chop(fm1)) ## 699 kB
get_obj_size(fm2) ## 'pp' is 547200 bytes
get_obj_size(fm2@pp,"RC") ## 'initialize' object is 547200
get_obj_size(environment(fm2@pp$initialize),"RC")
saveRDS(fm2,file="tmp.rds")
fm2 <- readRDS("tmp.rds")
object_size(fm2) ## 796 kB
rm(fm1)
pp <- predict(fm2,newdata=Contraception)
object_size(fm2) ## still 796K; no sharing
Finally, note that compare_size(fm2) confirms that most of the information here is stored in environments rather than in the object itself (though I don't know how compare_size/object.size handle Reference classes).
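Answered 2015-07-11 18:05:43
Is your concern storage space or RAM? If it's storage, one option is to embed the call that estimates the model in the code that generates the predictions, so that you never actually store the model object. Something like: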
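## g stands for whatever grouping factor the model uses; glmer needs at least one random-effects term
predictions <- predict(glmer(y ~ x + (1 | g), family = binomial), type = "response")
Answered 2022-11-25 13:14:11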
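With the current lme4 version (1.1-31.1), the answer proposed by @BenBolker gives wrong (different) results when predict() is called. The "delu" object needs to be included to fix this and get the same predictions as the original model object: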
glmer_chop <- function(object) {
    newobj <- object
    newobj@frame <- model.frame(object)[0,]
    newobj@pp <- with(object@pp,
                      new("merPredD",
                          Lambdat=Lambdat,
                          Lind=Lind,
                          theta=theta,
                          delu=delu,  ## needed for predictions to match the full model
                          u=u,u0=u0,
                          n=nrow(X),
                          X=matrix(1,nrow=nrow(X)),
                          Zt=Zt)) ## .sparseDiagonal(n,shape="g")))
    newobj@resp <- new("glmResp",family=binomial(),y=numeric(0))
    return(newobj)
}
(This would have been a comment, but I don't have enough reputation.)
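Since a trimmed object can silently give different predictions, it helps to have reference probabilities that do not depend on the model object's internals. The following is a minimal sketch for the simple random-intercept cbpp model used earlier in this thread. `predict_manual` is a hypothetical helper written for that specific formula, not part of lme4, and it assumes that the factor levels in `newdata` match those used when fitting.

```r
library("lme4")

gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)

## The only pieces kept from the fit: fixed effects and conditional modes.
beta   <- fixef(gm1)        # named vector of fixed-effect coefficients
b_herd <- ranef(gm1)$herd   # data frame with one row per herd

## Hypothetical helper for this specific formula, period + (1 | herd).
predict_manual <- function(newdata, beta, b_herd) {
  X <- model.matrix(~ period, data = newdata)
  ## linear predictor: fixed part plus the herd-specific intercept
  eta <- drop(X %*% beta) +
    b_herd[as.character(newdata$herd), "(Intercept)"]
  plogis(eta)               # inverse logit link
}

p_manual <- predict_manual(cbpp, beta, b_herd)
p_full   <- predict(gm1, newdata = cbpp, type = "response")
all.equal(unname(p_manual), unname(p_full), tolerance = 1e-6)
```

For models of this simple form, `beta` and `b_herd` are small enough to store on their own. Models with random slopes, offsets, or herds not seen during fitting would need more care than this sketch provides.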
https://stackoverflow.com/questions/31359909