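For an agent-based modeling (ABM) project, I am considering using tidyverse tibbles instead of matrices. I compared the performance of both with a very simple ABM (see below) that simulates a population in which individuals age, die, and are born. For ABMs I usually use for loops and indexing.
When benchmarking the two data structures (see the plot here: https://github.com/marcosmolla/tibble_vs_matrix), the matrix was much faster than the tibble. However, at 10^6 runs the result is reversed, and I don't know why.
It would be great to understand this result, so that I know whether to use tibbles or matrices for this kind of use case in the future.
Thanks to everyone for any input!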

# This code benchmarks the speed of tibbles versus matrices. This should be useful for evaluating the suitability of tibbles in an ABM context, where data is frequently altered in matrices (or vectors).
library(tidyverse)
library(reshape2)
library(cowplot)
lapply(c(10^1, 10^2, 10^3, 10^4, 10^5, 10^6), function(runtime){
# Set up tibble
indTBL <- tibble(id=1:100,
type=sample(1:3, size=100, replace=T),
age=1)
# Set up matrix (from tibble)
indMAT <- as.matrix(indTBL)
# Simulation run with tibble
t <- Sys.time()
for(i in 1:runtime){
# increase age
indTBL$age <- indTBL[["age"]]+1
# replace individuals by chance or when max age
dead <- (1:100)[runif(n=100,min=0,max=1)<=0.01 | indTBL[["age"]]>100]
indTBL[dead, "age"] <- 1
indTBL[dead, "type"] <- sample(1:3, size=length(dead), replace=T)
}
# Report in seconds: Sys.time() differences pick their units ("secs", "mins", ...) automatically
tibbleTime <- as.numeric(difftime(Sys.time(), t, units = "secs"))
# Simulation run with matrix
t <- Sys.time()
for(i in 1:runtime){
# increase age
indMAT[,"age"] <- indMAT[,"age"]+1
# replace individuals by chance or when max age
dead <- (1:100)[runif(n=100,min=0,max=1)<=0.01 | indMAT[,"age"]>100]
indMAT[dead, "age"] <- 1
indMAT[dead, "type"] <- sample(1:3, size=length(dead), replace=T)
}
matrixTime <- as.numeric(difftime(Sys.time(), t, units = "secs"))
# Return both run times
return(data.frame(tibbleTime=tibbleTime, matrixTime=matrixTime))
}) %>% bind_rows() -> res
# Prepare data for ggplot
res$power <- 1:nrow(res)
res_m <- melt(data=res, id.vars="power")
# Line plot for results
ggplot(data=res_m, aes(x=power, y=value, color=variable)) + geom_point() + geom_line() + scale_color_brewer(palette="Paired") + ylab("Runtime in sec") + xlab(bquote("Simulation runs"~10^x))
Posted on 2019-05-06 23:38:58
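One plausible cause of the apparent reversal at 10^6 runs, offered as an assumption worth checking rather than a confirmed diagnosis: subtracting two `Sys.time()` values yields a `difftime` whose units are chosen automatically. A run of about 80 seconds is then reported in minutes, while a 20-second run stays in seconds, so `as.numeric()` compares minutes against seconds. The minimal sketch below uses made-up timestamps (`t0`, `t1` are hypothetical) to show the effect and the `units = "secs"` fix:

```r
# Two hypothetical timestamps 90 seconds apart
t0 <- as.POSIXct("2019-05-06 12:00:00", tz = "UTC")
t1 <- t0 + 90

# Plain subtraction chooses units automatically ("auto")
d <- t1 - t0
units(d)        # "mins"
as.numeric(d)   # 1.5 -> minutes, not seconds

# Fixing the units makes timings of different lengths comparable
as.numeric(difftime(t1, t0, units = "secs"))   # 90
```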
Thanks for your replies. I used the microbenchmark package to benchmark properly, and now I find that for 10^6 runs the matrix is still faster.
library(tidyverse)
library(microbenchmark)
# Set up tibble
indTBL <- tibble(id=1:100,
type=sample(1:3, size=100, replace=T),
age=1)
# Set up matrix (from tibble)
indMAT <- as.matrix(indTBL)
# Simulation run with tibble
runtime <- 10^6
microbenchmark(
tib=for(i in 1:runtime){
# increase age
indTBL$age <- indTBL[["age"]]+1
# replace individuals by chance or when max age
dead <- (1:100)[runif(n=100,min=0,max=1)<=0.01 | indTBL[["age"]]>100]
indTBL[dead, "age"] <- 1
indTBL[dead, "type"] <- sample(1:3, size=length(dead), replace=T)
},
# Simulation run with matrix
mat=for(i in 1:runtime){
# increase age
indMAT[,"age"] <- indMAT[,"age"]+1
# replace individuals by chance or when max age
dead <- (1:100)[runif(n=100,min=0,max=1)<=0.01 | indMAT[,"age"]>100]
indMAT[dead, "age"] <- 1
indMAT[dead, "type"] <- sample(1:3, size=length(dead), replace=T)
}, times=1
)
The result is:
Unit: seconds
expr min lq mean median uq max neval cld
tib 80.22042 81.45051 82.26645 82.68061 83.28946 83.89831 3 b
mat 20.44746 20.66974 20.75168 20.89202 20.90378 20.91555 3 a
Thanks to Ilrs and MrFlick for the hints.
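For the use case in the question, one hedged alternative is to keep agent state in plain atomic vectors inside the loop and build a data frame or tibble only once the results are needed. This avoids calling the tibble's `[<-` method on every iteration. The function `simulate_vectors()` and its parameters are hypothetical, it has not been benchmarked here, and it reproduces the same ageing/death/birth rules as the code above:

```r
# Hypothetical sketch: state lives in plain vectors during the simulation
simulate_vectors <- function(runtime, n = 100, max_age = 100, p_death = 0.01) {
  type <- sample(1:3, size = n, replace = TRUE)
  age  <- rep(1, n)
  for (i in seq_len(runtime)) {
    age  <- age + 1                                      # everyone ages
    dead <- which(runif(n) <= p_death | age > max_age)   # death by chance or old age
    age[dead]  <- 1                                      # replace with newborns
    type[dead] <- sample(1:3, size = length(dead), replace = TRUE)
  }
  # Assemble a table only at the end (wrap in tibble::as_tibble() if preferred)
  data.frame(id = seq_len(n), type = type, age = age)
}

set.seed(1)
res <- simulate_vectors(1000)
```

The same function can be timed with `microbenchmark()` alongside the tibble and matrix loops to check whether it actually helps on a given machine.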
https://stackoverflow.com/questions/56007967