I am trying to fit many lightgbm models with different parameters (e.g., for parameter tuning). They need to run in parallel to reduce the run time. However, when running the %dopar% command I get the following error: Error in unserialize(socklist[[n]]) : error reading from connection.
The same problem occurred with xgboost (posted here) and was solved by creating the xgboost data objects (xgb.DMatrix, for the training and test data) inside the %dopar% loop itself rather than outside it. For lightgbm, however, this does not work (e.g., creating the native lgb.Dataset training and test sets inside the loop still produces the same error). Any ideas on how to solve this?
Edit: I am using Windows 10, R version 4.0.4 (2021-02-15), RStudio version 1.4.1106 Win64, and lightgbm version 3.1.1. As mentioned in the comments, the code apparently works on a Mac.
Below is a reproducible example. (I am aware that lightgbm ships its own parallelization via the nthread parameter, just like xgboost, but that parallelizes within a single fit, which did not give me much of a speed improvement.)
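For reference, the built-in threading mentioned above is controlled through LightGBM's num_threads parameter (nthread is an alias), set per fit rather than across fits. A minimal sketch, assuming the same simulated data as in the example below; the thread count of 4 is an arbitrary illustration:

library(lightgbm)

# Simulated data, as in the reproducible example
n <- 1000
X <- cbind(runif(n, 10, 20), runif(n, 0, 10))
y <- 10 + 2 * X[, 1] + 3 * X[, 2] + rnorm(n, 0, 1)

train <- lgb.Dataset(data = X, label = y)

# num_threads parallelizes tree construction WITHIN this single fit;
# it does not help when many independent models must be trained.
model <- lgb.train(
  params = list(objective = "regression", num_threads = 4),
  data = train,
  nrounds = 100
)

This is why an outer parallel loop over models (via foreach/%dopar%) is still attractive for parameter tuning, despite the inner threading.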
#### Load packages
library(lightgbm)
library(parallel)
library(foreach)
library(doParallel)
#### Data Sim
n = 1000
X = cbind(runif(n,10,20), runif(n,0,10))
y = 10 + 2*X[,1] + 3*X[,2] + rnorm(n,0,1)
#### LGB - single (works)
train = lgb.Dataset(data = X[-nrow(X),], label = y[-nrow(X)])
test = lgb.Dataset(data = t(as.matrix(X[nrow(X),])), label = y[nrow(X)]) # 1 step-ahead only
valid = list(train = train, test = test)
model_lgb = lgb.train(data = train, valids = valid, max_depth = 31, eta = 0.1, num_rounds = 10000, obj = "regression", early_stopping_rounds = 25)
#### LGB - parallel (doesn't work)
numCores = detectCores()
cl = parallel::makeCluster(numCores)
doParallel::registerDoParallel(cl)
clusterEvalQ(cl, {
  library(lightgbm)
})
pred_lgb = foreach(i = 1:8, .packages = c("lightgbm")) %dopar% {
  train = lgb.Dataset(data = X[-nrow(X),], label = y[-nrow(X)])
  test = lgb.Dataset(data = t(as.matrix(X[nrow(X),])), label = y[nrow(X)]) # 1 step-ahead only
  valid = list(train = train, test = test)
  model_lgb = lgb.train(data = train, valids = valid, max_depth = i, eta = 0.1, num_rounds = 10000, obj = "regression", early_stopping_rounds = 25)
}
stopCluster(cl)

Result:
Error in unserialize(socklist[[n]]) : error reading from connection

Posted on 2021-04-04 13:19:31
Today I happened to build the reproducible example again and it actually worked. In my original code I had additional packages in the clusterEvalQ block, which I removed before posting the code here (since they were redundant for the reproducible example) without re-running it myself. Admittedly that was an obvious mistake, but I cannot find any other explanation.
So there appears to be a problem with loading additional packages alongside lightgbm via clusterEvalQ. The workaround: keep only lightgbm in clusterEvalQ and load the necessary additional packages via the .packages argument of the foreach loop, like this:
#### LGB
numCores = detectCores()
cl = parallel::makeCluster(numCores)
doParallel::registerDoParallel(cl)
clusterEvalQ(cl, {
  library(lightgbm)
  # no additional packages here
})
# additional packages here
pred_lgb = foreach(i = 1:8, .packages = c("additionalPackage", "lightgbm")) %dopar% {
  train = lgb.Dataset(data = X[-nrow(X),], label = y[-nrow(X)])
  test = lgb.Dataset(data = t(as.matrix(X[nrow(X),])), label = y[nrow(X)]) # 1 step-ahead only
  valid = list(train = train, test = test)
  model_lgb = lgb.train(data = train, valids = valid, max_depth = i, eta = 0.1, num_rounds = 10000, obj = "regression", early_stopping_rounds = 25)$best_iter
}
stopCluster(cl)

I still need to look into the difference between these two ways of loading packages. Note that xgboost does not have this problem, i.e., I can load multiple packages in clusterEvalQ without error.
https://stackoverflow.com/questions/66734140