I'm using parallelStream to upload some files in parallel; some are large and some are small. I've noticed that not all of the workers get used.
At first, all the threads run fine (I set the parallelism option to 16). Then at some point (once it reaches the larger files), it only uses one thread.
Simplified code:
files.parallelStream().forEach((file) -> {
    try (FileInputStream fileInputStream = new FileInputStream(file)) {
        IDocumentStorageAdaptor uploader = null;
        try {
            logger.debug("Adaptors before taking: " + uploaderPool.size());
            uploader = uploaderPool.take();
            logger.debug("Took an adaptor!");
            logger.debug("Adaptors after taking: " + uploaderPool.size());
            uploader.addNewFile(file);
        } finally {
            if (uploader != null) {
                logger.debug("Adding one back!");
                uploaderPool.put(uploader);
                logger.debug("Adaptors after putting: " + uploaderPool.size());
            }
        }
    } catch (InterruptedException | IOException e) {
        throw new UploadException(e);
    }
});

uploaderPool is an ArrayBlockingQueue. The logs:
[ForkJoinPool.commonPool-worker-8] - Adaptors before taking: 0
[ForkJoinPool.commonPool-worker-15] - Adding one back!
[ForkJoinPool.commonPool-worker-8] - Took an adaptor!
[ForkJoinPool.commonPool-worker-15] - Adaptors after putting: 0
...
...
...
[ForkJoinPool.commonPool-worker-10] - Adding one back!
[ForkJoinPool.commonPool-worker-10] - Adaptors after putting: 16
[ForkJoinPool.commonPool-worker-10] - Adaptors before taking: 16
[ForkJoinPool.commonPool-worker-10] - Took an adaptor!
[ForkJoinPool.commonPool-worker-10] - Adaptors after taking: 15
[ForkJoinPool.commonPool-worker-10] - Adding one back!
[ForkJoinPool.commonPool-worker-10] - Adaptors after putting: 16
[ForkJoinPool.commonPool-worker-10] - Adaptors before taking: 16
[ForkJoinPool.commonPool-worker-10] - Took an adaptor!
[ForkJoinPool.commonPool-worker-10] - Adaptors after taking: 15

It seems like all the work (the items in the list) gets distributed across the 16 threads up front, and the work delegated to a given thread just waits for that thread to become free instead of using the threads that are already idle. Is there a way to change how parallelStream queues its work? I've read the ForkJoinPool documentation, and it mentions work stealing, but only for spawned subtasks.
My other idea is to shuffle the list I'm calling parallelStream on, which might balance things out a bit.
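That shuffling idea is a one-liner around Collections.shuffle (the file names below are hypothetical); note it only evens out the initial split of the list, it does not change how the stream hands work to threads:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ShuffleBeforeSplit {
    // Return a shuffled copy so large files are spread across the
    // chunks that parallelStream() will later split the list into.
    static List<String> shuffled(List<String> files) {
        List<String> copy = new ArrayList<>(files);
        Collections.shuffle(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("big1", "small1", "big2", "small2");
        List<String> randomized = shuffled(files);
        System.out.println("randomized " + randomized.size() + " entries");
    }
}
```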
Thanks!
Posted on 2018-09-05 02:55:46
The splitting and execution heuristics of parallel streams are tuned for data-parallel operations, not for IO-parallel ones. (In other words, they are tuned to keep the CPUs busy, not to generate far more tasks than there are CPUs.) As a result, they are biased toward computation over forking, and there is currently no option to override those choices.
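Given that, a common workaround for IO-bound work is to skip parallelStream entirely and submit every file to a dedicated, explicitly sized ExecutorService: an idle thread always picks up the next pending file, so small files are never stuck behind a big one. A minimal sketch (the upload method and file names are placeholders, not the asker's actual API):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class IoBoundUpload {
    // Placeholder for the real upload; just simulates IO latency.
    static void upload(String file) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Submit every file up front; the pool's shared queue means any
    // free thread takes the next file, rather than work being
    // pre-assigned to threads the way a parallel stream splits it.
    static int uploadAll(List<String> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = files.stream()
                    .map(f -> pool.submit(() -> upload(f)))
                    .collect(Collectors.toList());
            for (Future<?> f : futures) {
                f.get(); // rethrows any upload failure
            }
            return files.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int n = uploadAll(List.of("a.bin", "b.bin", "c.bin", "d.bin"), 16);
        System.out.println("uploaded " + n + " files");
    }
}
```

Sizing the pool yourself also decouples upload parallelism from the common ForkJoinPool, so other parallel streams in the application are unaffected.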
https://stackoverflow.com/questions/52172342