TL;DR: ThreadPoolExecutor is the cause. Memory usage with concurrent.futures.ThreadPoolExecutor in Python 3
Here is a Python script (heavily simplified) that runs an all-to-all routing algorithm and consumes all available memory in the process.
I understand that the problem is that the main function never returns, so the objects created inside it are never reclaimed by the garbage collector.
My main question is: can I write a consumer for the returned generator so that the data gets cleaned up? Or should I call the garbage collector directly?
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# thread pool executor like in the Python documentation example
def table_process(callable, total):
    # `threads` is defined in main() below; shown here as in the original post
    with ThreadPoolExecutor(max_workers=threads) as e:
        future_map = {
            e.submit(callable, i): i
            for i in range(total)
        }
        for future in as_completed(future_map):
            if future.exception() is None:
                yield future.result()
            else:
                raise future.exception()
```
```python
@argh.dispatch_command
def main():
    threads = 10
    data = pd.DataFrame(...)  # about 12K rows

    # this function routes only one slice of sources/destinations
    def _process_chunk(x: int) -> gpd.GeoDataFrame:
        # slicing is more complex, but simplified here for presentation
        # do a cross-product and an HTTP request to process the result
        result_df = _do_process(grid[x], grid)
        return result_df

    # writing to a geopackage
    with fiona.open('/tmp/some_file.gpkg', 'w', driver='GPKG', schema=...) as f:
        for results_df in table_process(_process_chunk, len(data)):
            aggregated_df = results_df.groupby('...').aggregate({...})
            f.writerecords(aggregated_df)
```

Posted on 2018-12-28 18:57:03
It turned out that ThreadPoolExecutor keeps its workers alive and does not release their memory.
The solution is described here: Memory usage with concurrent.futures.ThreadPoolExecutor in Python3
https://stackoverflow.com/questions/53951005
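For reference, one way to keep memory bounded in a generator like `table_process` is to submit work in batches rather than all at once, so that completed futures and their results can be garbage-collected between batches. This chunked-submission sketch is an assumption on my part, not the exact code from the linked answer; `table_process_chunked` and its `batch` parameter are names introduced here for illustration:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def table_process_chunked(func, total, threads=10, batch=100):
    """Yield results batch by batch so finished futures can be freed."""
    with ThreadPoolExecutor(max_workers=threads) as e:
        for start in range(0, total, batch):
            # only `batch` futures (and their results) are alive at a time
            futures = [e.submit(func, i)
                       for i in range(start, min(start + batch, total))]
            for future in as_completed(futures):
                yield future.result()
            # drop the references so the results can be collected
            futures.clear()
```

The consumer loop in `main()` stays unchanged; only the producer holds fewer live futures at any moment.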