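I have very large files, each almost 2 GB, so I would like to process several of them in parallel. This should be possible because all the files share a similar format, so they can be read independently of one another. I know I should use the multiprocessing library, but I am really confused about how to use it with my code.
My file reading code is: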
    def file_reading(file,num_of_sample,segsites,positions,snp_matrix):
        with open(file,buffering=2000009999) as f:
            ###I read file here. I am not putting that code here.
            try:
                assert len(snp_matrix) == len(positions)
                return positions,snp_matrix ## return statement
            except:
                print('length of snp matrix and length of position vector not the same.')
                sys.exit(1)
My main function is:
    if __name__ == "__main__":
        segsites = []
        positions = []
        snp_matrix = []
        path_to_directory = '/dataset/example/'
        extension = '*.msOut'
        num_of_samples = 162
        filename = glob.glob(path_to_directory+extension)
        ###How can I use multiprocessing with function file_reading
        number_of_workers = 10
        x,y,z = [],[],[]
        array_of_number_tuple = [(filename[file], segsites,positions,snp_matrix) for file in range(len(filename))]
        with multiprocessing.Pool(number_of_workers) as p:
            pos,snp = p.map(file_reading,array_of_number_tuple)
            x.extend(pos)
            y.extend(snp)
So those are my inputs to the function.
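At the end, the function returns the positions list and the snp_matrix list. How can I use multiprocessing when the arguments are lists and an integer? The way I am using multiprocessing gives the following error:
    TypeError: file_reading() missing 3 required positional arguments: 'segsites', 'positions', and 'snp_matrix'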
Posted on 2019-04-30 09:38:00
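The elements of the list passed to Pool.map are not unpacked automatically: each element is handed to the function as a single argument. So, in general, your file_reading function can take only one parameter.
Of course, that one argument can be a tuple, so unpacking it yourself is no problem: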
    import glob
    import multiprocessing

    def file_reading(args):
        # Pool.map passes exactly one argument per call, so unpack the tuple here.
        file, num_of_sample, segsites, positions, snp_matrix = args
        with open(file,buffering=2000009999) as f:
            ###I read file here. I am not putting that code here.
            # Raise instead of calling sys.exit(): p.map re-raises the exception
            # in the parent, whereas exiting inside a worker can hang the pool.
            if len(snp_matrix) != len(positions):
                raise ValueError('length of snp matrix and length of position vector not the same.')
            return positions,snp_matrix ## return statement

    if __name__ == "__main__":
        segsites = []
        positions = []
        snp_matrix = []
        path_to_directory = '/dataset/example/'
        extension = '*.msOut'
        num_of_samples = 162
        filename = glob.glob(path_to_directory+extension)
        number_of_workers = 10
        x,y,z = [],[],[]
        # One argument tuple per file, now including num_of_samples.
        array_of_number_tuple = [(file, num_of_samples, segsites,positions,snp_matrix) for file in filename]
        with multiprocessing.Pool(number_of_workers) as p:
            # p.map returns one (positions, snp_matrix) pair per file, in input order.
            results = p.map(file_reading,array_of_number_tuple)
        for pos,snp in results:
            x.extend(pos)
            y.extend(snp)
https://stackoverflow.com/questions/55905444
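As an alternative to unpacking the tuple by hand, Pool.starmap (available since Python 3.3) unpacks each tuple into positional arguments for you. Below is a minimal sketch of that approach, not the original poster's parser: the one-pair-per-line file format, the two-parameter file_reading signature, and the read_files helper are all assumptions made up for illustration.

```python
import glob
import multiprocessing


def file_reading(file, num_of_samples):
    """Toy stand-in for the real parser (hypothetical format).

    Assumes each line holds 'position genotypes' separated by whitespace.
    """
    positions, snp_matrix = [], []
    with open(file) as f:
        for line in f:
            position, genotypes = line.split()
            positions.append(float(position))
            # Keep only the first num_of_samples genotype characters.
            snp_matrix.append(genotypes[:num_of_samples])
    return positions, snp_matrix


def read_files(pool, filenames, num_of_samples):
    """Read all files through the pool and merge the per-file results."""
    args = [(name, num_of_samples) for name in filenames]
    # starmap calls file_reading(name, num_of_samples) for each tuple and
    # returns one (positions, snp_matrix) pair per file, in input order.
    results = pool.starmap(file_reading, args)
    all_positions, all_snps = [], []
    for positions, snp_matrix in results:
        all_positions.extend(positions)
        all_snps.extend(snp_matrix)
    return all_positions, all_snps


if __name__ == "__main__":
    filenames = glob.glob('/dataset/example/' + '*.msOut')
    number_of_workers = 10
    if filenames:  # Pool needs at least one worker, so skip when nothing matched
        # There is no benefit in starting more workers than there are files.
        with multiprocessing.Pool(min(number_of_workers, len(filenames))) as pool:
            x, y = read_files(pool, filenames, 162)
        print(len(x), len(y))
```

Because each worker builds and returns its own lists, this version does not pass the empty segsites, positions, and snp_matrix lists in at all; arguments are pickled and copied into every worker, so lists filled in there would not be visible in the parent anyway.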