I have a script that downloads images from URLs, but I want to parallelize it; otherwise it takes hours. I am using this code:
import requests
from math import floor, log10
import urllib
import time
import multiprocessing

with open('images.csv', 'r') as f:
    images = f.readlines()

num_position = floor(log10(len(images)) + 1)
a = time.time()
for i, image in enumerate(images[1:10]):
    if (i+1) % 1000 == 0:
        print('Downloading {} image'.format(i+1) )
    # a = time.time()
    with open(str(i).zfill(num_position)+'a.jpg', 'wb') as file:
        try:
            writing = file.write(requests.get(image.split(',')[2]).content)
            p = multiprocessing.Process(target=writing, args=(image,))
            p.start()
            p.join()
        except:
            print('Skipping an image!')
            pass
b = time.time()
print('multiple process -- {}'.format(b-a))

I get an error:
Process Process-9:
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
    self.run()
  File "/usr/lib/python3.4/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: 'int' object is not callable

Posted on 2016-12-05 06:54:53
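You get the error because, AFAIK, this line

writing = file.write(requests.get(image.split(',')[2]).content)

returns an integer. For a file opened in binary mode, write returns the number of bytes written, which here is the size of the downloaded image. You then assign that value to the variable writing, so writing becomes a number.

p = multiprocessing.Process(target=writing, args=(image,))

passes writing as the target function. This raises the error because the target is not a function but an integer, and integers are not callable. The files are still written, though: the download and the write already happened in the main process before the worker was created, so the worker has nothing to do and exits right away.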
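To see the point about the return value in isolation, here is a minimal sketch that runs without any network access. It assumes io.BytesIO as a stand-in for the file opened with 'wb', and write_bytes is a hypothetical helper name used only for illustration.

```python
import io

# io.BytesIO stands in for a file opened with mode 'wb' (assumption for this sketch)
buffer = io.BytesIO()
writing = buffer.write(b'fake image bytes')  # write returns the number of bytes written

print(type(writing).__name__)  # int
print(callable(writing))       # False, so it cannot serve as a Process target


def write_bytes(data):
    # hypothetical helper: a function object is what target= expects
    return io.BytesIO().write(data)


print(callable(write_bytes))   # True
```

Passing write_bytes itself (not the result of calling it) is what Process(target=...) needs.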
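To make it work, you must define a function that takes your image as an argument, and the filename as well. You then pass this function as the target when setting up your workers. Like so: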
def write_file(image, filename):
    # download the image referenced in the CSV row and write it as binary data
    with open(filename, mode="wb") as filestream:
        filestream.write(requests.get(image.split(',')[2]).content)

In your application:
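p = multiprocessing.Process(target=write_file, args=(image, filename))

However, this only covers the writing part. If you also want the downloads to run in separate tasks, you have to put that code into a separate function as well.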
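def download_write(urls):
    # pull URLs from the queue until the 'STOP' sentinel arrives
    for url in iter(urls.get, 'STOP'):
        # download code here
        content = requests.get(url).content
        # assumption: name the file after the last path segment of the URL
        filename = url.rsplit('/', 1)[-1]
        with open(filename, mode="wb") as filestream:
            filestream.write(content)

Your main application would be: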
import multiprocessing

if __name__ == '__main__':  # guard needed where child processes re-import this module
    list_urls = []  # your list of urls to download
    urls = multiprocessing.Queue()
    for element in list_urls:
        urls.put(element)
    p = multiprocessing.Process(target=download_write, args=(urls,))
    urls.put("STOP")  # signals end of tasks for your workers
    p.start()  # start worker
    p.join()  # wait for worker to finish

https://stackoverflow.com/questions/40964432
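The main application above starts a single worker, so the downloads still run one after another. As a rough sketch of how the same queue pattern could be extended to several workers: each worker needs its own 'STOP' sentinel, and the results are read back before joining. Everything named here (fake_download, worker, run_workers, the results queue, the example URLs) is a hypothetical addition for illustration, and fake_download replaces requests.get(url).content so the sketch runs offline.

```python
import multiprocessing


def fake_download(url):
    # hypothetical stand-in for requests.get(url).content
    return ('bytes of ' + url).encode()


def worker(tasks, results):
    # consume tasks until this worker receives its own 'STOP' sentinel
    for url in iter(tasks.get, 'STOP'):
        results.put((url, len(fake_download(url))))


def run_workers(list_urls, num_workers=4):
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    for url in list_urls:
        tasks.put(url)
    for _ in range(num_workers):
        tasks.put('STOP')  # one sentinel per worker
    processes = [multiprocessing.Process(target=worker, args=(tasks, results))
                 for _ in range(num_workers)]
    for p in processes:
        p.start()
    # read one result per URL before joining, so no worker blocks on a full queue
    collected = dict(results.get() for _ in list_urls)
    for p in processes:
        p.join()
    return collected


if __name__ == '__main__':
    sizes = run_workers(['http://example.com/a.jpg', 'http://example.com/b.jpg'],
                        num_workers=2)
    print(sizes)
```

In a real script the worker would call requests.get and write the file, as download_write does. Since downloading is mostly waiting on the network, a handful of workers is usually enough.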