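I just wrote a small function that downloads some images and saves them to my hard drive. Some of the URLs redirect and/or have the wrong file extension. I have added some validation, but it makes the script stop as soon as it hits a bad URL. I would like to modify the script so that the loop keeps going, discarding any bad URLs, and breaks out once an image has been downloaded successfully (I only need one successful download here). Could you take a look at my code and share some tips? Thanks.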
from pattern.web import URL, DOM, plaintext, extension
import requests, re, os, sys, datetime, time, re, random

def download_single_image(query, folder, image_options=None):
    download_fault = 0
    url_link = None
    valid_image_ext_list = ['.png', '.jpg', '.gif', '.bmp', '.tiff', 'jpeg'] # not comprehensive
    pic_links = scrape_links(query, image_options) # pic_links contains an array of urls
    for url in pic_links:
        url = URL(url)
        print "checking re-direction"
        if url.redirect:
            print "redirected, returning"
            return # if there is a redirect, return
        file_ext = extension(url.page)
        print "checking file extension", file_ext
        if file_ext.lower() not in valid_image_ext_list:
            print "not a valid extension, returning"
            return # return if not valid image extension found
        # Download the image.
        print('Downloading image %s... ' % (pic))
        res = requests.get(pic)
        try:
            res.raise_for_status()
        except Exception as exc:
            print('There was a problem: %s' % (exc))
        print ('Saving image to %s...'% (folder))
        if not os.path.exists(folder + '/' + os.path.basename(pic)):
            imageFile = open(os.path.join(folder, os.path.basename(pic)), mode='wb')
            for chunk in res.iter_content(100000):
                imageFile.write(chunk)
            imageFile.close()
            print('pic saved %s' % os.path.basename(pic))
        else:
            print('File already exists!')
        return os.path.basename(pic)

Answered 2016-02-07 23:17:24
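Change this:
    return # return if not valid image extension found
to this:
    continue # skip to the next URL if no valid image extension is found
The first exits the function, which ends the loop; the second skips to the next iteration. The same change applies to the `return` in the redirect check.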
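To illustrate the difference, here is a minimal sketch of the skip-and-keep-going pattern, kept free of network access so it can be run as is. The function name `first_valid_image_url` and the extension set are hypothetical and not taken from the question's code, and the sketch is written for Python 3 whereas the question uses Python 2.

```python
import os
from urllib.parse import urlparse

# Hypothetical extension set for illustration; not comprehensive.
VALID_EXTENSIONS = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff'}

def first_valid_image_url(urls):
    """Return the first URL whose path ends in a known image extension, or None."""
    for url in urls:
        # Look only at the path so a query string does not confuse the check.
        ext = os.path.splitext(urlparse(url).path)[1].lower()
        if ext not in VALID_EXTENSIONS:
            continue  # bad candidate: skip it and try the next URL
        return url  # first acceptable URL: leave the loop and the function
    return None  # every URL was rejected
```

With `return` in place of `continue`, the function would give up on the first rejected URL instead of trying the remaining ones.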
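PS: File extensions mean nothing on the internet... I would rather just send a HEAD request with CURL and check whether the resource is an image (via the Content-Type returned by the server).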
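The Content-Type check could be sketched in Python roughly as follows. This swaps curl for the standard library's `urllib.request` (Python 3); the helper names `is_image_content_type` and `looks_like_image` and the 10-second timeout are assumptions made for illustration. It is only a sketch: a server may report a wrong Content-Type or reject HEAD requests altogether.

```python
import urllib.request

def is_image_content_type(content_type):
    """Return True if a Content-Type header value names an image media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=..." and normalise the case.
    media_type = content_type.split(';', 1)[0].strip().lower()
    return media_type.startswith('image/')

def looks_like_image(url, timeout=10):
    """Send a HEAD request and inspect the Content-Type the server reports."""
    try:
        request = urllib.request.Request(url, method='HEAD')
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_image_content_type(response.headers.get('Content-Type'))
    except (OSError, ValueError):
        # URLError and HTTPError derive from OSError; ValueError is raised
        # for a URL without a recognisable scheme.
        return False
```

Inside the download loop, a URL for which `looks_like_image(url)` returns False would be skipped with `continue`. With the `requests` library already used in the question, `requests.head(url)` sends the same kind of request.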
https://stackoverflow.com/questions/35255167