Q: Image download script needs a slight modification

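Stack Overflow user

Asked 2016-02-07 23:12:33

Answers: 1 · Views: 28 · Followers: 0 · Votes: 0

I just wrote a small function that downloads some images and saves them to my hard drive. Some of the URLs redirect and/or carry the wrong file extension. I have added some validation, but the checks make the script stop as soon as it hits a bad URL. I would like to modify the script so that the loop keeps going, discarding any bad URLs, and finally breaks once an image has been downloaded successfully (I only need one successfully downloaded image here). Could you look at my code and share some tips? Thanks.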

from pattern.web import URL, DOM, plaintext, extension
import requests, re, os, sys, datetime, time, re, random

def download_single_image(query, folder, image_options=None):

    download_fault = 0
    url_link = None
    valid_image_ext_list = ['.png', '.jpg', '.gif', '.bmp', '.tiff', 'jpeg'] # not comprehensive
    pic_links = scrape_links(query, image_options) # pic_links contains an array of urls
    for url in pic_links:
        url = URL(url)

        print "checking re-direction"

        if url.redirect:
            print "redirected, returning"
            return # if there is a redirect, return

        file_ext = extension(url.page)
        print "checking file extension", file_ext

        if file_ext.lower() not in valid_image_ext_list:
            print "not a valid extension, returning"
            return # return if not valid image extension found

        # Download the image.
        print('Downloading image %s... ' % (pic))
        res = requests.get(pic)
        try:
            res.raise_for_status()
        except Exception as exc:
            print('There was a problem: %s' % (exc))

        print ('Saving image to %s...'% (folder))
        if not os.path.exists(folder + '/' + os.path.basename(pic)):
            imageFile = open(os.path.join(folder, os.path.basename(pic)), mode='wb')
            for chunk in res.iter_content(100000):
                imageFile.write(chunk)
            imageFile.close()
            print('pic saved %s' % os.path.basename(pic))

        else:
            print('File already exists!')

        return os.path.basename(pic)

python

1 Answer

Stack Overflow user

Answered 2016-02-07 23:17:24

Change this:

return # return if not valid image extension found

to this:

continue # skip to the next URL if no valid image extension is found

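The first (return) aborts the loop by leaving the function entirely; the second (continue) skips to the next iteration. The same change applies to the return in the redirect check.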
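The difference can be sketched roughly as follows. This is a minimal, self-contained illustration and not the asker's exact function. It assumes Python 3 and checks the extension with `os.path.splitext` instead of `pattern.web`. The name `download_first_image` and the `fetch` callable are hypothetical stand-ins for the real download step.

```python
import os

# Assumed whitelist; like the original list, it is not comprehensive.
VALID_EXTS = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff'}


def download_first_image(urls, fetch):
    """Return the file name of the first URL that downloads, else None.

    `fetch` is a hypothetical callable that downloads one URL and
    raises an exception when the download fails.
    """
    for url in urls:
        path = url.split('?', 1)[0]  # ignore any query string
        ext = os.path.splitext(path)[1].lower()
        if ext not in VALID_EXTS:
            continue  # bad extension: skip this URL, keep looping
        try:
            fetch(url)
        except Exception as exc:
            print('There was a problem: %s' % exc)
            continue  # failed download: try the next URL
        return os.path.basename(path)  # first success ends the loop
    return None  # every URL was rejected or failed
```

With `continue` in the validation branches, a bad URL only costs one iteration. The single `return` after a successful download is what ends the loop once one image has been saved.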

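PS. File extensions are meaningless on the internet... I would rather just send a HEAD request with curl to check whether the URL is an image (via the Content-Type returned by the server).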
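A Content-Type check along these lines might look like the sketch below. It swaps curl for Python 3's standard `urllib.request`, purely as an assumption made for illustration. The helper names `is_image_type` and `looks_like_image` are hypothetical. Some servers reject HEAD requests or report an inaccurate Content-Type, so this is a heuristic and not a guarantee.

```python
import urllib.request


def is_image_type(content_type):
    """True if a Content-Type header value names an image media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=..." before comparing.
    media_type = content_type.split(';', 1)[0].strip().lower()
    return media_type.startswith('image/')


def looks_like_image(url, timeout=10):
    """Send a HEAD request and inspect the Content-Type the server reports."""
    try:
        request = urllib.request.Request(url, method='HEAD')
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_image_type(response.headers.get('Content-Type'))
    except (OSError, ValueError):
        # Network/HTTP errors or a malformed URL: treat as "not an image".
        return False
```

In the download loop, a call such as `if not looks_like_image(url): continue` could then stand in for the extension check. Note that `urlopen` follows redirects, so the header inspected belongs to the final response.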

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/35255167
