Question: How do I handle urlopen errors in a Python crawler?

Stack Overflow user

Asked 2016-05-03 10:32:23

2 answers, 164 views, 0 followers, 0 votes

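When I write Python crawlers I use urlopen a lot. Sometimes it fails to open a url (so I get an error), but when I try to open the same url again it succeeds. So I handle that case by writing my crawler like this: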

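import urllib.request

def url_open(url):
    '''open the url and return its content'''
    # header is the request-header dict defined elsewhere in the crawler
    req = urllib.request.Request(headers=header, url=url)
    while True:
        try:
            response = urllib.request.urlopen(req)
            break
        except:
            # any error at all triggers another attempt, with no limit
            continue
    contents = response.read().decode('utf8')
    return contents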

I think this code is ugly... but it works. Is there a more elegant way to do this?

python

web-crawler

urlopen

2 Answers

Stack Overflow user

Answered 2016-05-03 10:40:37

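I strongly recommend the requests library. You may still run into the same problem, but I find requests easier to work with and more reliable.

The same request would look like this: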

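import requests

def url_open(url):
    '''Fetch the url with requests and return the response text.'''
    while True:
        try:
            # header is the same request-header dict as in the question
            response = requests.get(url, headers=header)
            break
        except requests.exceptions.RequestException:
            # only network/HTTP-level failures trigger another attempt
            continue
    return response.text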
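
The loop above still retries forever if the site never comes back. As a rough sketch of a bounded variant, assuming a small helper I'm calling `retry` here (a hypothetical name, not part of requests or urllib), something like this caps the number of attempts and waits a little longer between them:

```python
import time


def retry(func, attempts=3, delay=0.5, exceptions=(Exception,)):
    """Call func() until it succeeds or `attempts` tries are used up."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of tries: let the caller see the last error
            time.sleep(delay * attempt)  # wait a little longer each time


# Hypothetical usage with requests (header is the dict from the question):
# text = retry(lambda: requests.get(url, headers=header, timeout=10).text,
#              exceptions=(requests.exceptions.RequestException,))
```

The attempt count and delay are arbitrary defaults; re-raising the last error means a permanently broken url fails loudly instead of hanging the crawler.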

What error are you getting?

Votes: 0

Stack Overflow user

Answered 2016-05-03 11:14:55

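I suggest sticking with the requests API and using Sessions and Adapters, so that you can set the number of retries explicitly. It is more code, but it is definitely cleaner: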

import requests

session = requests.Session()
# Retry failed connections up to 3 times for both schemes
http_adapter = requests.adapters.HTTPAdapter(max_retries=3)
https_adapter = requests.adapters.HTTPAdapter(max_retries=3)
session.mount('http://', http_adapter)
session.mount('https://', https_adapter)
response = session.get(url)
if response.status_code != 200:
    # Handle the request failure here
    pass
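
One caveat: according to the requests documentation, an integer max_retries only covers failed DNS lookups, socket connections and connection timeouts. If you also want a pause between attempts, or retries on certain HTTP status codes, you can pass a urllib3 Retry object instead. A minimal sketch, where the `make_session` name, the backoff factor and the list of status codes are my own assumptions rather than anything from the question:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total=3, backoff_factor=0.5):
    """Build a Session that retries with a growing delay between attempts."""
    retry = Retry(
        total=total,                            # overall cap on retries
        backoff_factor=backoff_factor,          # sleep grows with each retry
        status_forcelist=[500, 502, 503, 504],  # assumed "temporary" statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session


# Hypothetical usage:
# session = make_session()
# response = session.get(url, timeout=10)
```

A single adapter can be mounted on both prefixes, so the two schemes share the same retry policy.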

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/36994653
