How to catch a 404 error in urllib? (Python 3)

Stack Overflow user
Asked 2013-07-20 20:14:47
Answers: 2 · Views: 16K · Followers: 0 · Votes: 9

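I've read dozens of examples of similar questions, but I can't get any of the solutions I've seen, or variations of them, to run. I'm screen scraping, and I just want to ignore 404 errors (skip the page). I get

AttributeError: 'module' object has no attribute 'HTTPError'

I've also tried 'URLError'. I've seen nearly identical syntax accepted as a valid answer. Any ideas? Here's what I have: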

import urllib
import datetime
from bs4 import BeautifulSoup

class EarningsAnnouncement:
    def __init__(self, Company, Ticker, EPSEst, AnnouncementDate, AnnouncementTime):
        self.Company = Company
        self.Ticker = Ticker
        self.EPSEst = EPSEst
        self.AnnouncementDate = AnnouncementDate
        self.AnnouncementTime = AnnouncementTime

webBaseStr = 'http://biz.yahoo.com/research/earncal/'
earningsAnnouncements = []
dayVar = datetime.date.today()
for dte in range(1, 30):
    currDay = str(dayVar.day)
    currMonth = str(dayVar.month)
    currYear = str(dayVar.year)
    if (len(currDay)==1): currDay = '0' + currDay
    if (len(currMonth)==1): currMonth = '0' + currMonth
    dateStr = currYear + currMonth + currDay
    webString = webBaseStr + dateStr + '.html'
    try:
        #with urllib.request.urlopen(webString) as url: page = url.read()
        page = urllib.request.urlopen(webString).read()
        soup = BeautifulSoup(page)
        tbls = soup.findAll('table')
        tbl6= tbls[6]
        rows = tbl6.findAll('tr')
        rows = rows[2:len(rows)-1]
        for earn in rows:
            earningsAnnouncements.append(EarningsAnnouncement(earn.contents[0], earn.contents[1],
            earn.contents[3], dateStr, earn.contents[3]))
    except urllib.HTTPError as err:
        if err.code == 404:
            continue
        else:
            raise

    dayVar += datetime.timedelta(days=1)
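The AttributeError can be reproduced in isolation. A minimal check, assuming a standard CPython 3 install: the top-level urllib package is only a container for submodules and defines no exception classes itself, so urllib.HTTPError does not exist, while urllib.error defines both HTTPError and URLError. Separately, import urllib by itself does not guarantee that urllib.request is loaded, so an explicit import urllib.request is the safer choice.

```python
import urllib.error  # also binds the name 'urllib'

# The Python 3 'urllib' package is just a container for submodules,
# so the exception classes are not attributes of it.
print(hasattr(urllib, 'HTTPError'))        # False
print(hasattr(urllib.error, 'HTTPError'))  # True
print(hasattr(urllib.error, 'URLError'))   # True

# HTTPError subclasses URLError, which is why catching URLError also catches 404s.
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True
```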

Answers: 2

Stack Overflow user

Accepted answer

Posted 2013-07-20 20:22:39

For urllib (as opposed to urllib2), the exception is urllib.error.HTTPError, not urllib.HTTPError. See the documentation for more information.
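Applying that fix to the loop in the question, a minimal sketch could look like the following. The helper names fetch_or_none and fetch_days are hypothetical, the BeautifulSoup parsing is left out, and the opener parameter exists only so the functions can be exercised without network access. The sketch also changes one thing beyond the exception name: in the original loop, continue jumps past dayVar += datetime.timedelta(days=1), so after the first 404 every remaining iteration re-requests the same missing date. Deriving each date from the loop offset avoids that.

```python
import datetime
import urllib.error    # HTTPError lives here, not on the bare 'urllib' package
import urllib.request

# Base URL taken from the question.
BASE_URL = 'http://biz.yahoo.com/research/earncal/'


def fetch_or_none(url, opener=urllib.request.urlopen):
    """Return the page body, or None if the server answers 404."""
    try:
        with opener(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # missing page: let the caller skip it
        raise  # any other HTTP error still propagates


def fetch_days(start, days, opener=urllib.request.urlopen):
    """Fetch one page per day starting at `start`, skipping days that 404."""
    pages = {}
    for offset in range(days):
        # Derive the date from the offset so a skipped page cannot stall the date.
        day = start + datetime.timedelta(days=offset)
        url = BASE_URL + day.strftime('%Y%m%d') + '.html'
        body = fetch_or_none(url, opener)
        if body is None:
            continue
        pages[day] = body  # parse with BeautifulSoup here, as in the question
    return pages
```

For example, fetch_days(datetime.date.today(), 29) mirrors the question's range(1, 30) loop, which runs 29 times.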

Votes: 18

Stack Overflow user

Posted 2022-10-06 21:34:28

Do this:

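import urllib.error
import urllib.request
def fetch(url):  # hypothetical wrapper so the fragment can run
    try:
        return urllib.request.urlopen(url).getcode()
    except urllib.error.URLError as e:  # use 'urllib.error.URLError', not 'urllib.HTTPError'
        print('Error code:', getattr(e, 'code', None))  # or whatever you want; only HTTPError has .code
        return getattr(e, 'code', None)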
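Both answers rely on urlopen raising HTTPError with .code set to 404. One way to confirm that without depending on the Yahoo URL from the question is a throwaway local server that answers every request with 404. This is a sketch using only the standard library; NotFoundHandler, status_of and demo_404 are hypothetical names, and ProxyHandler({}) is there so that an http_proxy environment variable does not intercept the local request.

```python
import http.server
import threading
import urllib.error
import urllib.request


class NotFoundHandler(http.server.BaseHTTPRequestHandler):
    """Answers every GET with 404 Not Found."""

    def do_GET(self):
        self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def status_of(url, opener):
    """Return the HTTP status, using HTTPError.code when the request fails."""
    try:
        with opener.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def demo_404():
    # Port 0 asks the OS for any free port; server_port reports which one.
    server = http.server.HTTPServer(('127.0.0.1', 0), NotFoundHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler disables proxies for this opener.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        return status_of('http://127.0.0.1:%d/missing.html' % server.server_port, opener)
    finally:
        server.shutdown()
        server.server_close()


print(demo_404())  # 404
```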
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/17766300
