
Continuing after handling an exception in Python

Stack Overflow user
Asked 2016-10-19 15:41:43
Answers: 1 | Views: 232 | Followers: 0 | Votes: 0

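I am writing a series of scripts that pull URLs from a database and use the textstat package to calculate the readability of each page according to a predefined set of calculations. The function below takes a URL (from CouchDB), calculates the defined readability scores, and then saves the scores back to the same CouchDB document.

The problem I'm having is with error handling. For example, the Flesch Reading Ease score calculation requires a count of the total number of sentences on the page. If that count comes back as zero, an exception is raised. Is there a way to catch this exception, save a record of it in the database, and move on to the next URL in the list? Can I do this in the function below (preferred), or do I need to edit the package itself?

I know variations of this question have been asked before. If you know of one that might answer my question, please point me in that direction. My searching has been fruitless so far. Thanks in advance.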

def get_readability_data(db, url, doc_id, rank, index):
    readability_data = {}
    readability_data['url'] = url
    readability_data['rank'] = rank
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    headers = { 'User-Agent' : user_agent }
    try:
        req = urllib.request.Request(url, headers=headers)  # pass the User-Agent header defined above
        response = urllib.request.urlopen(req)
        content = response.read()
        readable_article = Document(content).summary()
        soup = BeautifulSoup(readable_article, "lxml")
        text = soup.body.get_text()
        try:
            readability_data['flesch_reading_ease'] = textstat.flesch_reading_ease(text)
            readability_data['smog_index'] = textstat.smog_index(text)
            readability_data['flesch_kincaid_grade'] = textstat.flesch_kincaid_grade(text)
            readability_data['coleman_liau'] = textstat.coleman_liau_index(text)
            readability_data['automated_readability_index'] = textstat.automated_readability_index(text)
            readability_data['dale_chall_score'] = textstat.dale_chall_readability_score(text)
            readability_data['linear_write_formula'] = textstat.linsear_write_formula(text)
            readability_data['gunning_fog'] = textstat.gunning_fog(text)
            readability_data['total_words'] = textstat.lexicon_count(text)
            readability_data['difficult_words'] = textstat.difficult_words(text)
            readability_data['syllables'] = textstat.syllable_count(text)
            readability_data['sentences'] = textstat.sentence_count(text)
            readability_data['readability_consensus'] = textstat.text_standard(text)
            readability_data['readability_scores_date'] = time.strftime("%a %b %d %H:%M:%S %Y")

            # use the doc_id to make sure we're saving this in the appropriate place
            readability = json.dumps(readability_data, sort_keys=True, indent=4 * ' ')
            doc = db.get(doc_id)
            data = json.loads(readability)
            doc['search_details']['search_details'][index]['readability'] = data
            #print(doc['search_details']['search_details'][index])
            db.save(doc)
            time.sleep(.5)

        except: # catch *all* exceptions
            e = sys.exc_info()[0]
            write_to_page( "<p>---ERROR---: %s</p>" % e )

    except urllib.error.HTTPError as err:
        print(err.code)

This is the error I get:

Error(ASL): Sentence Count is Zero, Cannot Divide
Error(ASyPW): Number of words are zero, cannot divide
Traceback (most recent call last):
  File "new_get_readability.py", line 114, in get_readability_data
    readability_data['flesch_reading_ease'] = textstat.flesch_reading_ease(text)
  File "/Users/jrs/anaconda/lib/python3.5/site-packages/textstat/textstat.py", line 118, in flesch_reading_ease
    FRE = 206.835 - float(1.015 * ASL) - float(84.6 * ASW)
TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

This is the code that calls the function:

if __name__ == '__main__':
    db = connect_to_db(parse_args())
    print("~~~~~~~~~~" + " GETTING IDs " + "~~~~~~~~~~")
    ids = get_ids(db)
    for i in ids:
        details = get_urls(db, i)
        for d in details:
            get_readability_data(db, d['url'], d['id'], d['rank'], d['index'])

1 Answer

Stack Overflow user

Accepted answer

Answered 2016-10-19 16:16:30

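Generally, it is good practice to keep try: except: blocks as small as possible. I would wrap your textstat functions in some sort of decorator that catches the exception you expect and returns both the function's output and the exception that was caught.

For example: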

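import functools

def catchExceptions(exception):  # decorator factory: takes the exception type to catch
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            try:
                retval = func(*args, **kwargs)
            except exception as e:
                return None, e  # expected failure: no result, plus the exception
            else:
                return retval, None  # success: the result, and no exception
        return wrapper
    return decorator

@catchExceptions(ZeroDivisionError)
def testfunc(x):
    return 11 // x  # integer division, so testfunc(3) returns 3

print(testfunc(0))
print('-----')
print(testfunc(3))

This prints (on Python 3.7+; older Python 3 versions show a trailing comma inside the exception repr):

(None, ZeroDivisionError('integer division or modulo by zero'))
-----
(3, None)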
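
Applied to the question, the same idea might look like the sketch below. Note that the traceback in the question ends in a TypeError (a float multiplied by None), not a ZeroDivisionError, so that is the exception to catch there. Everything in this sketch is an assumption for illustration: `catch_exceptions` is a variant of the decorator above that accepts several exception types, `fake_reading_ease` and `fake_word_count` are hypothetical stand-ins for the textstat functions so the code runs without the package, and `score_text` is a made-up helper name.

```python
import functools


def catch_exceptions(*exceptions):
    """Variant of the decorator above that accepts several exception types."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs), None
            except exceptions as exc:
                return None, exc
        return wrapper
    return decorator


# Hypothetical stand-ins for textstat functions (NOT the real implementations).
# The first one fails on empty text the same way the traceback in the question
# does: an intermediate value is None and gets multiplied by a float.
def fake_reading_ease(text):
    sentences = [s for s in text.split('.') if s.strip()]
    avg_sentence_length = len(text.split()) / len(sentences) if sentences else None
    return 206.835 - 1.015 * avg_sentence_length


def fake_word_count(text):
    return len(text.split())


METRICS = {
    'flesch_reading_ease': fake_reading_ease,
    'total_words': fake_word_count,
}


def score_text(text, metrics=METRICS):
    """Compute every metric; record expected failures instead of aborting."""
    scores, errors = {}, {}
    for name, func in metrics.items():
        safe = catch_exceptions(TypeError, ZeroDivisionError)(func)
        value, exc = safe(text)
        if exc is None:
            scores[name] = value
        else:
            errors[name] = '{}: {}'.format(type(exc).__name__, exc)
    return scores, errors


# The second text is empty, so one metric fails, but the loop keeps going.
results = [score_text(t) for t in ['One short sentence. Another one.', '']]
```

Inside `get_readability_data`, the `errors` dictionary could be stored in the CouchDB document next to the scores before `db.save(doc)`, so a failing metric is recorded and the caller's loop simply moves on to the next URL.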
Votes: 0
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/40136016
