Q: BeautifulSoup fails to parse HTML with "html5lib"

Stack Overflow user

Asked on 2015-12-25 13:49:28

1 answer, 2.9K views, 0 followers, 1 vote

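BeautifulSoup fails to parse an HTML page with the html5lib option, but works normally with the html.parser option. According to the documentation, html5lib should be more lenient than html.parser, so why do I get garbled text when I use it to parse this page?

Below is a small runnable example. (If I swap html5lib for html.parser, the Chinese output is displayed correctly.)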

# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

ss = requests.Session()
res = ss.get("http://tech.qq.com/a/20151225/050487.htm")
# The page is served as GBK; this re-encodes it as UTF-8 before parsing.
html = res.content.decode("GBK").encode("utf-8")
soup = BeautifulSoup(html, 'html5lib')
print(str(soup)[0:800])  # the first 800 characters show whether the HTML was parsed correctly

python

parsing

beautifulsoup

1 Answer

Stack Overflow user

Accepted answer

Answered on 2015-12-25 14:34:45

Don't re-encode your content. Leave the decoding to Beautiful Soup:

soup = BeautifulSoup(res.content, 'html5lib')
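To see why the re-encoding step gets in the way, here is a minimal sketch that uses only the standard library. It assumes that a parser handed raw bytes decodes them with the charset declared in the page's meta tag. The sample page is a made-up stand-in for the real download, not the actual content of the URL in the question.

```python
# Hypothetical stand-in for the downloaded page: GBK bytes that declare gb2312.
page_text = (
    '<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
    '<p>腾讯科技</p>'
)
raw = page_text.encode("gbk")  # plays the role of res.content

# What the question does: re-encode to UTF-8 but leave the meta tag untouched.
reencoded = raw.decode("gbk").encode("utf-8")

# Assumption: the parser trusts the declared charset when it receives bytes.
declared = "gb2312"
as_sent = raw.decode(declared)
after_reencode = reencoded.decode(declared, errors="replace")

print(as_sent == page_text)         # True: bytes and declaration agree
print(after_reencode == page_text)  # False: UTF-8 bytes read as gb2312 are garbled
```

The untouched bytes decode cleanly because they match the declaration; the re-encoded bytes no longer do.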

If you do want to re-encode, you need to replace the meta header present in the source:

<meta http-equiv="Content-Type" content="text/html; charset=gb2312">

Or decode manually and pass in Unicode:

soup = BeautifulSoup(res.content.decode('gbk'), 'html5lib')
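The "replace the meta header" option could look roughly like the sketch below. The helper name `reencode_to_utf8`, the sample page, and the regular expression are all illustrative assumptions; the pattern only handles a simple `charset=...` declaration and is not a general HTML rewriter.

```python
import re

def reencode_to_utf8(raw, source_encoding="gbk"):
    """Hypothetical helper: decode, rewrite the declared charset, encode as UTF-8."""
    text = raw.decode(source_encoding)
    # Rewrite only the first charset declaration so it matches the new bytes.
    text = re.sub(
        r'(charset\s*=\s*["\']?)[\w-]+',
        r'\g<1>utf-8',
        text,
        count=1,
        flags=re.IGNORECASE,
    )
    return text.encode("utf-8")

# Made-up stand-in for res.content: GBK bytes that declare gb2312.
page_text = (
    '<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
    '<p>腾讯科技</p>'
)
raw = page_text.encode("gbk")

fixed = reencode_to_utf8(raw)

# Read the declaration back, as a charset-sniffing parser might.
declared = re.search(rb'charset\s*=\s*["\']?([\w-]+)', fixed).group(1).decode("ascii")
print(declared)                              # utf-8
print("腾讯科技" in fixed.decode(declared))  # True
```

Once the declaration and the bytes agree again, the re-encoded bytes can be handed to the parser. Passing an already decoded string avoids the question of declared charsets entirely, which is why it is the simpler of the two routes.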

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/34463416
