Q: Python: Parsing Unicode characters with bs4

Stack Overflow user

Asked 2016-01-05 18:50:34

Answers: 1, Views: 1.6K, Followers: 0, Votes: 1

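I am building a web crawler/scraper in Python 3 with bs4. Whenever it encounters Unicode characters, such as Chinese characters, the program crashes. How can I modify my scraper so that it supports Unicode?

The code is as follows: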

import urllib.request
from bs4 import BeautifulSoup

def crawlForData(url):
    r = urllib.request.urlopen(url)
    soup = BeautifulSoup(r.read(), 'html.parser')
    result = [i.text.replace('\n', ' ').strip() for i in soup.find_all('p')]
    for p in result:
        print(p)

url = 'https://en.wikipedia.org/wiki/Adivasi'
crawlForData(url)

Tags: unicode, beautifulsoup, python

1 Answer

Stack Overflow user

Answered 2016-01-05 18:54:45

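You can try the unicode() function, which decodes a byte string into a Unicode string. Note that unicode() exists only in Python 2; in Python 3, str is already Unicode, and bytes.decode() does the same job.

Alternatively, you can decode explicitly:

content.decode('utf-8','ignore')

where content is the byte string you read from the response. The 'ignore' argument silently drops any bytes that are not valid UTF-8.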
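To make the effect of the error handler concrete, here is a minimal sketch comparing 'ignore' with 'replace'. The sample bytes and the helper names decode_lenient and decode_visible are made up for illustration; they are not part of the original answer.

```python
# Bytes as they might arrive from a network read:
# valid UTF-8 followed by one stray byte that is not valid UTF-8.
raw = "中文 text".encode("utf-8") + b"\xff"

def decode_lenient(data, encoding="utf-8"):
    """Decode bytes, silently dropping anything invalid in `encoding`."""
    return data.decode(encoding, "ignore")

def decode_visible(data, encoding="utf-8"):
    """Decode bytes, marking invalid sequences with U+FFFD instead of dropping them."""
    return data.decode(encoding, "replace")

# ascii() keeps the output printable even on a console that cannot show Chinese.
print(ascii(decode_lenient(raw)))
print(ascii(decode_visible(raw)))
```

'ignore' is convenient but hides data loss; 'replace' leaves a visible marker, which can make a wrong encoding guess easier to spot.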

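A complete solution might be (written here for Python 3, where urllib2 has become urllib.request):

import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen("your url")
content = html.read().decode('utf-8', 'ignore')  # bytes -> str, dropping undecodable bytes
soup = BeautifulSoup(content, 'html.parser')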
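The question does not include a traceback, so the following rests on an assumption: if the crash is a UnicodeEncodeError raised by print() (which can happen when the console encoding cannot represent Chinese characters), decoding the page differently will not help, and the output side needs attention instead. A minimal sketch, where safe_text is a hypothetical helper name:

```python
import sys

def safe_text(text, encoding=None):
    """Return `text` with characters that `encoding` cannot represent
    replaced by backslash escapes, so that printing it cannot fail.

    `encoding` defaults to the console encoding (an assumption about
    where the text will be printed).
    """
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # Encode with escapes for unencodable characters, then decode back to str.
    return text.encode(encoding, "backslashreplace").decode(encoding)

# On an ASCII-only console, Chinese characters become \uXXXX escapes.
print(safe_text("Adivasi 中文", "ascii"))
```

In the question's loop this would mean calling print(safe_text(p)) instead of print(p). On Python 3.7 or later, calling sys.stdout.reconfigure(encoding="utf-8") once at startup is another option, provided the terminal can actually display UTF-8.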

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/34609877
