I'm trying to parse an HTML page like this:
# coding: utf8
[...]
def search(self, a, b):
    # Read the search term from the text field
    mot_canal = self.champ_rech_canal.get_text()
    url_canal = "http://www.canalplus.fr/pid3330-c-recherche.html?rechercherSite=" + mot_canal
    try:
        f = urllib.urlopen(url_canal)
        self.feuille_canal = f.read()  # raw bytes, not yet decoded
        f.close()
    except IOError:
        # urllib.urlopen raises IOError when the request fails
        self.champ_rech_canal.set_text("La recherche a échoué")
    print self.feuille_canal

The result is fine, except that I get � in place of "é" or "o". How can I decode it? I tried:

    self.feuille_canal = self.feuille_canal.decode("utf-8")

Result:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 8789: invalid continuation byte

Posted on 2014-05-30 09:16:00
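You are trying to decode an ISO-8859-1 page as UTF-8, which cannot work: in ISO-8859-1 the byte 0xe9 is "é", but in UTF-8 it is only valid as the start of a multi-byte sequence. See the Content-Type declaration in the returned HTML:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

https://stackoverflow.com/questions/23950955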
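
In the code from the question, the simplest fix is probably to decode with the charset the page declares, i.e. `self.feuille_canal.decode("iso-8859-1")`. A slightly more general approach is sketched below. It assumes the charset is declared in a `<meta>` tag as shown above. The helper name `decode_page` and its `default` parameter are hypothetical, not part of the original code, and the snippet is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import re

# Matches a charset=... declaration in the raw (undecoded) page bytes.
CHARSET_RE = re.compile(br'charset=["\']?([A-Za-z0-9_-]+)')


def decode_page(raw, default="iso-8859-1"):
    """Decode raw page bytes using the charset declared in the HTML.

    Hypothetical helper: falls back to `default` when no charset is
    declared or when Python does not know the declared name.
    """
    match = CHARSET_RE.search(raw)
    encoding = match.group(1).decode("ascii") if match else default
    try:
        return raw.decode(encoding)
    except LookupError:
        # The declared charset name is not a codec Python recognizes.
        return raw.decode(default)


# Sample bytes standing in for the result of f.read()
sample = (b'<meta http-equiv="Content-Type" '
          b'content="text/html; charset=ISO-8859-1" />'
          b'La recherche a \xe9chou\xe9')
text = decode_page(sample)
```

Here `text` is a Unicode string in which the byte `0xe9` has become "é". Note that this sketch does not catch `UnicodeDecodeError`, so a page that declares one charset but is actually served in another would still fail to decode.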