Python 3.5
See the code:
import urllib.request
from xml.etree import ElementTree as ET

url = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'

def conectar(url):
    page = urllib.request.urlopen(url)
    return page.read()


root = ET.fromstring(conectar(url))
s = root.findall("//*[contains(.,'21/')]")

I need to extract '21/', but it returns this error:
Error:
Traceback (most recent call last):
  File "crawler.py", line 11, in <module>
    root = ET.fromstring(conectar(url))
  File "/home/rg3915/.pyenv/versions/3.5.0/lib/python3.5/xml/etree/ElementTree.py", line 1321, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: unbound prefix: line 146, column 8

But I don't know how to fix this error.
Posted on 2015-12-23 22:50:34
You could start with something like this:
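# Python 2.7 only: urllib2 and the print statement do not exist in Python 3.
import urllib2
from bs4 import BeautifulSoup

url = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'
response = urllib2.urlopen(url)
html = response.read()
# html.parser is lenient, so the page does not have to be well-formed XML.
dom = BeautifulSoup(html, 'html.parser')
tables = dom.find_all("table")
if len(tables):
    # Print the first table found on the page.
    table = tables[0]
    print table

(Tested with Python 2.7.)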
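Posted on 2015-12-22 15:01:15
Although the document you are trying to parse is declared as XHTML, it is not valid XML because it contains an unbound prefix.
<gcse:search></gcse:search> uses the gcse namespace prefix, which is never defined for the document.
BeautifulSoup may be better suited to what you are trying to do, since it does not insist on the document being 100% valid.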
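To make the diagnosis above concrete, here is a minimal sketch in Python 3 that uses only the standard library. The SNIPPET string, its dates, and the CellCollector class are hypothetical stand-ins, not the real SAT page or anything from the answers. The sketch shows the strict XML parser rejecting an undeclared gcse prefix, while the lenient html.parser module reads the same text and lets you filter the cells for '21/'.

```python
from html.parser import HTMLParser
from xml.etree import ElementTree as ET

# Hypothetical stand-in for the page: the gcse prefix is used but never declared.
SNIPPET = (
    "<html><body><gcse:search></gcse:search>"
    "<table><tr><td>21/12/2015</td><td>22/12/2015</td></tr></table>"
    "</body></html>"
)


def try_strict_parse(text):
    """Return the ParseError message, or None if the text parses as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell; unknown tags are simply ignored."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data


# The strict parser fails with an "unbound prefix" message.
error = try_strict_parse(SNIPPET)

# The lenient parser reads the same text without complaint.
collector = CellCollector()
collector.feed(SNIPPET)

# Keep only the cells that contain '21/', as the question asks.
matches = [cell for cell in collector.cells if "21/" in cell]

print(error)
print(matches)
```

Against the real page you would feed the decoded result of conectar(url) to the collector instead of SNIPPET. BeautifulSoup, as suggested in the answers, does the same job with far less code.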
https://stackoverflow.com/questions/34417831