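html5lib notes that its latest release (0.11) is rather old. Using the Python part, I ran into the recursion problem described in Issue 70 and Issue 59, but I can't find the most recent stable Mercurial revision.
The latest tip is broken: running python setup.py install gives the following error: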
byte-compiling build/bdist.linux-x86_64/egg/html5lib/treewalkers/_base.py to _base.pyc
File "build/bdist.linux-x86_64/egg/html5lib/treewalkers/_base.py", line 40
"data": []}
^
SyntaxError: invalid syntax
At runtime, I get the following error:
soup = parser.parse(page.read())
File "build/bdist.linux-x86_64/egg/html5lib/html5parser.py", line 165, in parse
File "build/bdist.linux-x86_64/egg/html5lib/html5parser.py", line 144, in _parse
File "build/bdist.linux-x86_64/egg/html5lib/html5parser.py", line 454, in processDoctype
TypeError: insertDoctype() takes exactly 4 arguments (2 given)
I'm using it on Python 2.5.2 with lxml and BeautifulSoup.
Posted on 2010-12-06 19:30:54
The 0.90 release from January 2010 looks like what you want:
http://code.google.com/p/html5lib/downloads/list
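After installing a different release, it may be worth confirming which copy of html5lib Python actually imports, since the tracebacks above point into an egg built from the tip and an older install could in principle still be picked up first. This is only a minimal sketch: locate_module is a hypothetical helper name, not part of html5lib, and the idea that a leftover egg is involved here is an assumption, not something the error output confirms.

```python
def locate_module(name):
    """Return the file a top-level module is imported from, or None.

    locate_module is a hypothetical helper for illustration only;
    it is not part of html5lib.
    """
    try:
        # __import__ returns the top-level package for the given name.
        module = __import__(name)
    except ImportError:
        # The module is not importable from the current sys.path.
        return None
    # Built-in modules have no __file__, so fall back to None.
    return getattr(module, "__file__", None)


# Shows the path of the html5lib copy in use, or None if it is not installed.
print(locate_module("html5lib"))
```

If the printed path still points at the egg built from the tip, that copy is the one being imported and not the newly installed release.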
https://stackoverflow.com/questions/1122494