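When I run the code below, I get a mechanize._html.ParseError exception. How can I suppress it? I know the page is invalid HTML; if it were a well-built site, I wouldn't want to parse it. I searched on Google, and someone suggested replacing br = mechanize.Browser() with br = mechanize.Browser(factory=mechanize.RobustFactory()), but that doesn't work.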
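import mechanize

# RobustFactory is meant to be more tolerant of malformed HTML than the default factory.
#br = mechanize.Browser()
br = mechanize.Browser(factory=mechanize.RobustFactory())
br.set_handle_robots(False)  # do not fetch or obey robots.txt
br.open("http://journeyplanner.irishrail.ie/bin/query.exe")
# Print every form found on the page, with a blank line after each one.
for form in br.forms():
    print form
    print

Posted on 2012-01-18 02:32:38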
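Why are you using mechanize to open an .exe file? You should use it to open web pages. If you want to download the .exe file, use br.retrieve() instead.
Edit:
By the way, your code produces the following output for me: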
<formular POST http://journeyplanner.irishrail.ie/bin/query.exe/dn?ld=1.1&OK#focus application/x-www-form-urlencoded
<HiddenControl(queryPageDisplayed=yes) (readonly)>
<HiddenControl(HWAI=JS!ajax=yes) (disabled, readonly)>
<HiddenControl(HWAI=JS!js=yes) (disabled, readonly)>
<HiddenControl(outwardConDetails=) (readonly)>
<ImageControl(start=Verbindung suchen)>
<TextControl(REQ0JourneyStopsS0A=255)>
<TextControl(REQ0JourneyStopsS0G=)>
<HiddenControl(REQ0JourneyStopsS0ID=) (readonly)>
<TextControl(REQ0JourneyStopsZ0A=255)>
<TextControl(REQ0JourneyStopsZ0G=)>
<HiddenControl(REQ0JourneyStopsZ0ID=) (readonly)>
<RadioControl(journey_mode=[*single, return])>
<TextControl(REQ0JourneyDate=17/01/2012)>
<SelectControl(REQ0JourneyTime=[*0, 00, 9, 14, 18])>
<HiddenControl(REQ0HafasPeriodToSearch=1440) (readonly)>
<HiddenControl(REQ0HafasPeriodSearch=2) (readonly)>
<HiddenControl(REQ0HafasSearchForw=1) (readonly)>
<CheckboxControl(special_search_both=[1])>
<TextControl(REQ1JourneyDate=)>
<SelectControl(REQ1JourneyTime=[*0, 00, 9, 14, 18])>
<HiddenControl(REQ1HafasPeriodToSearch=1440) (readonly)>
<HiddenControl(REQ1HafasPeriodSearch=2) (readonly)>
<HiddenControl(REQ1HafasSearchForw=1) (readonly)>
<SubmitControl(start=Go) (readonly)>
<SubmitControl(start=Go) (readonly)>>
Edit:
Oh, I was wrong. It isn't an .exe file at all. I downloaded it and opened it in a text editor, and it is just an .html file! It also works with br = mechanize.Browser().
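The mix-up above comes from judging the resource by its URL suffix. A more reliable signal is usually the Content-Type header the server sends back. Here is a minimal sketch of such a check, using only plain string handling; the helper name is_html_content_type is hypothetical and not part of mechanize.

```python
def is_html_content_type(content_type):
    """Return True if a Content-Type header value names an HTML media type.

    Hypothetical helper for illustration; it only inspects the header string.
    """
    if not content_type:
        return False
    # Drop parameters such as "; charset=ISO-8859-1" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


if __name__ == "__main__":
    # A URL ending in .exe can still be served as an HTML page.
    print(is_html_content_type("text/html; charset=ISO-8859-1"))
    print(is_html_content_type("application/octet-stream"))
```

With mechanize, the header value would presumably come from the response returned by br.open(...), for example through response.info(); I have not verified exactly how the headers object behaves across mechanize versions, so treat that part as an assumption.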
https://stackoverflow.com/questions/8899748