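I'm trying to do some web scraping, but the WWW::Mechanize gem doesn't seem to like the encoding and crashes.
The POST request results in a 302 redirect (which Mechanize follows, so far so good), and the resulting page seems to crash it. I've searched quite a bit but haven't found a way to solve this yet. Do you have any ideas?
Code: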
require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
answer = agent.post('https://www.budget.de/de/reservierung/privatkunden/step1/schnellbuchung',
{"Country" => "Deutschland",
"Abholstation" => "Aalen",
"Abgabestation" => "Aalen",
"Abholdatum" => "26.02.2009",
"Abholzeit_stunde" => "13",
"Abholzeit_minute" => "30",
"Abgabedatum" => "28.02.2009",
"Abgabezeit_stunde" => "13",
"Abgabezeit_minute" => "30",
"CountryID" => "DE",
"AbholstationID"=>"AA1",
"AbgabestationID"=>"AA1"
}
)
puts answer.body
Error:
D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/util.rb:29:in `iconv': "\204nderungen vorbe"... (Iconv::IllegalSequence)
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/util.rb:29:in `to_native_charset'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain/response_header_handler.rb:29:in `handle'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain.rb:30:in `pass'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain/handler.rb:6:in `handle'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain/response_body_parser.rb:35:in `handle'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain.rb:30:in `pass'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain/handler.rb:6:in `handle'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain/pre_connect_hook.rb:14:in `handle'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain.rb:25:in `handle'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize.rb:494:in `fetch_page'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize.rb:545:in `fetch_page'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize.rb:403:in `post_form'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize.rb:322:in `post'
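from test.rb:7
Posted 2009-02-25 15:12:55
The page is undoubtedly UTF-8, but Mechanize uses NKF (a core Ruby library) to guess the encoding, and for some reason it comes up with Shift JIS. The quickest way to work around this is to override Mechanize's encoding map, so that when it uses Iconv to convert the body to UTF-8, it also passes UTF-8 as the source encoding. You can do that like this: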
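WWW::Mechanize::Util::CODE_DIC[:SJIS] = "UTF-8"
Put this after the line that requires the mechanize library. You may want to set the value back to the original right afterwards or, better still, find the root of the problem and submit a patch if necessary.
Note: I tracked this down by using the backtrace to debug the Mechanize library. The to_native_charset method calls detect_charset, and that is where the problem lies.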
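If you do want to set the value back afterwards, one way is to wrap the override in a small helper that restores the old entry even when the request raises. This is only a sketch: `with_temporary_entry` is a hypothetical helper name, and `code_dic` is a stand-in hash (its values are assumptions) used so the snippet runs without the gem. With Mechanize 0.9.1 loaded you would pass `WWW::Mechanize::Util::CODE_DIC` instead.

```ruby
# Hypothetical helper: override one entry of a lookup table for the
# duration of the block, then restore the previous state.
def with_temporary_entry(table, key, value)
  had_key  = table.key?(key)
  original = table[key]
  table[key] = value
  yield
ensure
  # Runs even if the block raises, so the table never stays patched.
  if had_key
    table[key] = original
  else
    table.delete(key)
  end
end

# Stand-in for WWW::Mechanize::Util::CODE_DIC (values are assumptions).
code_dic = { :SJIS => "SHIFT_JIS", :UTF8 => "UTF-8" }

during = with_temporary_entry(code_dic, :SJIS, "UTF-8") do
  # With the real gem, the agent.post(...) call would go here.
  code_dic[:SJIS]
end

puts during            # value seen inside the block
puts code_dic[:SJIS]   # value after the block has finished
```

Keeping the restore in an `ensure` clause means a failed request cannot leave the global map in its patched state.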
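Posted 2013-02-04 15:42:45
In my case, the get method returned a Mechanize::File, which doesn't handle encoding at all.
I was able to fix it by converting manually with Iconv, but this only works if you already know the encoding.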
result = @agent.get uri
# Mechanize::File instead of Mechanize::Page is returned
# so we have to convert manually
result = Iconv.conv("utf-8", "iso-8859-1", result.body)
https://stackoverflow.com/questions/586163
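Iconv was removed from Ruby's standard library in 2.0, so on newer Rubies the same manual conversion can be sketched with String#force_encoding and String#encode. The helper name `latin1_to_utf8` and the sample bytes are made up for illustration; as with the Iconv version, this assumes you already know the body is ISO-8859-1.

```ruby
# Hypothetical helper: reinterpret raw bytes as ISO-8859-1 and
# transcode them to UTF-8, without modifying the caller's string.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Sample input (an assumption): "Änderungen" as ISO-8859-1 bytes,
# tagged as binary the way an undecoded response body might be.
raw = "\xC4nderungen".b

converted = latin1_to_utf8(raw)
puts converted.encoding
puts converted.valid_encoding?
```

With a real response you would pass `result.body` in place of `raw`.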