Running the following code with mechanize 2.7.3 and ruby 2.3.0dev:
require 'mechanize'
agent = Mechanize.new
agent.keep_alive = false
agent.open_timeout = 2
agent.read_timeout = 2
agent.ignore_bad_chunking = true
agent.gzip_enabled = false
url = 'http:%5C%5Cwww.scouts.org.uk'
agent.head(url)
This gives me the following NoMethodError:
~/.rvm/gems/ruby-head/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:648:in `resolve': undefined method `length' for nil:NilClass (NoMethodError)
  from ~/.rvm/gems/ruby-head/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:223:in `fetch'
  from ~/.rvm/gems/ruby-head/gems/mechanize-2.7.3/lib/mechanize.rb:459:in `head'
Is this a bug in Mechanize, or am I doing something wrong? If it is my fault, how do I fix it?
EDIT: The URL is obviously wrong, but I read a lot of URLs from a file, and some of them may be malformed.
EDIT2: Suppose I have a file like this one: http://pastie.org/9934756. I need to get the HEAD of every valid URL and skip the rest.
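One way to handle EDIT2 without letting a single malformed URL abort the whole run is to give each request its own `rescue`. The sketch below assumes the request is passed in as a block (for example `{ |u| agent.head(u) }`); the helper name `head_each`, the stand-in `fake_head` and the file name `urls.txt` are hypothetical, and the fake fetcher only lets the sketch run without the network.

```ruby
# Hypothetical helper (not part of Mechanize): runs the given block once per
# URL and keeps going when a single URL raises, e.g. the NoMethodError above.
# Returns two hashes: url => block result, and url => "ErrorClass: message".
def head_each(urls)
  results  = {}
  failures = {}
  urls.each do |url|
    begin
      results[url] = yield(url)
    rescue StandardError => e
      # Only StandardError subclasses are caught, so Interrupt (Ctrl-C) still works.
      failures[url] = "#{e.class}: #{e.message}"
    end
  end
  [results, failures]
end

# Stand-in for agent.head; it fails the way the question describes
# for the malformed URL and succeeds otherwise.
fake_head = lambda do |url|
  raise NoMethodError, "undefined method `length' for nil:NilClass" if url.include?('%5C')
  "HEAD #{url}"
end

urls = ['http://scouts.org.uk', 'http:%5C%5Cwww.scouts.org.uk']
ok, failed = head_each(urls, &fake_head)
puts ok.keys.inspect # ["http://scouts.org.uk"]
puts failed.inspect

# With Mechanize (assumed usage, not executed here):
#   agent = Mechanize.new
#   urls  = File.readlines('urls.txt', chomp: true).reject(&:empty?)
#   ok, failed = head_each(urls) { |u| agent.head(u) }
```

Collecting failures instead of printing them keeps the decision of what to log or retry with the caller.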
Posted on 2015-02-10 18:31:45
You wrote the URL incorrectly. Try this: url = 'http://scouts.org.uk'
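To see what is wrong with the original string, you can inspect how Ruby's standard `URI` library parses it. In this stdlib-only sketch (no network, no Mechanize), the backslash-escaped form parses as an opaque `http:` URI with no host. That missing host is presumably what Mechanize's `resolve` trips over, though the sketch does not call Mechanize to confirm it.

```ruby
require 'uri'

# Compare the malformed string from the question with the corrected one.
bad  = URI.parse('http:%5C%5Cwww.scouts.org.uk')
good = URI.parse('http://scouts.org.uk')

# Without "//" after the scheme, everything after "http:" is kept as an
# opaque part, so no host is extracted even though the class is URI::HTTP.
puts bad.class        # URI::HTTP
puts bad.host.inspect # nil
puts bad.opaque       # %5C%5Cwww.scouts.org.uk

puts good.host        # scouts.org.uk
```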
Posted on 2015-02-10 19:28:03
Your target site performs a redirect and uses a meta refresh. Update your code to include these settings:
require 'mechanize'
agent = Mechanize.new
agent.keep_alive = false
agent.follow_meta_refresh = true
agent.redirect_ok = true
agent.open_timeout = 10
agent.read_timeout = 10
agent.ignore_bad_chunking = true
agent.gzip_enabled = false
url = 'http:%5C%5Cwww.scouts.org.uk'
begin
  page_head = agent.head(url)
rescue StandardError => exception
  puts "Caught exception: #{exception.message}"
end
Result:
Caught exception: undefined method `length' for nil:NilClass
Posted on 2015-02-10 20:46:52
You can add this method to check whether a URL is valid:
require 'uri'
def valid?(url)
  uri = URI.parse(url)
  # '+' for http/https URIs, '-' for other schemes
  if uri.kind_of?(URI::HTTP)
    puts '+'
  else
    puts '-'
  end
rescue URI::InvalidURIError
  # the string could not be parsed as a URI at all
  puts 'false'
end
['http://web.de',
'http://web.de/',
'http:%5c%5cweb.de',
'http:web.de',
'foo://web.de',
'http://we b.de',
'http://|web.de'].each { |url| valid?(url) }
Result:
+
+
+
+
-
false
false
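In this output the first four URLs print `+`, including `http:%5c%5cweb.de` and `http:web.de`, because both parse as `URI::HTTP`. So this check alone would not reject the malformed URL from the question. The sketch below is a slightly stricter variant, assuming a usable URL must also have a non-empty host. It returns a boolean rather than printing, so it can drive a `select` when filtering a file of URLs as in EDIT2. The name `http_url?` is hypothetical. `URI::HTTPS` is a subclass of `URI::HTTP`, so https URLs pass as well.

```ruby
require 'uri'

# Hypothetical stricter variant of valid?: true only for http/https URIs
# that also have a non-empty host, so opaque forms like "http:web.de" fail.
def http_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  # unparseable strings (spaces, '|', ...) are simply not valid
  false
end

urls = ['http://web.de', 'http://web.de/', 'http:%5c%5cweb.de', 'http:web.de',
        'foo://web.de', 'http://we b.de', 'http://|web.de']
urls.each { |u| puts "#{http_url?(u) ? '+' : '-'} #{u}" }

# Keep only the URLs worth sending a HEAD request to.
usable = urls.select { |u| http_url?(u) }
puts usable.inspect
```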
https://stackoverflow.com/questions/28429204