
Q: Scrubyt produces a 404 error when clicking a link using the _details method

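Stack Overflow user

Asked 2008-10-04 14:17:43

Answers: 4 | Views: 366 | Followers: 0 | Votes: 1

This may be similar to my previous two questions (see here and here), but I'm trying to use the _detail command to automatically click a link so that I can scrape the detail page for each individual event.

The code I'm using is: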

require 'rubygems'
require 'scrubyt'

nuffield_data = Scrubyt::Extractor.define do
  fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'

  event do
    title 'The Coast of Mayo'
    link_url
    event_detail do
      dates "1-4 October"
      times "7:30pm"
    end
  end

  next_page "Next Page", :limit => 20
end

nuffield_data.to_xml.write($stdout,1)

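Is there any way to print out the URL that event_detail is trying to access? The error doesn't seem to give me the URL that returned the 404.

Update: I think the link may be a relative link. Could that be causing the problem? Is there any way to deal with it?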

ruby

scrubyt

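4 Answers

Stack Overflow user

Accepted answer

Posted 2009-10-15 13:02:06

I had the same problem with relative links and fixed it like this. You have to set the :resolve param to the correct base URL.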

  event do
    title 'The Coast of Mayo'
    link_url
    event_detail :resolve => 'http://www.nuffieldtheatre.co.uk/cn/events' do
      dates "1-4 October"
      times "7:30pm"
    end
  end
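
To see why the base URL matters, here is a minimal sketch using Ruby's standard `URI.join`. It only illustrates standard relative-URL resolution; the thread does not show how scrubyt's `:resolve` combines the base with the link, so treat this as an illustration rather than scrubyt's actual behaviour. The helper `resolve_link` and the relative href `event_detail.php?id=1` are hypothetical.

```ruby
require 'uri'

# Hypothetical helper (not part of scrubyt): resolves a relative href
# against a base URL using standard URI rules.
def resolve_link(base, href)
  URI.join(base, href).to_s
end

href = 'event_detail.php?id=1' # hypothetical relative link from the listing page

# Resolved against the listing page itself, the link stays in /cn/events/.
puts resolve_link('http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php', href)
# => http://www.nuffieldtheatre.co.uk/cn/events/event_detail.php?id=1

# Without a trailing slash, the last path segment ("events") is replaced.
puts resolve_link('http://www.nuffieldtheatre.co.uk/cn/events', href)
# => http://www.nuffieldtheatre.co.uk/cn/event_detail.php?id=1

# With a trailing slash, the link is appended under /cn/events/.
puts resolve_link('http://www.nuffieldtheatre.co.uk/cn/events/', href)
# => http://www.nuffieldtheatre.co.uk/cn/events/event_detail.php?id=1
```

If a relative link is resolved against the wrong base, the resulting URL can point to a page that does not exist, which is one plausible way to end up with a 404.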

Votes: 1

Stack Overflow user

Posted 2008-10-04 22:56:48

    sudo gem install ruby-debug

This will give you access to a nice Ruby debugger. Start the debugger by altering your script:

    require 'rubygems'
    require 'ruby-debug'
    Debugger.start
    Debugger.settings[:autoeval] = true if Debugger.respond_to?(:settings)

    require 'scrubyt'

    nuffield_data = Scrubyt::Extractor.define do
      fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'

      event do
        title 'The Coast of Mayo'
        link_url
        event_detail do
          dates "1-4 October"
          times "7:30pm"
        end
      end

      next_page "Next Page", :limit => 2

    end

    nuffield_data.to_xml.write($stdout,1)

Then find out where scrubyt is throwing an exception - in this case:

    /Library/Ruby/Gems/1.8/gems/scrubyt-0.3.4/lib/scrubyt/core/navigation/fetch_action.rb:52:in `fetch'

Find the scrubyt gem on your system, and add a rescue clause to the method in question so that the end of the method looks like this:

      if @@current_doc_protocol == 'file'
        @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(open(@@current_doc_url).read))
      else
        @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(@@mechanize_doc.body))
        store_host_name(self.get_current_doc_url)   # in case we're on a new host
      end
    rescue
      debugger
      self # the self is here because debugger doesn't like being at the end of a method
    end

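Now run the script again, and you should be dropped into the debugger when the exception is raised. Just type the following at the debug prompt to see what the offending URL is: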

    @@current_doc_url

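You can also add a debugger statement anywhere in that method if you want to check what is going on. For example, you may want to add one between lines 51 and 52 of the method to check how the URL being called changes, and why.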
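
If dropping into the debugger is more than you need, the same rescue idea can simply report the URL and re-raise. The sketch below is generic Ruby, not scrubyt code: `with_url_reporting` is a hypothetical helper, and the block passed to it stands in for whatever call actually performs the fetch.

```ruby
# Hypothetical helper (not part of scrubyt): runs the given block and, if it
# raises, reports which URL was being fetched before re-raising the error.
def with_url_reporting(url, out = $stderr)
  yield url
rescue StandardError => e
  out.puts "Fetch failed for #{url}: #{e.class}: #{e.message}"
  raise # re-raise so the caller still sees the original exception
end

# Usage: the block below simulates a failing fetch.
begin
  with_url_reporting('http://example.com/missing') do |url|
    raise "404 Not Found"
  end
rescue RuntimeError
  # The URL has already been written to $stderr at this point.
end
```

Re-raising keeps the original exception intact, so the script fails the same way as before, only with the URL logged first.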

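This is basically my answer to your earlier question.

Good luck.

Votes: 1

Stack Overflow user

Posted 2008-10-05 20:19:40

Sorry, I have no idea why this would be nil. Every time I've run it, it returns a URL. The self.fetch method requires a URL, which you should be able to access as the local variable doc_url. If that returns nil as well, could you post the code you have, including the debugger call?

Votes: 0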

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/170405
