Data scraping: looping link clicks across multiple pages

Stack Overflow user
Asked on 2016-06-07 13:52:14
Answers: 1 | Views: 251 | Followers: 0 | Votes: 1

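We are trying to work out how to use Mechanize to scrape and collect all the data we want from the UCAS website. At the moment we are struggling to code the link clicks with Mechanize. Can anyone help? Three consecutive link clicks have to happen inside a loop that runs through all the search result pages. The first link, which shows all of a university's courses, is in the div with class morecourseslink.

The second link, which shows the course name, duration and qualification, is in a div that is apparently the one with class coursenamearea (the class selected in the code further down).

The third link is displayed in a div, and its id is coursedetailtab_entryreqs.

Currently, we are using the code below to scrape the university names: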

Code language: ruby
require 'mechanize'

class PagesController < ApplicationController
  def home
    mechanize = Mechanize.new
    @uninames_array = []

    page = mechanize.get('http://search.ucas.com/search/providers?CountryCode=3&RegionCode=&Lat=&Lng=&Feather=&Vac=2&Query=&ProviderQuery=&AcpId=&Location=scotland&IsFeatherProcessed=True&SubjectCode=&AvailableIn=2016')

    # Walk every results page, collecting the university names from each one.
    loop do
      page.search('li.result h3').each do |h3|
        @uninames_array.push(h3.text)
      end

      # The pager's ">" link leads to the next page; stop when it is absent.
      next_page_link = page.at('.pager a[text()=">"]')
      break unless next_page_link

      page = mechanize.get(next_page_link['href'])
    end

    puts @uninames_array.to_s
  end
end
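
The extract-then-follow-the-pager pattern used for the university names could be factored into a small helper. The sketch below is only an illustration: `each_page` is a hypothetical name, and the next-page lookup is passed in as a lambda so the helper can be tried without the mechanize gem or a network connection.

```ruby
# Hypothetical helper: yields the first page, then every page reachable
# through next_page, a callable that returns the following page or nil.
def each_page(first_page, next_page)
  page = first_page
  while page
    yield page
    page = next_page.call(page)
  end
end

# With Mechanize, the pieces from the question would plug in roughly like this
# (not run here; start_url is a placeholder for the providers search URL):
#
#   next_page = lambda do |page|
#     link = page.at('.pager a[text()=">"]')
#     link && mechanize.get(link['href'])
#   end
#   each_page(mechanize.get(start_url), next_page) do |page|
#     page.search('li.result h3').each { |h3| @uninames_array.push(h3.text) }
#   end
```

Passing the next-page lookup as a callable keeps the paging rule in one place, so the same helper could serve the providers list and the course list.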

The course names, durations and qualifications are scraped as follows:

Code language: ruby
require 'mechanize'

mechanize = Mechanize.new
@duration_array = []
@qual_array = []
@courses_array = []

page = mechanize.get('http://search.ucas.com/search/results?Vac=2&AvailableIn=2016&IsFeatherProcessed=True&page=1&providerids=41')

# Collect all three fields from each page before following the pager.
# Three separate while loops would leave the later ones with no pages to visit,
# because the first loop already runs to the last page.
loop do
  page.search('div.courseinfoduration').each do |x|
    @duration_array.push(x.text.strip)
  end

  page.search('div.courseinfooutcome').each do |y|
    @qual_array.push(y.text.strip)
  end

  page.search('div.coursenamearea h4').each do |h4|
    @courses_array.push(h4.text.strip)
  end

  # The pager's ">" link leads to the next page; stop when it is absent.
  next_page_link = page.at('.pager a[text()=">"]')
  break unless next_page_link

  page = mechanize.get(next_page_link['href'])
end

puts @courses_array, @duration_array, @qual_array
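
The script keeps course names, durations and qualifications in three separate arrays. Assuming every course on a results page contributes exactly one entry to each array, in the same order (an assumption about the page layout that the question does not confirm), the arrays could be combined into one record per course. `build_courses` is a hypothetical helper name and the sample values are made up.

```ruby
# Hypothetical helper: combine three parallel arrays into one hash per course.
# Raises if the arrays differ in length, since zip would otherwise pad with nil.
def build_courses(names, durations, qualifications)
  unless names.size == durations.size && names.size == qualifications.size
    raise ArgumentError, 'arrays must have the same length'
  end

  names.zip(durations, qualifications).map do |name, duration, qualification|
    { name: name, duration: duration, qualification: qualification }
  end
end

# Made-up sample values, standing in for the scraped arrays.
courses = build_courses(['Accounting', 'Law'], ['4 years', '3 years'], ['MA (Hons)', 'LLB'])
courses.each { |c| puts "#{c[:name]}: #{c[:duration]}, #{c[:qualification]}" }
```

The length check matters because a page that omits one field for a single course would otherwise silently shift every later record.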

1 Answer

Stack Overflow user

Accepted answer

Answered on 2016-06-07 16:02:22

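If you want to do this with a single Mechanize instance, why not just string the steps together and store the pages you need to jump to in variables?

If all of your code works, you can simply string it together into one method call: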

Code language: ruby
require 'mechanize'

def home
  mechanize = Mechanize.new

  # Part 1: university names across every providers results page.
  @uninames_array = []
  page = mechanize.get('http://search.ucas.com/search/providers?CountryCode=3&RegionCode=&Lat=&Lng=&Feather=&Vac=2&Query=&ProviderQuery=&AcpId=&Location=scotland&IsFeatherProcessed=True&SubjectCode=&AvailableIn=2016')

  loop do
    page.search('li.result h3').each do |h3|
      @uninames_array.push(h3.text)
    end

    next_page_link = page.at('.pager a[text()=">"]')
    break unless next_page_link

    page = mechanize.get(next_page_link['href'])
  end

  # Part 2: reuse the same Mechanize instance for one provider's courses.
  @duration_array = []
  @qual_array = []
  @courses_array = []
  page = mechanize.get('http://search.ucas.com/search/results?Vac=2&AvailableIn=2016&IsFeatherProcessed=True&page=1&providerids=41')

  # Gather all three fields per page; separate while loops would find the
  # pager already exhausted after the first one finishes.
  loop do
    page.search('div.courseinfoduration').each do |x|
      @duration_array.push(x.text.strip)
    end

    page.search('div.courseinfooutcome').each do |y|
      @qual_array.push(y.text.strip)
    end

    page.search('div.coursenamearea h4').each do |h4|
      @courses_array.push(h4.text.strip)
    end

    next_page_link = page.at('.pager a[text()=">"]')
    break unless next_page_link

    page = mechanize.get(next_page_link['href'])
  end
end
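
To make "store the pages you need to jump to in variables" more concrete, the following sketch chains the three clicks described in the question and keeps each level's page in its own variable. It is an illustration under assumptions, not the answerer's code: `crawl_courses` and `absolute_url` are hypothetical names, the selectors `div.morecourseslink a`, `div.coursenamearea a` and `#coursedetailtab_entryreqs` are guesses built from the class and id names in the question, and `agent` is expected to behave like a Mechanize instance (`get` returning pages that respond to `uri`, `search` and `at`).

```ruby
require 'uri'

# Resolve a link's href against the page it came from, so that relative links
# still work after the agent has moved on to other pages.
def absolute_url(page, link)
  URI.join(page.uri.to_s, link['href']).to_s
end

# Hypothetical sketch: follow the three link levels below one providers page.
def crawl_courses(agent, providers_page)
  entry_requirements = []

  providers_page.search('div.morecourseslink a').each do |more_link|
    # Click 1: the page listing all courses of one university.
    courses_page = agent.get(absolute_url(providers_page, more_link))

    courses_page.search('div.coursenamearea a').each do |course_link|
      # Click 2: the detail page of one course.
      course_page = agent.get(absolute_url(courses_page, course_link))

      # Click 3: the entry requirements tab, skipped when it is missing.
      reqs_link = course_page.at('#coursedetailtab_entryreqs')
      next unless reqs_link && reqs_link['href']

      reqs_page = agent.get(absolute_url(course_page, reqs_link))
      entry_requirements << { course: course_link.text.strip, page: reqs_page }
    end
  end

  entry_requirements
end
```

The sketch handles a single providers page and does not page through each university's course list; in a full crawl it would sit inside loops that follow the pager's ">" link at both levels.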
Votes: 0
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/37681359
