我有一个程序可以使用关键字或关键字搜索google,这些关键字在运行该程序时被作为参数:
示例:pull_sites.rb "testing"返回这些站点>>>
https://en.wikipedia.org/wiki/Software_testing
http://en.wikipedia.org/wiki/Test_automation
http://www.istqb.org/about-istqb.html
http://softwaretestingfundamentals.com/test-plan/
https://en.wikipedia.org/wiki/Software_testing
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:9qU2GDLzZzEJ:https://en.wikipedia.org/wiki/Software_testing%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://en.wikipedia.org/wiki/Test_strategy
https://en.wikipedia.org/wiki/Category:Software_testing
https://en.wikipedia.org/wiki/Test_automation
https://en.wikipedia.org/wiki/Portal:Software_testing
https://en.wikipedia.org/wiki/Test
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:R94CAo00wOYJ:https://en.wikipedia.org/wiki/Test%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://en.wikipedia.org/wiki/Unit_testing
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:G9V8uRLkPjIJ:https://en.wikipedia.org/wiki/Unit_testing%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://testing.byu.edu/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:d9bGrCHr9fsJ:https://testing.byu.edu/%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://www.test.com/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:S92tylTr1V8J:https://www.test.com/%252Btesting%26gbv%3D1%26%26ct%3Dclnk
http://ddce.utexas.edu/disability/using-testing-accommodations/
http://blogs.vmware.com/virtualblocks/2015/07/06/vsan-vs-nutanix-head-to-head-performance-testing-part-4-exchange/
http://www.networkforgood.com/nonprofitblog/testing-101-4-steps-optimizing-your-fundraising-approach/
http://www.auslea.com/software-testing-training.html
http://academy.littletonpublicschools.net/Default.aspx%3Ftabid%3D12807%26articleType%3DArticleView%26articleId%3D2400
https://golang.org/pkg/testing/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:EALG7Jlm9eoJ:https://golang.org/pkg/testing/%252Btesting%26gbv%3D1%26%26ct%3Dclnk
http://www.speedtest.net/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:M47_v0xF3m8J:http://www.speedtest.net/%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://www.act.org/content/act/en/products-and-services/the-act/taking-the-test.html
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:1sMSoJBXydoJ:https://www.act.org/content/act/en/products-and-services/the-act/taking-the-test.html%252Btesting%26gbv%3D1%26%26ct%3Dclnk
http://www.act.org/content/act/en/products-and-services/the-act/test-preparation.html
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:pAzlNJl3YY4J:http://www.act.org/content/act/en/products-and-services/the-act/test-preparation.html%252Btesting%26gbv%3D1%26%26ct%3Dclnk它能像预期的那样工作,但只会刮到谷歌的第一页,是否可以搜索,比如第1页-5页?
这是刮伤的来源:
def get_urls
puts "Searching...".green
agent = Mechanize.new
page = agent.get('http://www.google.com/')
google_form = page.form('f')
google_form.q = "#{SEARCH}" #SEARCH is the parameter given when program is run
page = agent.submit(google_form, google_form.buttons.first)
page.links.each do |link|
if link.href.to_s =~/url.q/
str=link.href.to_s
strList=str.split(%r{=|&})
url=strList[1]
File.open("links.txt", "a+"){ |s| s.puts(url) }
end
end
end发布于 2016-03-08 23:50:19
好的,如果你正在使用谷歌铬或火狐,打开开发工具。这将帮助您识别要自动单击的链接。当你做谷歌搜索,然后滚动到底部,你会看到页面链接点击。使用浏览器中的开发工具,您需要识别google分配这些页码链接的类或id。然后使用机械化单击方法跟踪这些链接。例如,如果链接标记为"next“,则可以使用以下简单的方法:
page2 = page1.link_with(:text => "next").click
我是从我的电话应答,所以它可以节省你的时间谷歌“点击链接”与机械化,以了解更多的细节。
发布于 2016-03-09 01:08:53
这是一种简单得多的获取表单,只需自己提出请求:
https://www.google.com/search?q=foo
https://www.google.com/search?q=foo&start=10
https://www.google.com/search?q=foo&start=20https://stackoverflow.com/questions/35875304
复制相似问题