Given the HTML below, I want to parse it with Nokogiri and get the following result:
event_name = "folk concert 2"
event_link = "http://www.douban.com/event/12761580/"
event_date = "25th,11,2010"
I know that doc.xpath('//div[@class="nof clearfix"]') returns each div element, but how do I go on from there to extract each attribute, such as event_name, and especially the date?
HTML
<div class="nof clearfix">
<h2><a href="http://www.douban.com/event/12761580/">folk concert 2</a> <span class="pl2"> </span></h2>
<div class="pl intro">
Date:25th,11,2010<br/>
</div>
</div>
<div class="nof clearfix">
<h2><a href="http://www.douban.com/event/12761581/">folk concert </a> <span class="pl2"> </span></h2>
<div class="pl intro">
Date:10th,11,2010<br/>
</div>
</div>
Posted on 2010-11-20 19:22:51
I don't know XPath; I prefer CSS selectors, which make more sense to me. This tutorial may be useful to you.
require 'rubygems'
require 'nokogiri'
require 'pp'

Event = Struct.new(:name, :link, :date)

# DATA reads everything that follows __END__ in this file.
doc = Nokogiri::HTML(DATA)

# Build one Event per <div class="nof clearfix"> block.
events = doc.css("div.nof.clearfix").map do |eventnode|
  name = eventnode.at_css("h2 a").text.strip
  link = eventnode.at_css("h2 a")['href']
  date = eventnode.at_css("div.pl.intro").text.strip
  Event.new(name, link, date)
end

pp events
__END__
<div class="nof clearfix">
<h2><a href="http://www.douban.com/event/12761580/">folk concert 2</a> <span class="pl2"> </span></h2>
<div class="pl intro">
Date: 25th,11,2010<br/>
</div>
</div>
<div class="nof clearfix">
<h2><a href="http://www.douban.com/event/12761581/">folk concert </a> <span class="pl2"> </span></h2>
<div class="pl intro">
Date: 10th,11,2010<br/>
</div>
</div>
This outputs the following:
[#<struct Event
name="folk concert 2",
link="http://www.douban.com/event/12761580/",
date="Date: 25th,11,2010">,
#<struct Event
name="folk concert",
link="http://www.douban.com/event/12761581/",
date="Date: 10th,11,2010">]
https://stackoverflow.com/questions/4232345
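The output above still carries the "Date:" label, while the question asked for the bare value. One way to drop it, as a minimal sketch: `strip_date_label` is a hypothetical helper name, not something from the answer or from Nokogiri, and it assumes the label is always spelled exactly "Date:".

```ruby
# Hypothetical helper (not part of the original answer): trims surrounding
# whitespace, then removes a leading "Date:" label and any spaces after it.
def strip_date_label(text)
  text.strip.sub(/\ADate:\s*/, "")
end

puts strip_date_label("Date: 25th,11,2010")     # space after the colon
puts strip_date_label("\nDate:10th,11,2010\n")  # no space, as in the question's HTML
```

Inside the loop, the date line would then read `date = strip_date_label(eventnode.at_css("div.pl.intro").text)`.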