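The code below scrapes different parts of a site. Basically, I have several methods, one per part, followed by a method (`parse_details`) that merges all the hashes into one hash: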
```ruby
require 'nokogiri'
require 'open-uri'

class Parser
  # Merge every section hash into a single hash.
  def parse_details(html)
    merged_hashes = {}
    array_of_hashes = [
      self.parse_department(html),
      self.parse_super_saver(html),
    ]
    array_of_hashes.inject(merged_hashes, :update)
    return merged_hashes
  end

  def parse_department(file)
    html = file
    # Fetches and parses the page (first time).
    # URI.open comes from open-uri; plain Kernel#open no longer fetches URLs on Ruby 3+.
    data = Nokogiri::HTML(URI.open(html))
    department = data.css('#ref_2619534011')
    @department_hash = {}
    department.css('li').drop(1).each do |item|
      department_title = item.css('.refinementLink').text
      # .to_s turns a missing count (nil match) into 0 instead of raising NoMethodError.
      department_count = item.css('.narrowValue').text[/[\d,]+/].to_s.delete(",").to_i
      @department_hash[:department] ||= {}
      @department_hash[:department]["Pet Supplies"] ||= {}
      @department_hash[:department]["Pet Supplies"][department_title] = department_count
    end
    return @department_hash
  end

  def parse_super_saver(file)
    html = file
    # Fetches and parses the same page again (second time).
    data = Nokogiri::HTML(URI.open(html))
    super_saver = data.css('#ref_2661623011')
    @super_saver_hash = {}
    super_saver.css('li').each do |item|
      super_saver_title = item.css('.refinementLink').text
      super_saver_count = item.css('.narrowValue').text[/[\d,]+/].to_s.delete(",").to_i
      @super_saver_hash[:super_saver] ||= {}
      @super_saver_hash[:super_saver][super_saver_title] = super_saver_count
    end
    return @super_saver_hash
  end
end
```

As you can see, I call `Nokogiri::HTML(URI.open(html))` more than once, once per section.
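What `parse_details` relies on: `Hash#update` is an alias of `Hash#merge!`, so `inject(merged_hashes, :update)` copies each section hash into `merged_hashes` in turn and returns it. The plain-Ruby sketch below uses invented section data (no scraping involved) and also isolates the count-parsing expression used in both methods; `parse_count` is a hypothetical helper name, not part of the question's code.

```ruby
# Invented section results, shaped like the ones parse_department and
# parse_super_saver build.
department  = { department: { "Pet Supplies" => { "Dogs" => 1200, "Cats" => 800 } } }
super_saver = { super_saver: { "Free Shipping" => 450 } }

# inject({}, :update) calls acc.update(next_hash) for each element;
# update (merge!) mutates the accumulator and returns it.
merged = [department, super_saver].inject({}, :update)

# Hypothetical helper wrapping the question's count idiom: take the first run
# of digits/commas, strip the commas, convert to Integer (nil match -> 0).
def parse_count(text)
  text[/[\d,]+/].to_s.delete(",").to_i
end

p merged
p parse_count("(1,234)")
```

The merge is shallow: if two sections ever returned the same top-level key, the later hash would overwrite the earlier one.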
Someone suggested this to me:

```ruby
def self.parse(html)
  doc = Nokogiri::HTML html
  self.parse_details(doc) unless doc.nil?
end
```

so that I only call `Nokogiri::HTML` once.
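But I'm stuck. For example, I don't know what to do with parts like `department = data.css('#ref_2619534011')`: should they go into the new `parse` method? I also don't know what to do with the `html` and `file` parameters. Once I have the new `parse` method, should I keep them or remove them?
Any suggestions on how I can accomplish this?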
Posted on 2013-08-21 13:14:28
```ruby
require 'nokogiri'
require 'open-uri'

class Parser
  # Fetch and parse the page once; every parse_* method reuses @data.
  # URI.open comes from open-uri; plain Kernel#open no longer fetches URLs on Ruby 3+.
  def initialize(url)
    @data = Nokogiri::HTML(URI.open(url))
  end

  # Merge the per-section hashes into one hash.
  def parse_details
    {}.tap do |merged_hashes|
      array_of_hashes = [
        parse_department,
        parse_super_saver,
      ]
      array_of_hashes.inject(merged_hashes, :update)
    end
  end

  def parse_department
    department = @data.css('#ref_2619534011')
    @department_hash = {}
    department.css('li').drop(1).each do |item|
      department_title = item.css('.refinementLink').text
      # .to_s turns a missing count (nil match) into 0 instead of raising NoMethodError.
      department_count = item.css('.narrowValue').text[/[\d,]+/].to_s.delete(",").to_i
      @department_hash[:department] ||= {}
      @department_hash[:department]["Pet Supplies"] ||= {}
      @department_hash[:department]["Pet Supplies"][department_title] = department_count
    end
    @department_hash
  end

  def parse_super_saver
    super_saver = @data.css('#ref_2661623011')
    @super_saver_hash = {}
    super_saver.css('li').each do |item|
      super_saver_title = item.css('.refinementLink').text
      super_saver_count = item.css('.narrowValue').text[/[\d,]+/].to_s.delete(",").to_i
      @super_saver_hash[:super_saver] ||= {}
      @super_saver_hash[:super_saver][super_saver_title] = super_saver_count
    end
    @super_saver_hash
  end
end
```

If you don't actually need `@department_hash` and `@super_saver_hash` as instance variables, you can optionally convert them to the `tap` style I used in `parse_details`.
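For comparison, the sketch below puts the `tap` style next to `Enumerable#each_with_object`, a standard Ruby alternative that the answer itself does not use. The `(title, raw count)` pairs in `ITEMS` are invented stand-ins for the `<li>` nodes so the sketch runs without Nokogiri; `with_tap` and `with_each_with_object` are hypothetical names.

```ruby
# Invented (title, raw count text) pairs standing in for the <li> nodes.
ITEMS = [["Dogs", "(1,200)"], ["Cats", "(800)"]]

# tap yields the new hash to the block, then returns that same hash
# regardless of what the block's last expression evaluates to.
def with_tap(items)
  {}.tap do |hash|
    items.each do |title, raw|
      hash[title] = raw[/[\d,]+/].to_s.delete(",").to_i
    end
  end
end

# each_with_object passes the memo hash to the block for every element
# and returns the memo when the iteration finishes.
def with_each_with_object(items)
  items.each_with_object({}) do |(title, raw), hash|
    hash[title] = raw[/[\d,]+/].to_s.delete(",").to_i
  end
end

p with_tap(ITEMS)
p with_each_with_object(ITEMS)
```

Either form removes the need for an instance variable that only lives for the duration of one method call.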
If you don't actually need it to be a class at all, and it is just a collection of methods, consider this:
```ruby
require 'nokogiri'
require 'open-uri'

module Parser
  # Fetch and parse the page once, then pass the document to each section parser.
  def self.parse_details(url)
    data = Nokogiri::HTML(URI.open(url))
    {}.tap do |merged_hashes|
      array_of_hashes = [
        parse_department(data),
        parse_super_saver(data),
      ]
      array_of_hashes.inject(merged_hashes, :update)
    end
  end

  def self.parse_department(data)
    {}.tap do |department_hash|
      data.css('#ref_2619534011 li').drop(1).each do |department|
        department_title = department.css('.refinementLink').text
        # .to_s turns a missing count (nil match) into 0 instead of raising NoMethodError.
        department_count = department.css('.narrowValue').text[/[\d,]+/].to_s.delete(",").to_i
        department_hash[:department] ||= {}
        department_hash[:department]["Pet Supplies"] ||= {}
        department_hash[:department]["Pet Supplies"][department_title] = department_count
      end
    end
  end

  def self.parse_super_saver(data)
    {}.tap do |super_saver_hash|
      data.css('#ref_2661623011 li').each do |super_saver|
        super_saver_title = super_saver.css('.refinementLink').text
        super_saver_count = super_saver.css('.narrowValue').text[/[\d,]+/].to_s.delete(",").to_i
        super_saver_hash[:super_saver] ||= {}
        super_saver_hash[:super_saver][super_saver_title] = super_saver_count
      end
    end
  end
end
```

https://stackoverflow.com/questions/18348760