This is a program that parses some websites. The first site is site1. All the logic for parsing a particular site lives in (-> config :site1).
(ns program.core
  (:require [net.cgrand.enlive-html :as html]))
(def config
  {:site1
   {:site-url
    ["http://www.site1.com/page/1"
     "http://www.site1.com/page/2"
     "http://www.site1.com/page/3"
     "http://www.site1.com/page/4"]
    :url-encoding "iso-8859-1"
    :parsing-index
    {:date
     {:selector
      [[:td.PadMed (html/nth-of-type 1)] :table [:tr (html/nth-of-type 2)]
       [:td (html/nth-of-type 3)] [:span]]
      :trimming-fn
      (comp first :content)} ; (first) to remove the extra parentheses
     :title
     {:selector
      [[:td.PadMed (html/nth-of-type 1)] :table :tr [:td (html/nth-of-type 2)] [:a]]
      :trimming-fn
      (comp first :content first :content)}
     :url
     {:selector
      [[:td.PadMed (html/nth-of-type 1)] :table :tr [:td (html/nth-of-type 2)] [:a]]
      :trimming-fn
      #(str "http://www.site.com" (:href (:attrs %)))}}}})
;=== Fetch fn ===;
(defn fetch-encoded-url
  ([url] (fetch-encoded-url url "utf-8"))
  ([url encoding] (-> url java.net.URL.
                      .getContent
                      (java.io.InputStreamReader. encoding)
                      html/html-resource)))

Now I want to parse the pages listed in (-> config :site1 :site-url). In this example I only use the first URL, but how can I design this so that one master loop actually covers all the URLs?
(defn parse-element [element]
  (into [] (map (-> config :site1 :parsing-index element :trimming-fn)
                (html/select
                  (fetch-encoded-url
                    (-> config :site1 :site-url first)
                    (-> config :site1 :url-encoding))
                  (-> config :site1 :parsing-index element :selector)))))

(def element-lists
  (apply map vector
         (map parse-element (-> config :site1 :parsing-index keys))))

(def tagged-lists
  (into [] (for [element-list element-lists]
             (zipmap [:date :title :url] element-list))))
;==== Fn call ====
(println tagged-lists)

Posted on 2013-05-06 21:52:24

Pass :site1 as a parameter to parse-element and element-lists.
(defn parse-element [site element]
  (into [] (map (-> config site :parsing-index element :trimming-fn)
                (html/select
                  (fetch-encoded-url
                    (-> config site :site-url first)
                    (-> config site :url-encoding))
                  (-> config site :parsing-index element :selector)))))
(defn element-lists [site]
  (apply map vector
         (map (partial parse-element site) (-> config site :parsing-index keys))))

Then map over the keys :site1, :site2, …
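As a rough illustration of that last step, the sketch below maps over every site key and groups the results by site. It is self-contained and does no fetching: demo-config, the stubbed demo-element-lists, tagged-lists-for and all-sites are hypothetical names introduced here, standing in for the real config and element-lists.

```clojure
;; Hypothetical stand-in for the real config; only the top-level site keys matter here.
(def demo-config
  {:site1 {:site-url ["http://www.site1.com/page/1"]}
   :site2 {:site-url ["http://www.site2.com/page/1"]}})

;; Stub replacing element-lists, which would really fetch and parse pages.
;; It returns one canned [date title url] row per site.
(defn demo-element-lists [site]
  [[(str (name site) "-date") (str (name site) "-title") (str (name site) "-url")]])

;; Tag each row of one site with the field names, as tagged-lists does.
(defn tagged-lists-for [site]
  (mapv #(zipmap [:date :title :url] %) (demo-element-lists site)))

;; Map over every site key in the config, keeping the results grouped by site.
(def all-sites
  (into {} (map (fn [site] [site (tagged-lists-for site)])
                (keys demo-config))))
```

Keeping the result as a map from site key to rows is only one possible choice; a flat vector of rows would work as well if the site of origin does not matter.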
Addendum, answering a further question from the comments.
You can wrap html/select in a map over :site-url. Something like:
(defn parse-element [site element]
  (let [site-urls (-> config site :site-url)]
    (into [] (map (-> config site :parsing-index element :trimming-fn)
                  (map
                    #(html/select
                       (fetch-encoded-url
                         %
                         (-> config site :url-encoding))
                       (-> config site :parsing-index element :selector))
                    site-urls)))))

(I hope I got the parentheses right.)
Then you may need to review :trimming-fn so that it handles the nesting. An apply should suffice.
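One possible reading of that hint, offered as an assumption rather than the answerer's exact intent, is to flatten the per-page results with (apply concat ...) before :trimming-fn runs, so that it still receives single nodes. The sketch below is self-contained: select-nodes is a hypothetical stub standing in for (html/select (fetch-encoded-url url encoding) selector), and the node maps only imitate the {:tag … :content …} shape that enlive returns.

```clojure
;; Hypothetical stub for selecting nodes from one fetched page.
(defn select-nodes [url]
  [{:tag :span :content [(str url "#a")]}
   {:tag :span :content [(str url "#b")]}])

;; Illustrative URLs, not the real ones from config.
(def site-urls ["page/1" "page/2"])

;; Same trimming-fn as the :date entry in the question.
(def trimming-fn (comp first :content))

;; Mapping over the URLs yields one sequence of nodes per page (nested).
(def nested (map select-nodes site-urls))

;; apply concat flattens one level, so trimming-fn sees single nodes again.
(def flat-a (into [] (map trimming-fn (apply concat nested))))

;; mapcat performs the map and the concat in one step.
(def flat-b (into [] (map trimming-fn (mapcat select-nodes site-urls))))
```

Both variants leave :trimming-fn itself untouched; the alternative of changing each :trimming-fn to accept a sequence of nodes would mean editing every entry in :parsing-index.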
https://stackoverflow.com/questions/16387016