This is my first attempt at using a Lisp. I'd like you to review the following, listed in order of relevance:
Code:
(ns clojure-first-try.core)
(use '[clojure.string :only [split]])
(use '[clojure.set :only [intersection]])
(defn sum [arr]
  (reduce + 0.0 arr))
(defn squares [arr]
  (map #(* % %) arr))
(defn words-frequency [text]
  (as-> text $
    (split $ #"\s+")
    (filter (complement empty?) $)
    (frequencies $)))
(defn cos-numerator [map-1 map-2]
  (def common-keys (intersection (set (keys map-1)) (set (keys map-2))))
  (sum (map #(* (map-1 %) (map-2 %)) common-keys)))
(defn cos-denominator [map-1 map-2]
  (def p-1 (sum (squares (vals map-1))))
  (def p-2 (sum (squares (vals map-2))))
  (* (Math/sqrt p-1) (Math/sqrt p-2)))
(defn cos-similarity [freq-1 freq-2]
  (def a (cos-numerator freq-1 freq-2))
  (def b (cos-denominator freq-1 freq-2))
  (if (= b 0.0) 0.0 (/ a b)))
(defn sort-by-similarity [base options]
  (def base-freq (words-frequency base))
  (def sorter (comp (partial cos-similarity base-freq) words-frequency))
  (sort-by sorter > options))
Tests:
(ns clojure-first-try.core-test
  (:require [clojure.test :refer :all]
            [clojure-first-try.core :refer :all]))
(defn close-enough? [a b]
  (def delta (- a b))
  (def abs-delta (max delta (- delta)))
  (> 0.0001 abs-delta))
(deftest words-frequency-test
  (testing "Counting the frequency of words in a text with actual words"
    (def expected {"a" 3, "bb" 2, "ccc" 4})
    (def input "ccc ccc a bb a bb a ccc ccc")
    (def result (words-frequency input))
    (is (= expected result)))
  (testing "Counting the frequency of words in an empty text"
    (def expected {})
    (def input "")
    (def result (words-frequency input))
    (is (= expected result))))
(deftest cos-similarity-test
  (testing "Getting the similarity of two equal texts"
    (def expected 1.0)
    (def input-1 {:a 1 :b 2})
    (def input-2 {:b 2 :a 1})
    (def result (cos-similarity input-1 input-2))
    (is (close-enough? expected result)))
  (testing "Getting the similarity of two totally different texts"
    (def expected 0.0)
    (def input-1 {:a 1 :b 2})
    (def input-2 {:c 2 :d 1})
    (def result (cos-similarity input-1 input-2))
    (is (close-enough? expected result)))
  (testing "Getting the similarity of two half equal texts"
    (def expected 0.5)
    (def input-1 {:a 2 :b 2})
    (def input-2 {:b 2 :d 2})
    (def result (cos-similarity input-1 input-2))
    (is (close-enough? expected result))))
(deftest sort-by-similarity-test
  (testing "Sorting regular texts"
    (def expected ["a b c" "a b d" "d e f"])
    (def input-1 "a b c")
    (def input-2 ["d e f" "a b c" "a b d"])
    (def result (sort-by-similarity input-1 input-2))
    (is (= expected result)))
  (testing "Sorting empty list of options")
  (def expected [])
  (def input-1 "a b c")
  (def input-2 [])
  (def result (sort-by-similarity input-1 input-2))
  (is (= expected result)))
All tests pass:
lein test clojure-first-try.core-test
Ran 3 tests containing 7 assertions.
0 failures, 0 errors.
Posted on 2018-05-03 16:29:22
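First, the most serious problem in the code:
Never use def inside a defn unless you really know what you're doing. def creates globals that persist even after the function returns: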
(defn func []
  (def x 1))
(func)
(println x) ; Prints 1! Ouch!
The only time it's really appropriate to use def locally is when you intend to create a global. That's a very rare case.
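To make the hazard concrete, here is a minimal sketch. The top-level delta and the name leaky-close-enough? are made up for illustration; the function body mirrors close-enough? from the tests in the question.

```clojure
;; Hypothetical top-level value that some other code relies on.
(def delta 100)

;; Same shape as close-enough? in the question: def inside defn.
(defn leaky-close-enough? [a b]
  (def delta (- a b))                    ; rebinds the namespace-level var delta
  (def abs-delta (max delta (- delta)))  ; creates yet another global
  (> 0.0001 abs-delta))

(leaky-close-enough? 1.0 1.5)  ; => false
delta                          ; => -0.5, the original 100 has been overwritten
```

Any two functions that happen to def the same name will clobber each other in the same way, which is why the tests in the question can get away with reusing expected, input and result only by accident.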
Instead, you should use let. For example:
(defn cos-denominator [map-1 map-2]
  (let [p-1 (sum (squares (vals map-1)))
        p-2 (sum (squares (vals map-2)))]
    (* (Math/sqrt p-1) (Math/sqrt p-2))))
let creates a limited scope for the bindings it defines. Once you leave the let, they go out of scope.
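The same fix applies to the other functions in the question. As a sketch, here is close-enough? from the test namespace rewritten with let; the behavior is intended to be the same, only the scoping changes.

```clojure
(defn close-enough? [a b]
  (let [delta     (- a b)
        abs-delta (max delta (- delta))]  ; absolute value of the difference
    (> 0.0001 abs-delta)))

(close-enough? 0.5 0.50001)  ; => true
(close-enough? 0.5 0.6)      ; => false
```

Later bindings in a let can refer to earlier ones, so abs-delta can use delta just as it did with the sequential defs.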
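That's the only thing I see in your code that's outright wrong. My other comments relate to best practices.
You shouldn't really call use, require, and related functions directly. It's much cleaner to make them part of the ns macro: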
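(ns clojure-first-try.core
  (:use [clojure.string :only [split]]
        [clojure.set :only [intersection]]))
Using $ as an identifier is odd. I had to do a double take there. I'd suggest something more descriptive. If you can't think of a good identifier, I think even a single letter is better than an odd symbol like $.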
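One way to sidestep the naming question entirely is to thread with ->>, since every step after the split takes the collection as its last argument. This is only a sketch: the namespace name and the str alias for clojure.string are assumptions made for the example, not something from the question.

```clojure
(ns review-sketch.words                 ; hypothetical namespace
  (:require [clojure.string :as str]))

(defn words-frequency [text]
  (->> (str/split text #"\s+")          ; split takes the string first, so it starts the chain
       (filter (complement empty?))     ; drop the "" that split yields for blank input
       (frequencies)))

(words-frequency "ccc ccc a bb a bb a ccc ccc")  ; => {"ccc" 4, "a" 3, "bb" 2}
(words-frequency "")                             ; => {}
```

With no placeholder symbol in sight, there is nothing left to name.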
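(reduce + 0.0 arr) is more typically written as
(apply + arr)
There's nothing wrong with your way; the latter is just generally considered more idiomatic. This is because + has a var-arg overload, which is basically an explicit reduction.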
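A quick REPL-style sketch of the equivalence, using a made-up counts vector. One difference worth knowing about: the 0.0 seed turns the result into a double, while apply + on integers returns an integer. That should not matter for cos-similarity, because the denominator goes through Math/sqrt and is a double anyway.

```clojure
(def counts [1 2 3])  ; illustrative data, not from the question

(+ 1 2 3)              ; => 6
(apply + counts)       ; => 6, the same call with the arguments taken from the vector
(reduce + counts)      ; => 6
(reduce + 0.0 counts)  ; => 6.0, the 0.0 seed makes the result a double
```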
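Also, you don't need to specify 0.0 as the starting accumulator for the reduction. + handles that for you.
(reduce + []) ; 0
(filter (complement empty?) arr) can be written as
(remove empty? arr)
remove adds the complement for you! Here's the core definition (basically):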
(defn remove [pred coll] ; Abridged definition
  (filter (complement pred) coll))
Without the explicit call to complement, it reads much better.
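As a small check that the two spellings agree, here is a sketch with a made-up tokens vector of the kind split can produce:

```clojure
(def tokens ["" "a" "" "bb"])  ; illustrative data

(filter (complement empty?) tokens)  ; => ("a" "bb")
(remove empty? tokens)               ; => ("a" "bb")
```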
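(= b 0.0) could also be written as:
(zero? b)
although the gain isn't huge. I find it reads better on an already busy line, though, such as when checking for factors of a number. I find:
(zero? (rem a b))
reads better than
(= 0 (rem a b))
For cos-numerator: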
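(intersection (set (keys map-1)) (set (keys map-2)))
is quite inefficient. You iterate both maps just to put their keys into sets so that you can test for the intersection. Without some timings, it's unclear what the complexity of keys is. If it's O(n), you iterate each map twice, then once more as a set. Just using filter is much more concise, and should be much faster:
(filter map-1 (keys map-2))
Maps return nil on a failed lookup, so they can be used as predicates to check for membership.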
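Here is a sketch comparing the two approaches on small made-up frequency maps. The namespace name and the set alias are assumptions for the example. It also shows one caveat: a key whose value is nil or false looks missing to a map used as a predicate. frequencies only ever produces positive counts, so that caveat should not bite in this program.

```clojure
(ns review-sketch.common-keys           ; hypothetical namespace
  (:require [clojure.set :as set]))

(def freq-1 {"a" 2, "b" 2})
(def freq-2 {"b" 2, "d" 2})

;; Both approaches find the keys shared by the two maps.
(set/intersection (set (keys freq-1)) (set (keys freq-2)))  ; => #{"b"}
(filter freq-1 (keys freq-2))                               ; => ("b")

;; Caveat: a nil (or false) value makes the key look absent.
(filter {:a nil} [:a])                 ; => ()
(filter #(contains? {:a nil} %) [:a])  ; => (:a)
```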
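I also think this can be cleaned up with ->>. I'd change it to:
(defn cos-numerator [map-1 map-2]
  (->> (filter map-1 (keys map-2))
       (map #(* (map-1 %) (map-2 %)))
       (sum)))
squares can also be written as
(map * arr arr)
You'll have some duplication somewhere either way. It depends on how you want it to look. I find this one a bit neater.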
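This works because map accepts several collections and passes one element from each to the function. A short sketch with a made-up xs vector:

```clojure
(def xs [1 2 3])  ; illustrative data

(map #(* % %) xs)  ; => (1 4 9)
(map * xs xs)      ; => (1 4 9), map walks both collections in step
```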
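I'm a big fan of local anonymous functions for cleaning up code. cos-denominator has you doing the exact same thing to both maps, then taking their product.
I'd turn the common transformation into a local function, then use it: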
(defn cos-denominator [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (* (proc map-1) (proc map-2))))
This has much less repetition. You could even go a little overboard and get rid of the repeated calls to proc:
(defn cos-denominator2 [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (apply * (map proc [map-1 map-2]))))
This can also be made a bit neater with ->>:
(defn cos-denominator3 [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (->> [map-1 map-2]
         (map proc)
         (apply *))))
After making the above changes, plus a few other touches that I think are neater, I ended up with this:
(ns clojure-first-try.core
  (:use [clojure.string :only [split]]
        [clojure.set :only [intersection]]))
(defn sum [arr]
  (apply + arr))
(defn squares [arr]
  (map * arr arr))
(defn words-frequency [text]
  (as-> text t
    (split t #"\s+")
    (remove empty? t)
    (frequencies t)))
(defn cos-numerator [map-1 map-2]
  (->> (filter map-1 (keys map-2))
       (map #(* (map-1 %) (map-2 %)))
       (sum)))
(defn cos-denominator [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (->> [map-1 map-2]
         (map proc)
         (apply *))))
(defn cos-similarity [freq-1 freq-2]
  (let [a (cos-numerator freq-1 freq-2)
        b (cos-denominator freq-1 freq-2)]
    (if (zero? b)
      0.0
      (/ a b))))
(defn sort-by-similarity [base options]
  (let [base-freq (words-frequency base)
        sorter (comp #(cos-similarity base-freq %) words-frequency)]
    (sort-by sorter > options)))
https://codereview.stackexchange.com/questions/193571