我使用此函数来获取地图中存在的关键字的分数。散列映射大约有80,000个条目。键都是字符串的向量。作为键传递给score函数的向量大约是22,但是该函数在调用函数的单次迭代中被调用了64000次。每次迭代大约需要1282.044314毫秒。
这是非常慢的,因为调用算法(用于标记的全局线性模型)需要进行数百万次迭代。我怎样才能解决这个瓶颈?
1. (defn score[map keys]
2. (loop [acc 0 keys (seq keys)]
3. (if keys
4. (recur
5. (+ (map (first keys) 0)
6. acc)
7. (next keys))
8. acc)))
9. (defn yy1
10. "Where 'model' is a hash-map of vectors
11. e.g ["BIGRAM:POS-1:WORD:POS", "DT", "Man", "NNP"],
12. 'sent' is a vector of strings e.g ["The", "man", "came"],
13. 'i' an integer,
14. 't' a string e.g "NNP",
15. 't1' also a string e.g "VBD",
16. 'null' is a string "NULL",
17. 'f's e.g 'f9' are strings like "BIGRAM:POS-2:WORD:POS""
18.
19. [model sent i t t1]
20. (let [word-1 (get sent (- i 1) null)
21. word (sent i)
22. word+1 (get sent (+ i 1) null)
23. word+2 (get sent (+ i 2) null)
24. word-2 (get sent (- i 2) null)
25. features [[f9 t1 word t]
26. [f10 t1 t]
27. [f11 word-1 t1 word]
28. [f12 word-1 t1 word t]
29. [f13 word-2 t1 word t]
30. [f14 word-2 t1 t]
31. [f15 word t]
32. [f16 word-1 word t]
33. [f17 word-1 t]
34. [f18 word-2 word]
35. [f19 t word+1]
36. [f20 word t word+1]
37. [f21 word-2 word-1 t]
38. [f22 word-2 word-1 word t]
39. [f23 word t word+1 word+2]
40. ]]
41. (score model features)
42. ))
43. (defn yy1y2[model sent i t t1 t2]
44. (let [word-1 (get sent (- i 1) null)
45. word-2 (get sent (- i 2) null)
46. word (sent i)
47. features [[ f1 t2 word t]
48. [ f2 t2 t]
49. [ f3 t2 t1 t]
50. [ f4 t2 t1 word t]
51. [ f5 t2 t1 word-1 word t]
52. [ f6 t2 word-2 t1 word-1 word t]
53. [ f7 t2 word-1 word t]
54. [ f8 t2 word-1 t]]]
55. (score model features)
56. ))
57. (defn viterbi
58. "'model' is a hash-map e.g {["UNIGRAM:WORD:POS" "John" "NNP"] 1}
59. 'tags' is a hash-map, with keys :U :V :T e.g {:U ["NNP" "NN"]}
60. 'sent' is a vector of strings e.g ["Jonh" "is" "tall"]"
61.
62. [model tags sent]
63. (let [pi (atom {[-1 * *] 1})]
64. (let [{:keys [U V T]} tags
65. N (range (count sent))
66. ]
67. (doall (for [k N u U v V]
68. (let [const (yy1 model sent k v u)
69. g #(yy1y2 model sent k v u %)
70. k-1 (- k 1)
71. [score t] (apply max-key first
72. (map
73. (fn[t] [(+ (@pi [k-1 t u] ep) const (g t)) t])
74. T))
75. ]
76. (swap! pi assoc (with-meta [k u v] {:t t}) score)))))
77. @pi))发布于 2015-12-20 01:15:52
我能够通过类型提示减少几个循环。
链接到我使用的输入:input
(defn new-score [map keys]
(loop [acc 0 keys (seq keys)]
(if keys
(recur
^long (+ ^long (map (first keys) 0)
acc)
(next keys))
acc)))criterium显示了以下结果:
user=> (def used (take 100 (shuffle (keys input))))
#'user/used
user=> (score input used)
113821306187
user=> (new-score input used)
113821306187
user=> (crit/bench (score input used))
Evaluation count : 14114040 in 60 samples of 235234 calls.
Execution time mean : 4.347602 µs
Execution time std-deviation : 72.243110 ns
Execution time lower quantile : 4.255827 µs ( 2.5%)
Execution time upper quantile : 4.503442 µs (97.5%)
Overhead used : 1.841861 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 6.2475 % Variance is slightly inflated by outliers
nil
user=> (crit/bench (new-score input used))
Evaluation count : 17347620 in 60 samples of 289127 calls.
Execution time mean : 3.482177 µs
Execution time std-deviation : 125.513670 ns
Execution time lower quantile : 3.405872 µs ( 2.5%)
Execution time upper quantile : 3.690737 µs (97.5%)
Overhead used : 1.841861 ns
Found 6 outliers in 60 samples (10.0000 %)
low-severe 4 (6.6667 %)
low-mild 2 (3.3333 %)
Variance from outliers : 22.2420 % Variance is moderately inflated by outliers
nil发布于 2015-12-22 10:39:14
这里真正的问题不是你的加法代码很慢,而是在hash-map中使用向量作为键是很慢的。
test.core> (let [test-k ["a" "b"]
test-m {test-k 5}]
(crit/quick-bench (= (test-m test-k) 5)))
WARNING: Final GC required 47.01152815855813 % of runtime
Evaluation count : 4410912 in 6 samples of 735152 calls.
Execution time mean : 138.320973 ns
Execution time std-deviation : 4.615398 ns
Execution time lower quantile : 132.946422 ns ( 2.5%)
Execution time upper quantile : 144.346280 ns (97.5%)
Overhead used : 1.781294 ns
nil
test.core> (let [test-k "ab"
test-m {test-k 5}]
(crit/quick-bench (= (test-m test-k) 5)))
WARNING: Final GC required 47.87422755471501 % of runtime
Evaluation count : 35007324 in 6 samples of 5834554 calls.
Execution time mean : 15.469454 ns
Execution time std-deviation : 0.187772 ns
Execution time lower quantile : 15.237863 ns ( 2.5%)
Execution time upper quantile : 15.748780 ns (97.5%)
Overhead used : 1.781294 ns
Found 2 outliers in 6 samples (33.3333 %)
low-severe 1 (16.6667 %)
low-mild 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil从上面可以看出,使用字符串作为hashmap的键大约比使用向量快一个数量级。如果您可以重构您的代码,使其不使用向量,而使用更快的散列/相等函数,那么在运行您的代码时,您将看到类似的性能提升。
不幸的是,在不知道如何创建数据的情况下,我不能很好地给出一个更高性能的键,它对所涉及的数据结构是有意义的。
https://stackoverflow.com/questions/34372405
复制相似问题