首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >我应该如何重构它以获得更好的性能

我应该如何重构它以获得更好的性能
EN

Stack Overflow用户
提问于 2015-12-19 23:34:37
回答 2查看 97关注 0票数 0

我使用此函数来获取地图中存在的关键字的分数。散列映射大约有80,000个条目。键都是字符串的向量。作为键传递给score函数的向量大约是22,但是该函数在调用函数的单次迭代中被调用了64000次。每次迭代大约需要1282.044314毫秒。

这是非常慢的,因为调用算法(用于标记的全局线性模型)需要进行数百万次迭代。我怎样才能解决这个瓶颈?

代码语言:javascript
复制
1. (defn score[map keys]
2.  (loop [acc 0 keys (seq keys)]
3.     (if keys
4.     (recur
5.         (+ (map (first keys) 0)
6.          acc)
7.       (next keys))
8.     acc)))


9. (defn yy1
10. "Where 'model' is a hash-map of vectors 
11.       e.g ["BIGRAM:POS-1:WORD:POS", "DT", "Man", "NNP"],
12.  'sent' is a vector of strings e.g ["The", "man", "came"],
13.  'i' an  integer, 
14.  't' a string e.g "NNP", 
15.  't1' also a string e.g "VBD",
16.  'null' is a string "NULL",
17.  'f's e.g 'f9' are strings like "BIGRAM:POS-2:WORD:POS""
18.
19.   [model sent i t t1]
20.   (let [word-1 (get sent (- i 1) null)
21.          word (sent i)
22.          word+1 (get sent (+ i 1) null)
23.          word+2 (get sent (+ i 2) null)
24.          word-2 (get sent (- i 2) null)
25.          features  [[f9  t1      word     t]
26.                     [f10 t1      t]
27.                     [f11 word-1  t1       word]
28.                     [f12 word-1  t1       word     t]
29.                     [f13 word-2  t1       word     t]
30.                     [f14 word-2  t1       t]
31.                     [f15 word    t]
32.                     [f16 word-1  word     t]
33.                     [f17 word-1  t]
34.                     [f18 word-2  word]
35.                     [f19 t       word+1]
36.                     [f20 word    t        word+1]
37.                     [f21 word-2  word-1   t]
38.                     [f22 word-2  word-1   word     t]
39.                     [f23 word    t        word+1   word+2]
40.         ]]
41.      (score model features)
42. ))


43. (defn yy1y2[model sent i t t1 t2]
44.    (let [word-1 (get sent (- i 1) null)
45.          word-2 (get sent (- i 2) null)
46.          word (sent i)
47.          features [[ f1 t2  word   t]
48.                    [ f2 t2  t]
49.                    [ f3 t2  t1     t]
50.                    [ f4 t2  t1     word    t]
51.                    [ f5 t2  t1     word-1  word    t]
52.                    [ f6 t2  word-2  t1     word-1  word  t]
53.                    [ f7 t2  word-1  word    t]
54.                    [ f8 t2  word-1  t]]]
55.     (score model features)
56.  ))

57. (defn viterbi
58.   "'model' is a hash-map e.g {["UNIGRAM:WORD:POS" "John" "NNP"] 1}
59.   'tags' is a hash-map, with keys :U :V :T e.g {:U ["NNP" "NN"]} 
60.   'sent' is a vector of strings e.g ["Jonh" "is" "tall"]"
61.                                                       
62.   [model tags sent]
63.   (let [pi (atom {[-1 * *] 1})]
64.     (let [{:keys [U V T]} tags
65.         N (range (count sent))
66.        ]
67.       (doall (for [k N u U v V]
68.         (let [const (yy1 model sent k v u)
69.               g #(yy1y2 model sent k v u %)
70.               k-1 (- k 1)
71.               [score t] (apply max-key first
72.                        (map
73.                        (fn[t] [(+ (@pi [k-1 t u] ep) const (g t))  t])
74.                        T))
75.              ]
76.           (swap! pi assoc (with-meta [k u v] {:t t}) score)))))
77.     @pi))
EN

回答 2

Stack Overflow用户

发布于 2015-12-20 01:15:52

我能够通过类型提示减少几个循环。

链接到我使用的输入:input

代码语言:javascript
复制
(defn new-score [map keys]
  (loop [acc 0 keys (seq keys)]
    (if keys
      (recur
        ^long (+ ^long (map (first keys) 0)
                 acc)
       (next keys))
     acc)))

criterium显示了以下结果:

代码语言:javascript
复制
user=> (def used (take 100 (shuffle (keys input))))
#'user/used
user=> (score input used)
113821306187
user=> (new-score input used)
113821306187
user=> (crit/bench (score input used))
Evaluation count : 14114040 in 60 samples of 235234 calls.
             Execution time mean : 4.347602 µs
    Execution time std-deviation : 72.243110 ns
   Execution time lower quantile : 4.255827 µs ( 2.5%)
   Execution time upper quantile : 4.503442 µs (97.5%)
                   Overhead used : 1.841861 ns

Found 2 outliers in 60 samples (3.3333 %)
    low-severe   2 (3.3333 %)
 Variance from outliers : 6.2475 % Variance is slightly inflated by outliers
nil
user=> (crit/bench (new-score input used))
Evaluation count : 17347620 in 60 samples of 289127 calls.
             Execution time mean : 3.482177 µs
    Execution time std-deviation : 125.513670 ns
   Execution time lower quantile : 3.405872 µs ( 2.5%)
   Execution time upper quantile : 3.690737 µs (97.5%)
                   Overhead used : 1.841861 ns

Found 6 outliers in 60 samples (10.0000 %)
    low-severe   4 (6.6667 %)
    low-mild     2 (3.3333 %)
 Variance from outliers : 22.2420 % Variance is moderately inflated by outliers
nil
票数 1
EN

Stack Overflow用户

发布于 2015-12-22 10:39:14

这里真正的问题不是你的加法代码很慢,而是在hash-map中使用向量作为键是很慢的。

代码语言:javascript
复制
test.core> (let [test-k ["a" "b"]
                 test-m {test-k 5}]
             (crit/quick-bench (= (test-m test-k) 5)))
WARNING: Final GC required 47.01152815855813 % of runtime
Evaluation count : 4410912 in 6 samples of 735152 calls.
             Execution time mean : 138.320973 ns
    Execution time std-deviation : 4.615398 ns
   Execution time lower quantile : 132.946422 ns ( 2.5%)
   Execution time upper quantile : 144.346280 ns (97.5%)
                   Overhead used : 1.781294 ns
nil
test.core> (let [test-k "ab"
                 test-m {test-k 5}]
             (crit/quick-bench (= (test-m test-k) 5)))
WARNING: Final GC required 47.87422755471501 % of runtime
Evaluation count : 35007324 in 6 samples of 5834554 calls.
             Execution time mean : 15.469454 ns
    Execution time std-deviation : 0.187772 ns
   Execution time lower quantile : 15.237863 ns ( 2.5%)
   Execution time upper quantile : 15.748780 ns (97.5%)
                   Overhead used : 1.781294 ns

Found 2 outliers in 6 samples (33.3333 %)
    low-severe   1 (16.6667 %)
    low-mild     1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil

从上面可以看出,使用字符串作为hashmap的键大约比使用向量快一个数量级。如果您可以重构您的代码,使其不使用向量,而使用更快的散列/相等函数,那么在运行您的代码时,您将看到类似的性能提升。

不幸的是,在不知道如何创建数据的情况下,我不能很好地给出一个更高性能的键,它对所涉及的数据结构是有意义的。

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/34372405

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档