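Q: Matrix multiplication in Clojure vs. Numpy

Stack Overflow user

Asked 2012-01-18 02:31:01

9 answers, 15.5K views, 0 followers, 41 votes

I'm working on an application in Clojure that needs to multiply large matrices, and I'm running into serious performance problems compared to an identical Numpy version. Numpy seems to be able to multiply a 1,000,000x23 matrix by its transpose in under a second, while the equivalent Clojure code takes over six minutes. (I can print out the resulting matrix from Numpy, so it is definitely evaluating everything.)

Am I doing something wrong in this Clojure code? Are there any Numpy tricks I could try to mimic?

Here's the Python: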

import time

import numpy as np

def test_my_mult(n):
    # n x 23 matrix of uniform random values
    A = np.random.rand(n*23).reshape(n,23)

    t0 = time.time()
    res = np.dot(A.T, A)  # (23 x n) times (n x 23) gives 23 x 23
    print(time.time() - t0)
    print(np.shape(res))

    return res

# Example (returns a 23x23 matrix):
# >>> results = test_my_mult(1000000)
# 
# 0.906938076019
# (23, 23)

And the Clojure:

(defn feature-vec [n]
  (map (partial cons 1)
       (for [x (range n)]
         (take 22 (repeatedly rand)))))

(defn dot-product [x y]
  (reduce + (map * x y)))

(defn transpose
  "returns the transposition of a `coll` of vectors"
  [coll]
  (apply map vector coll))

(defn matrix-mult
  [mat1 mat2]
  (let [row-mult (fn [mat row]
                   (map (partial dot-product row)
                        (transpose mat)))]
    (map (partial row-mult mat2)
         mat1)))

(defn test-my-mult
  [n afn]
  (let [xs  (feature-vec n)
        xst (transpose xs)]
    (time (dorun (afn xst xs)))))

;; Example (yields a 23x23 matrix):
;; (test-my-mult 1000 i/mmult) => "Elapsed time: 32.626 msecs"
;; (test-my-mult 10000 i/mmult) => "Elapsed time: 628.841 msecs"

;; (test-my-mult 1000 matrix-mult) => "Elapsed time: 14.748 msecs"
;; (test-my-mult 10000 matrix-mult) => "Elapsed time: 434.128 msecs"
;; (test-my-mult 1000000 matrix-mult) => "Elapsed time: 375751.999 msecs"


;; Test from wikipedia
;; (def A [[14 9 3] [2 11 15] [0 12 17] [5 2 3]])
;; (def B [[12 25] [9 10] [8 5]])

;; user> (matrix-mult A B)
;; ((273 455) (243 235) (244 205) (102 160))

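Update: I implemented the same benchmark using the JBLAS library and found a massive speed improvement. Thanks to everyone for their input! Time to wrap this sucker up in Clojure. Here's the new code: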

(import '[org.jblas FloatMatrix])

(defn feature-vec [n]
  (FloatMatrix.
   (into-array (for [x (range n)]
                 (float-array (cons 1 (take 22 (repeatedly rand))))))))

(defn test-mult [n]
  (let [xs  (feature-vec n)
        xst (.transpose xs)]
    (time (let [result (.mmul xst xs)]
            [(.rows result)
             (.columns result)]))))

;; user> (test-mult 10000)
;; "Elapsed time: 6.99 msecs"
;; [23 23]

;; user> (test-mult 100000)
;; "Elapsed time: 43.88 msecs"
;; [23 23]

;; user> (test-mult 1000000)
;; "Elapsed time: 383.439 msecs"
;; [23 23]

(defn matrix-stream [rows cols]
  (repeatedly #(FloatMatrix/randn rows cols)))

(defn square-benchmark
  "Times the multiplication of a square matrix."
  [n]
  (let [[a b c] (matrix-stream n n)]
    (time (.mmuli a b c))
    nil))

;; forma.matrix.jblas> (square-benchmark 10)
;; "Elapsed time: 0.113 msecs"
;; nil
;; forma.matrix.jblas> (square-benchmark 100)
;; "Elapsed time: 0.548 msecs"
;; nil
;; forma.matrix.jblas> (square-benchmark 1000)
;; "Elapsed time: 107.555 msecs"
;; nil
;; forma.matrix.jblas> (square-benchmark 2000)
;; "Elapsed time: 793.022 msecs"
;; nil

python

matrix

numpy

clojure

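9 Answers

Stack Overflow user

Accepted answer

Answered 2012-01-18 03:04:20

The Python version compiles down to a loop in C, while the Clojure version builds a new intermediate sequence for each of the calls to map in this code. The performance difference you see is most likely coming from that difference in data structures.

To do better than this, you could try a library like Incanter, or write your own version as explained in this SO question. See also this one, neanderthal, or nd4j. If you really want to stay with sequences to keep the lazy-evaluation properties and so on, you may get a real boost by looking into transients for the internal matrix calculations.

Edit: I forgot to add the first step in tuning Clojure: turn on "warn on reflection", i.e. (set! *warn-on-reflection* true).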
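
To make the point about intermediate sequences concrete, here is a minimal Python sketch. It is not the asker's code: the function names gram_via_maps and gram_single_pass are made up for illustration. The first function mirrors the shape of the Clojure matrix-mult, rebuilding the transpose and a fresh intermediate list for every output row. The second accumulates the result in place during a single pass over the input rows. Both are pure Python, so neither comes close to a compiled loop; the sketch only shows where the extra allocations come from.

```python
def transpose(rows):
    """Return the transpose of a list of equal-length rows."""
    return [list(col) for col in zip(*rows)]


def dot(x, y):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def gram_via_maps(xs):
    """Compute xs^T * xs with the same structure as the Clojure matrix-mult."""
    xst = transpose(xs)
    # transpose(xs) is re-evaluated for every row of xst, just as
    # (transpose mat) is re-evaluated inside row-mult in the question.
    return [[dot(row, col) for col in transpose(xs)] for row in xst]


def gram_single_pass(xs):
    """Compute xs^T * xs by accumulating into one preallocated k x k table."""
    k = len(xs[0])
    acc = [[0.0] * k for _ in range(k)]
    for row in xs:
        for i in range(k):
            ri = row[i]
            acc_i = acc[i]
            for j in range(k):
                acc_i[j] += ri * row[j]
    return acc
```

For a 3x2 input such as [[1, 2], [3, 4], [5, 6]], both functions produce the same 2x2 matrix; only the number of temporary lists created along the way differs.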

32 votes

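Stack Overflow user

Answered 2012-01-18 04:05:29

Numpy links to BLAS/LAPACK routines that have been optimized for decades at the level of the machine architecture, while the Clojure code implements the multiplication in the most straightforward and naive manner.

Any time you have non-trivial matrix/vector operations to perform, you should probably link to BLAS/LAPACK.

The only case where this won't be faster is for small matrices, where the overhead of translating the data representation between the language runtime and LAPACK exceeds the time spent doing the calculation.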
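
The trade-off in the last paragraph can be observed from Python as well. The sketch below is illustrative only: it assumes NumPy is installed, timed and gram_from_lists are hypothetical helper names, and the actual timings depend on the machine and on which BLAS NumPy was built against. It separates the cost of converting nested Python lists into an ndarray from the cost of the multiplication itself.

```python
import time

import numpy as np


def timed(fn):
    """Call fn() once and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - t0


def gram_from_lists(rows):
    """Convert nested lists to an ndarray, then compute A^T A.

    Returns (result, conversion_seconds, multiply_seconds).
    """
    # Step 1: translate the runtime's representation into a packed array.
    a, t_convert = timed(lambda: np.asarray(rows, dtype=np.float64))
    # Step 2: the multiplication itself, which NumPy hands to its BLAS.
    res, t_mult = timed(lambda: np.dot(a.T, a))
    return res, t_convert, t_mult
```

Calling gram_from_lists on inputs of different sizes and comparing the two timings gives a rough idea of when the conversion step dominates.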

27 votes

Stack Overflow user

Answered 2012-01-19 09:26:48

I've just staged a small shootout between Incanter 1.3 and jBLAS 1.2.1. Here's the code:

(ns ml-class.experiments.mmult
  [:use [incanter core]]
  [:import [org.jblas DoubleMatrix]])

(defn -main [m]
  (let [n 23 m (Integer/parseInt m)
        ai (matrix (vec (double-array (* m n) (repeatedly rand))) n)
        ab (DoubleMatrix/rand m n)
        ti (copy (trans ai))
        tb (.transpose ab)]
    (dotimes [i 20]
      (print "Incanter: ") (time (mmult ti ai))
      (print "   jBLAS: ") (time (.mmul tb ab)))))

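In my tests, Incanter is consistently about 45% slower than jBLAS in plain matrix multiplication. However, Incanter's trans function does not create a new copy of the matrix, so (.mmul (.transpose ab) ab) in jBLAS takes twice as much memory and is only 15% faster than (mmult (trans ai) ai) in Incanter.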
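
For comparison, NumPy's A.T is also a view onto the same buffer rather than a copy, similar to what is described for Incanter's trans above. A small sketch, assuming NumPy is installed (the array size is arbitrary and the variable names are only for illustration):

```python
import numpy as np

a = np.random.rand(1000, 23)

at_view = a.T          # transposed view: no element data is copied
at_copy = a.T.copy()   # explicit copy: a second buffer of the same size

# The view shares storage with the original array, the copy does not.
assert np.shares_memory(a, at_view)
assert not np.shares_memory(a, at_copy)

# Either operand yields the same 23x23 product.
res_view = np.dot(at_view, a)
res_copy = np.dot(at_copy, a)
```

The copy costs as many bytes as the original array (at_copy.nbytes equals a.nbytes), which is the kind of doubling mentioned for the jBLAS transpose.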

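Given Incanter's rich feature set (especially its plotting library), I don't think I'll switch to jBLAS any time soon. Still, I would love to see another shootout between jBLAS and Parallel Colt, and maybe it is worth considering replacing Parallel Colt with jBLAS in Incanter? :-)

Edit: here are the absolute numbers (in msec) I got on my (rather slow) PC: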

Incanter: 665.362452
   jBLAS: 459.311598
   numpy: 353.777885

For each library I picked the best time out of 20 runs, with a matrix size of 23x400000.
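
The "best of 20 runs" approach can be reproduced on the Python side with the standard timeit module. This is a minimal sketch, not the script used for the numbers above: best_of is a hypothetical helper and workload is a placeholder to be swapped for the multiplication being measured. Taking the minimum is a common convention, on the reasoning that background noise can only add time to a run.

```python
import random
import timeit


def best_of(fn, runs=20):
    """Time fn() `runs` times, one call per run.

    Returns (best_seconds, all_timings).
    """
    timings = timeit.repeat(fn, repeat=runs, number=1)
    return min(timings), timings


def workload():
    # Placeholder workload; replace with the operation being benchmarked.
    xs = [random.random() for _ in range(10000)]
    return sum(x * x for x in xs)


best, timings = best_of(workload)
```

Multiplying best by 1000 gives milliseconds, the unit used in the table above.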

PS. Haskell hmatrix results are close to numpy, but I'm not sure how to benchmark it correctly.

14 votes

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/8899773
