文章/答案/技术大牛

发布

社区首页 >问答首页 >如何有效地压缩稀疏矩阵

问如何有效地压缩稀疏矩阵
EN

Stack Overflow用户

提问于 2022-02-22 15:09:43

回答 1查看 249关注 0票数 2

在Python中，对于我的应用程序来说，通常最好通过创建带有行、列和值数组的稀疏COO矩阵，然后将其更改为CSC (CSR)格式。

现在，假设我想压缩CSC矩阵。什么是这样做的有效方式？压缩行/列在代码中变化很大，比稀疏矩阵的维数小得多，因此我不认为重构COO矩阵是有效的。

下面的MWE演示了创建压缩矩阵的示例，但没有进行任何优化尝试。有一个稀疏的效率警告，因为非零的数目增加了。在这个MWE中，为了简单起见，我使用dia_array创建稀疏矩阵。

import numpy as np
from scipy.sparse import dia_array

def main():
    n = 10
    m = 6
    data = np.tile(np.concatenate((np.arange(1, m+1),
                                   np.arange(m-1, 0, -1)))[:, np.newaxis], (1, n))
    offsets = np.arange(-m+1, m)
    A = dia_array((data, offsets), shape=(n, n)).tocsc()
    print("Matrix A:")
    print(repr(A.toarray()))

    # condense these rows/columns
    cond_rowscols = np.arange(n-8, n, 2)
    print("Condensed rows/columns of A:")
    print(repr(cond_rowscols))

    # IMPROVE HERE
    # condensation algorithm
    B = A.copy()
    B[[cond_rowscols[0]]] += B[cond_rowscols[1:]].sum(axis=0)
    B[:, [cond_rowscols[0]]] += B[:, cond_rowscols[1:]].sum(axis=1)[:, np.newaxis]
    free_rowscols = np.ones(n, dtype=bool)
    free_rowscols[cond_rowscols[1:]] = False
    B = B[np.ix_(free_rowscols, free_rowscols)]

    print("Condensed matrix A:")
    print(repr(B.toarray()))

    print('Done')


if __name__ == "__main__":
    main()

产出如下：

Matrix A:
array([[6, 5, 4, 3, 2, 1, 0, 0, 0, 0],
       [5, 6, 5, 4, 3, 2, 1, 0, 0, 0],
       [4, 5, 6, 5, 4, 3, 2, 1, 0, 0],
       [3, 4, 5, 6, 5, 4, 3, 2, 1, 0],
       [2, 3, 4, 5, 6, 5, 4, 3, 2, 1],
       [1, 2, 3, 4, 5, 6, 5, 4, 3, 2],
       [0, 1, 2, 3, 4, 5, 6, 5, 4, 3],
       [0, 0, 1, 2, 3, 4, 5, 6, 5, 4],
       [0, 0, 0, 1, 2, 3, 4, 5, 6, 5],
       [0, 0, 0, 0, 1, 2, 3, 4, 5, 6]])
Condensed rows/columns of A:
array([2, 4, 6, 8])
Condensed matrix A:
array([[ 6,  5,  6,  3,  1,  0,  0],
       [ 5,  6,  9,  4,  2,  0,  0],
       [ 6,  9, 56, 14, 16, 14,  9],
       [ 3,  4, 14,  6,  4,  2,  0],
       [ 1,  2, 16,  4,  6,  4,  2],
       [ 0,  0, 14,  2,  4,  6,  4],
       [ 0,  0,  9,  0,  2,  4,  6]])
Done

编辑:根据hpaulj的评论，我们可以创建一个凝聚矩阵T

    # condensation with matrix multiplication
    n_conds = cond_rowscols.shape[0]  # number of condensed rows/cols
    t_vals = np.ones(n, dtype=int)
    t_rows = np.arange(n)
    t_cols = np.empty_like(t_rows)
    t_cols[free_rowscols] = np.arange(n-n_conds+1)
    t_cols[cond_rowscols[1:]] = cond_rowscols[0]
    T = csc_array((t_vals, (t_rows, t_cols)), shape=(n, n-n_conds+1))

因此凝聚矩阵A是B = T.T @ A @ T。

T是：

>>>print(repr(T.toarray()))
array([[1, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1]], dtype=int32)

这不会产生任何稀疏的效率错误！不过，对于更大的问题，我还是得给它计时。scipy.sparse是否对稀疏矩阵乘法使用稀疏BLAS？

python

scipy

sparse-matrix

回答 1

Stack Overflow用户

发布于 2022-02-22 21:31:37

我已经为MWE中显示的这两个选项创建了一个测试用例，使用了一个简单的性能比较，使用了比我在实践中使用的更小的一组值。

似乎就地和选项比乘法选项多5倍。

由于求解线性系统的代价总是高得多，我想我现在用稀疏矩阵乘法的方法来解决。

import numpy as np
from scipy.sparse import dia_array, csr_array
from time import perf_counter

def build_sp_mat(n=None, m=None):
    """Generates a square sparse CSC matrix of size `n` and half bandwidth `m`"""
    if n is None:
        n = 1_000  # matrix size
    if m is None:
        m = 50  # half bandwidth
    data = np.tile(np.concatenate((np.arange(1, m+1),
                                   np.arange(m-1, 0, -1)))[:, np.newaxis], (1, n))
    offsets = np.arange(-m+1, m)
    A = dia_array((data, offsets), shape=(n, n)).tocsc()
    return A

def condensed_rowscols(n, n_conds=None):
    """"Returns the indices of the condensed rows/columns
    Also returns a boolean mask of the non-condensed rows/columns
    """
    if n_conds is None:
        n_conds = 50  # total number of condensed rows/columns
    # condense these rows/columns
    cond_rowscols = np.arange(n//n_conds-1, n, n//n_conds)
    free_rowscols = np.ones(n, dtype=bool)
    free_rowscols[cond_rowscols[1:]] = False
    print(repr(cond_rowscols))
    return cond_rowscols, free_rowscols

def condense_in_place_sum(A, cond_rowscols, free_rowscols):
    """Performs condensation via a sum of the condensed rows and columns"""
    A[[cond_rowscols[0]]] += A[cond_rowscols[1:]].sum(axis=0)
    A[:, [cond_rowscols[0]]] += A[:, cond_rowscols[1:]].sum(axis=1)[:, np.newaxis]
    Acond = A[np.ix_(free_rowscols, free_rowscols)]  # indices are sorted
    return Acond

def condense_mat_mult(A, cond_rowscols, free_rowscols, n, n_conds):
    """Performs condensation via sparse matrix multiplications"""
    t_vals = np.ones(n, dtype=int)
    t_rows = np.arange(n)
    t_cols = np.empty_like(t_rows)
    t_cols[free_rowscols] = np.arange(n-n_conds+1)
    t_cols[cond_rowscols[1:]] = cond_rowscols[0]
    T = csr_array((t_vals, (t_rows, t_cols)), shape=(n, n-n_conds+1))
    Acond = T.T @ A @ T
    Acond.sort_indices()  # indices have to be sorted
    return Acond, T


if __name__ == "__main__":
    n, m, n_conds = 500_000, 54, 1000
    A = build_sp_mat(n, m)
    cond_rowscols, free_rowscols = condensed_rowscols(n, n_conds)

    A.sort_indices()  # this is important

    B = A.copy()
    stime = perf_counter()
    Acond1 = condense_in_place_sum(B, cond_rowscols, free_rowscols)
    print(f"Condensation via in-place sum took {perf_counter()-stime} s")
    # Acond1 comes with its indices array sorted automatically

    stime = perf_counter()
    Acond2, _ = condense_mat_mult(A, cond_rowscols, free_rowscols, n, n_conds)
    print(f"Condensation via sparse multiplication took {perf_counter()-stime} s")
    # Acond2's indices array has to be sorted, as shown in the function

    print("Done")

输出是

Condensation via in-place sum took 9.0079096 s
Condensation via sparse multiplication took 1.4657909 s
Done

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/71223688

复制

相似问题

问如何有效地压缩稀疏矩阵
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何有效地压缩稀疏矩阵EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何有效地压缩稀疏矩阵
EN