Efficient slicing of symmetric sparse matrices in Python

I have a list of sparse symmetric matrices sigma such that

len(sigma) = N

and for all i, j, k

sigma[i].shape[0] == sigma[i].shape[1] = m  # Square
sigma[i][j,k] == sigma[i][k,j]  # Symmetric

I have an indexing array P such that

P.shape[0] = N
P.shape[1] = k

My goal is to extract the k x k dense sub-matrix of sigma[i] using the indices given by P[i,:]. This can be done as follows:

sub_matrices = np.empty([N,k,k])
for i in range(N):
    sub_matrices[i,:,:] = sigma[i][np.ix_(P[i,:], P[i,:])].todense()

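However, note that while k is small, N (and m) are very large. If the sparse symmetric matrices are stored in CSR format, this takes a long time. I feel there must be a better solution. For example, is there a sparse format that suits symmetric matrices that need to be sliced in both dimensions?

I am using Python, but I am open to suggestions for any C library that I could interface with through Cython.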

Extra

Note that my current Cython approach is as follows:

cimport cython
import numpy as np
cimport numpy as np

@cython.boundscheck(False) # turn off bounds-checking for entire function
cpdef sparse_slice_fast_cy(sigma,
                           long[:,:] P,
                           double[:,:,:] sub_matrices):
    """
    Inputs:
        sigma: A list (N,) of sparse sp.csr_matrix (m x m)
        P: A 2D array of integers (N, k)
        sub_matrices: A 3D array of doubles (N, k, k) containing the slicing
    """
    # Create variables for keeping code tidy
    cdef long N = P.shape[0]
    cdef long k = P.shape[1]

    cdef long i
    cdef long j
    cdef long index_pointer 
    cdef long sparse_row_pointer

    # Create objects for holding sparse matrix data
    cdef double[:] data
    cdef long[:] indices
    cdef long[:] indptr

    # Object for the ordered P
    cdef long[:] perm

    # Make sure sub_matrices is all 0
    sub_matrices[:] = 0

    for i in range(N):
        # Sort the P
        perm = np.argsort(P[i,:])

        # Get the sparse matrix values
        data     = sigma[i].data
        indices  = sigma[i].indices.astype(long)
        indptr   = sigma[i].indptr.astype(long)

        for j in range(k):
            # Loop over row P[i, perm[j]] in sigma searching for values
            # in P[i, :] vector i.e. compare
            #     sigma[i][P[i, perm[j]], :]
            # against
            #     P[i,:]

            # To do this we need our sparse row vector with columns 
            #     indices[indptr[P[i, perm[j]]]:indptr[P[i, perm[j]] + 1]]
            # and data/values
            #     data[indptr[P[i, perm[j]]]:indptr[P[i, perm[j]] + 1]]
            # which comes from the csr matrix format.
            # We also need our sorted indexing vector
            #     P[i, perm[:]]

            # We begin by pointing at the top of both
            # our vectors and gradually move down them. In the event of 
            # an equality we add the data to sub_matrices[i,:,:] and 
            # increment the INDEXING VECTOR pointer, not the sparse
            # row vector pointer, as there can be multiple values that 
            # are the same in the indexing vector but not the sparse row
            # column vector (only 1 column can appear in 1 row!).
            index_pointer = 0
            sparse_row_pointer = indptr[P[i, perm[j]]]

            while ((index_pointer < k) and (sparse_row_pointer < indptr[P[i, perm[j]] + 1])):
                if indices[sparse_row_pointer] == P[i, perm[index_pointer]]:
                    # We can add data to sub_matrices
                    sub_matrices[i, perm[j], perm[index_pointer]] = \
                           data[sparse_row_pointer]

                    # Only increment the index pointer
                    index_pointer += 1
                elif indices[sparse_row_pointer] > P[i, perm[index_pointer]]:
                    # Need to increment index pointer
                    index_pointer += 1
                else:
                    # Need to increment sparse row pointer
                    sparse_row_pointer += 1
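
For readers who want to check the merge logic without compiling anything, here is a minimal pure-Python sketch of the same two-pointer walk. The function name sparse_slice_reference and the toy sizes are made up for illustration. It copies each matrix so that it can sort the column indices safely, and it is far too slow for real workloads; it is only meant as a reference to compare against.

```python
import numpy as np
import scipy.sparse as sp


def sparse_slice_reference(sigma, P):
    """Pure-Python reference of the sorted-merge slicing (for checking only)."""
    N, k = P.shape
    sub_matrices = np.zeros((N, k, k))
    for i in range(N):
        A = sp.csr_matrix(sigma[i], copy=True)
        A.sort_indices()                 # the merge needs ascending columns per row
        data, indices, indptr = A.data, A.indices, A.indptr
        perm = np.argsort(P[i])          # positions of P[i] in ascending order
        cols = P[i, perm]                # sorted (possibly repeated) indices
        for j in range(k):
            row = cols[j]
            index_pointer = 0            # walks through cols
            sparse_row_pointer = indptr[row]
            row_end = indptr[row + 1]
            while index_pointer < k and sparse_row_pointer < row_end:
                col = indices[sparse_row_pointer]
                if col == cols[index_pointer]:
                    sub_matrices[i, perm[j], perm[index_pointer]] = data[sparse_row_pointer]
                    index_pointer += 1   # a repeated index may match the same entry again
                elif col > cols[index_pointer]:
                    index_pointer += 1
                else:
                    sparse_row_pointer += 1
    return sub_matrices


# Toy data (assumed sizes, much smaller than in the question)
rng = np.random.default_rng(0)
m, k, N = 50, 6, 4
dense = rng.random((m, m)) * (rng.random((m, m)) < 0.2)
sigma = [sp.csr_matrix(dense + dense.T) for _ in range(N)]
P = rng.integers(0, m, size=(N, k))

fast = sparse_slice_reference(sigma, P)
slow = np.stack([sigma[i][np.ix_(P[i], P[i])].toarray() for i in range(N)])
print(np.allclose(fast, slow))
```

The comparison against np.ix_ mirrors the assert used in the test script further down.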

Parallel version

Below is a parallel version, although it does not seem to provide any speed-up and the code is no longer as pretty:

# See https://stackoverflow.com/questions/48805636/efficient-slicing-of-symmetric-sparse-matrices
cimport cython
import numpy as np
cimport numpy as np
from libc.stdlib cimport malloc, free
from cython.parallel import prange

@cython.boundscheck(False) # turn off bounds-checking for entire function
cpdef sparse_slice_fast_cy(sigma,
                           np.ndarray[np.int32_t, ndim=2] P,
                           np.float64_t[:,:,:] sub_matrices,
                           int symmetric):
    """
    Inputs:
        sigma: A list (N,) of sparse sp.csr_matrix (m x m)
        P: A 2D array of integers (N, k)
        sub_matrices: A 3D array of doubles (N, k, k) containing the slicing
        symmetric: 1 if the sigma matrices are symmetric
    """
    # Create variables for keeping code tidy
    cdef np.int32_t N = P.shape[0]
    cdef np.int32_t k = P.shape[1]

    cdef np.int32_t i
    cdef np.int32_t j
    cdef np.int32_t index_pointer 
    cdef np.int32_t sparse_row_pointer

    # Create objects for holding sparse matrix data
    cdef np.float64_t[:] data_mem_view
    cdef np.int32_t[:] indices_mem_view
    cdef np.int32_t[:] indptr_mem_view

    cdef np.float64_t **data = <np.float64_t **> malloc(N * sizeof(np.float64_t *))
    cdef np.int32_t **indices = <np.int32_t **> malloc(N * sizeof(np.int32_t *))
    cdef np.int32_t **indptr = <np.int32_t **> malloc(N * sizeof(np.int32_t *))

    for i in range(N):
        data_mem_view = sigma[i].data
        data[i] = &(data_mem_view[0])

        indices_mem_view = sigma[i].indices
        indices[i] = &(indices_mem_view[0])

        indptr_mem_view = sigma[i].indptr
        indptr[i] = &(indptr_mem_view[0])

    # Object for the ordered P
    cdef np.int32_t[:,:] perm = np.argsort(P, axis=1).astype(np.int32)

    # Make sure sub_matrices is all 0
    sub_matrices[:] = 0

    for i in prange(N, nogil=True):
        for j in range(k):
            # Loop over row P[i, perm[j]] in sigma searching for values
            # in P[i, :] vector i.e. compare
            #     sigma[i][P[i, perm[i, j]], :]
            # against
            #     P[i,:]
            # To do this we need our sparse row vector with columns 
            #     indices[i][indptr[i][P[i, perm[i, j]]]:indptr[i][P[i, perm[i, j]] + 1]]
            # and data/values
            #     data[i][indptr[i][P[i, perm[i, j]]]:indptr[i][P[i, perm[i, j]] + 1]]
            # which comes from the csr matrix format.
            # We also need our sorted indexing vector
            #     P[i, perm[:]]

            # We begin by pointing at the top of both
            # our vectors and gradually move down them. In the event of 
            # an equality we add the data to sub_matrices[i,:,:] and 
            # increment the INDEXING VECTOR pointer, not the sparse
            # row vector pointer, as there can be multiple values that 
            # are the same in the indexing vector but not the sparse row
            # column vector (only 1 column can appear in 1 row!).

            if symmetric:
                index_pointer = j  # Only search upper triangular
            else:
                index_pointer = 0
            sparse_row_pointer = indptr[i][P[i, perm[i, j]]]

            while ((index_pointer < k) and 
                   (sparse_row_pointer < indptr[i][P[i, perm[i, j]] + 1])):
                if indices[i][sparse_row_pointer] == P[i, perm[i, index_pointer]]:
                    # We can add data to sub_matrices
                    sub_matrices[i, perm[i, j], perm[i, index_pointer]] = \
                           data[i][sparse_row_pointer]

                    if symmetric:
                        sub_matrices[i, perm[i, index_pointer], perm[i, j]] = \
                               data[i][sparse_row_pointer]

                    # Only increment the index pointer
                    index_pointer = index_pointer + 1
                elif indices[i][sparse_row_pointer] > P[i, perm[i, index_pointer]]:
                    # Need to increment index pointer
                    index_pointer = index_pointer + 1
                else:
                    # Need to increment sparse row pointer
                    sparse_row_pointer = sparse_row_pointer + 1

    # Free malloc'd data
    free(data)
    free(indices)
    free(indptr)

Here sparse_slice.pyx is the name of the file that holds the Cython code above. The following script can then be used:

import time
import numpy as np
import scipy as sp
import scipy.sparse
from sparse_slice import sparse_slice_fast_cy

k = 100
N = 20000
m = 10000
samples = 20

# Create sigma matrices
## The sampling of random sparse takes a while so just do a few and 
## then populate with these.
now = time.time()
sigma_samples = []
for i in range(samples):
    sigma_samples.append(sp.sparse.rand(m, m, density=0.001, format='csr'))
    sigma_samples[-1] = sigma_samples[-1] + sigma_samples[-1].T  # Symmetric

## Now make the sigma list from these.
sigma = []
for i in range(N):
    j = np.random.randint(samples)
    sigma.append(sigma_samples[j])
print('Time to make sigma: {}'.format(time.time() - now))

# Create indexer
now = time.time()
P = np.empty([N, k]).astype(int)
for i in range(N):
    P[i, :] = np.random.choice(np.arange(m), k, replace=True)
print('Time to make P: {}'.format(time.time() - now))

# Create objects for holding the slices
sub_matrices_slow = np.empty([N, k, k])
sub_matrices_fast = np.empty([N, k, k])

# Run both slicings
## Slow
now = time.time()
for i in range(N):
    sub_matrices_slow[i,:,:] = sigma[i][np.ix_(P[i,:], P[i,:])].todense()
print('Time to make sub_matrices_slow: {}'.format(time.time() - now))

## Fast
symmetric = 1
now = time.time()
sparse_slice_fast_cy(sigma, P.astype(np.int32), sub_matrices_fast, symmetric)
print('Time to make sub_matrices_fast: {}'.format(time.time() - now))

assert(np.all((sub_matrices_slow - sub_matrices_fast)**2 < 1e-6))

I cannot test it right now, but here are two suggestions:

A) Sort all rows at once, outside of the loop:

# Object for the ordered P
cdef long[:,:] perm = np.argsort(P, axis=1)

You probably need to pass P as np.ndarray[np.int64_t, ndim=2] P (or whatever the type is) to avoid copying. You then have to access the data via perm[i, X] instead of perm[X].

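B) Declare indices and indptr with the integer type the matrices already use, so that you do not need to copy the data via .astype, i.e.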

for i in range(N):
    data     = sigma[i].data
    indices  = sigma[i].indices
    indptr   = sigma[i].indptr

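I think that, because sigma[i] has O(m) elements, the copying is the bottleneck of your function: you get a running time of O(N*(m+k^2)) instead of O(N*k^2), so it would be good to avoid it.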
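
To see the copy that this point is about, the following small check may help. It is only an illustration on a made-up 5 x 5 matrix: it shows that converting the index array to another integer width allocates a new array, while asking for the dtype the matrix already uses (with copy=False) hands back the same buffer.

```python
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.eye(5))

# Converting to a different integer width always allocates a new array,
# which costs O(nnz) for indices and O(m) for indptr on every call.
other = np.int64 if A.indices.dtype == np.int32 else np.int32
converted = A.indices.astype(other)
copies = not np.shares_memory(converted, A.indices)

# Requesting the dtype the matrix already uses, with copy=False,
# returns the very same array object, so nothing is copied.
same = A.indices.astype(A.indices.dtype, copy=False)
no_copy = same is A.indices

print(A.indices.dtype, copies, no_copy)
```

In the Cython function this corresponds to typing the memoryviews to match sigma[i].indices.dtype and sigma[i].indptr.dtype instead of converting inside the loop.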

Otherwise the function does not look too bad.

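To get prange working for the loop, you should move the access to sigma[i] outside of the loop: create arrays of pointers to the first elements of data, indices and indptr, and populate them in a cheap preprocessing step. This can be made to work, but the question is how much parallelization would gain. The problem may well be memory-bound, so one would have to look at the timings.

You could also exploit the symmetry by processing only the upper triangular matrix: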

  ...
  index_pointer = j #only upper triangle!
  ....
  ....
     # We can add data to sub_matrices
     #upper triangle sub-matrix:
     sub_matrices[i, perm[j], perm[index_pointer]] = \
                       data[sparse_row_pointer]
     #lower triangle sub-matrix:
     sub_matrices[i, perm[index_pointer], perm[j]] = \
                       data[sparse_row_pointer]
  ....

I would start with B) and see how it goes.
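
The upper-triangle idea can be tried out in plain Python before touching the Cython code. The sketch below is an assumption-laden illustration: the function name dense_block_symmetric is invented, it handles a single matrix, and it assumes A is a symmetric CSR matrix whose column indices are sorted within each row.

```python
import numpy as np
import scipy.sparse as sp


def dense_block_symmetric(A, p):
    """Dense A[p][:, p] for symmetric CSR A, scanning only the upper triangle."""
    k = len(p)
    perm = np.argsort(p)
    cols = p[perm]                        # requested indices in ascending order
    out = np.zeros((k, k))
    for j in range(k):
        row = cols[j]
        index_pointer = j                 # positions before j are filled by mirroring
        ptr, end = A.indptr[row], A.indptr[row + 1]
        while index_pointer < k and ptr < end:
            col = A.indices[ptr]
            if col == cols[index_pointer]:
                out[perm[j], perm[index_pointer]] = A.data[ptr]   # upper triangle
                out[perm[index_pointer], perm[j]] = A.data[ptr]   # mirrored entry
                index_pointer += 1
            elif col > cols[index_pointer]:
                index_pointer += 1
            else:
                ptr += 1
    return out


rng = np.random.default_rng(0)
m, k = 30, 8
dense = rng.random((m, m)) * (rng.random((m, m)) < 0.2)
A = sp.csr_matrix(dense + dense.T)        # symmetric, column indices sorted
p = rng.integers(0, m, size=k)            # may contain repeated indices
block = dense_block_symmetric(A, p)
expected = A[np.ix_(p, p)].toarray()
print(np.allclose(block, expected))
```

Starting the inner pointer at j roughly halves the comparisons against the index vector, although each sparse row is still walked from its first stored entry.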

Edit:

Regarding memory usage: the peak can be measured via

 /usr/bin/time -f "peak_used_memory:%M(in Kb)" python test.py

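I ran the test with N=2000 (python3.6+cython0.27.1); the measured peak memory usage is listed in the table at the end of this page.

So there is about 50Mb of overhead, both functions use 200Mb, and another 176Mb is needed for evaluating the assert. I see the same behavior for other values of N.

So I would say Cython is not eating up a significant amount of memory.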
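
If wrapping the script in /usr/bin/time is inconvenient, a similar number can be read from inside Python. This is a hedged sketch using the standard-library resource module, which is only available on Unix-like systems; the helper name peak_rss_kb and the array size are made up for illustration.

```python
import resource

import numpy as np


def peak_rss_kb():
    """Peak resident set size of this process (kilobytes on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


before = peak_rss_kb()
# About 40 MB of float64, i.e. a sub_matrices array for N=500 and k=100
block = np.ones((500, 100, 100))
after = peak_rss_kb()
print('peak before: {} after: {}'.format(before, after))
```

Because ru_maxrss is a high-water mark, it never decreases; to attribute memory to the slow and fast variants separately, each one has to be measured in its own process, as was done for the table.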

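This task is most likely (at least partly) memory-bound, so parallelization will not help much. You should reduce the amount of memory that has to be loaded into the cache.

One possibility is not to use perm; after all, it also has to be loaded into the cache. You can do without it if:

- you can work with any row/column permutation in the matrix sigma, not only the sorted one, and use it directly;
- there are only a few elements per row, so a linear search for every element would be fine;
- you do a binary search for every element.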
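
The binary-search option from the list above can be sketched with np.searchsorted, which needs neither perm nor a sorted P. This is an illustration, not the answerer's code: the function name dense_block_searchsorted is hypothetical, and it assumes the CSR column indices are sorted within each row (A.has_sorted_indices).

```python
import numpy as np
import scipy.sparse as sp


def dense_block_searchsorted(A, p):
    """Dense A[p][:, p] using one binary search per requested column."""
    k = len(p)
    out = np.zeros((k, k))
    for r in range(k):
        start, end = A.indptr[p[r]], A.indptr[p[r] + 1]
        row_cols = A.indices[start:end]            # sorted columns stored in row p[r]
        if row_cols.size == 0:
            continue                               # empty row stays all zero
        pos = np.searchsorted(row_cols, p)         # binary search for every entry of p
        pos = np.minimum(pos, row_cols.size - 1)   # keep positions inside the row
        hit = row_cols[pos] == p                   # True where the column is stored
        out[r, hit] = A.data[start:end][pos[hit]]
    return out


rng = np.random.default_rng(2)
m, k = 40, 7
dense = rng.random((m, m)) * (rng.random((m, m)) < 0.15)
A = sp.csr_matrix(dense + dense.T)
p = rng.integers(0, m, size=k)                     # unsorted, may repeat
block = dense_block_searchsorted(A, p)
expected = A[np.ix_(p, p)].toarray()
print(np.allclose(block, expected))
```

Each row costs roughly k binary searches over its stored entries; whether that beats the merge depends on how many nonzeros a row holds compared with k.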

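I guess you could win 20-30% at best.

Sometimes Cython produces code that is not easy for the C compiler to optimize; writing it directly in C and then wrapping it for Python usually gives better results.

But I would do all of this only if this operation really, really is the bottleneck of your program.

By the way, by declaring

cdef np.int64_t[:,:] perm = np.argsort(P, axis=1)

you do not need the additional copy.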

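What are your timings (for given N, k)? What kind of speed-up do you hope to achieve? I think CSR is a good choice for this kind of problem.

(N, k) is about (3000000, 100). Execution takes about 2 minutes.

scipy.sparse has no special functionality for symmetric matrices. csr matrix indexing is actually performed with matrix multiplication. Do you know how these matrices are stored? Converting sigma[i] to dense and then indexing might be faster.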
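
Both ideas from the last comment are easy to try on a small example. The sketch below is only an illustration with made-up sizes: option 1 densifies and uses NumPy indexing, and option 2 expresses the selection as a product with a 0/1 selector matrix S (a name introduced here). It shows the idea, not how scipy implements indexing internally.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)
m, k = 40, 5
dense = rng.random((m, m)) * (rng.random((m, m)) < 0.1)
A = sp.csr_matrix(dense + dense.T)
p = rng.integers(0, m, size=k)

# Option 1: convert to dense first, then use plain NumPy fancy indexing.
via_dense = A.toarray()[np.ix_(p, p)]

# Option 2: selection as a matrix product. S is k x m with S[r, p[r]] = 1,
# so (S A S^T)[r, c] == A[p[r], p[c]].
S = sp.csr_matrix((np.ones(k), (np.arange(k), p)), shape=(k, m))
via_product = (S @ A @ S.T).toarray()

reference = A[np.ix_(p, p)].toarray()
print(np.allclose(via_dense, reference), np.allclose(via_product, reference))
```

Keep in mind that a dense m x m float64 array with m = 10000 needs about 800 MB, so option 1 is only practical when few distinct matrices have to be densified.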

Peak memory usage for N=2000 (python3.6+cython0.27.1):

                             peak memory usage
only slow                       245Mb
only fast                       245Mb
slow+fast no check              402Mb
slow+fast+assert                576Mb
