Fast sparse matrix multiplication in Python without allocating a dense array (python, performance, numpy, scipy, sparse-matrix)

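I have an m x m sparse matrix, similarities, and a vector with m elements, combined_scales. I want to multiply the i-th column of similarities by combined_scales[i]. Here is my first attempt: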
for i in range(m):
    scale = combined_scales[i]
    similarities[:, i] *= scale

This is semantically correct, but it performs poorly, so I tried changing it to:
# sparse.diags creates a diagonal matrix.
# docs: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.diags.html
similarities *= sparse.diags(combined_scales)

But when I run this line, I immediately get a MemoryError. Oddly, scipy seems to be trying to allocate a dense numpy array here:
Traceback (most recent call last):
  File "main.py", line 108, in <module>
    loop.run_until_complete(main())
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 466, in run_until_complete
    return future.result()
  File "main.py", line 100, in main
    magic.fit(df)
  File "C:\cygwin64\home\james\code\py\relativity\ml.py", line 127, in fit
    self._scale_similarities(X, net_similarities)
  File "C:\cygwin64\home\james\code\py\relativity\ml.py", line 148, in _scale_similarities
    similarities *= sparse.diags(combined_scales)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\site-packages\scipy\sparse\base.py", line 440, in __mul__
    return self._mul_sparse_matrix(other)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\site-packages\scipy\sparse\compressed.py", line 503, in _mul_sparse_matrix
    data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
MemoryError

How can I prevent it from allocating a dense array here? Thanks.

From sparse.compressed:

class _cs_matrix:    # common for csr and csc
    def _mul_sparse_matrix(self, other):
        M, K1 = self.shape
        K2, N = other.shape

        major_axis = self._swap((M,N))[0]
        other = self.__class__(other)  # convert to this format

        idx_dtype = get_index_dtype((self.indptr, self.indices,
                                     other.indptr, other.indices),
                                    maxval=M*N)
        indptr = np.empty(major_axis + 1, dtype=idx_dtype)

        fn = getattr(_sparsetools, self.format + '_matmat_pass1')
        fn(M, N,
           np.asarray(self.indptr, dtype=idx_dtype),
           np.asarray(self.indices, dtype=idx_dtype),
           np.asarray(other.indptr, dtype=idx_dtype),
           np.asarray(other.indices, dtype=idx_dtype),
           indptr)

        nnz = indptr[-1]
        idx_dtype = get_index_dtype((self.indptr, self.indices,
                                     other.indptr, other.indices),
                                    maxval=nnz)
        indptr = np.asarray(indptr, dtype=idx_dtype)
        indices = np.empty(nnz, dtype=idx_dtype)
        data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))

        fn = getattr(_sparsetools, self.format + '_matmat_pass2')
        fn(M, N, np.asarray(self.indptr, dtype=idx_dtype),
           np.asarray(self.indices, dtype=idx_dtype),
           self.data,
           np.asarray(other.indptr, dtype=idx_dtype),
           np.asarray(other.indices, dtype=idx_dtype),
           other.data,
           indptr, indices, data)

        return self.__class__((data,indices,indptr),shape=(M,N))

similarities is a sparse csr matrix. other, the diag matrix, has also been converted to csr by:
other = self.__class__(other) 

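csr_matmat_pass1 (compiled code) runs on the indices of self and other and fills indptr, whose last entry is nnz, the number of nonzero entries in the output. The code then allocates the indices and data arrays that, together with indptr, will hold the results of csr_matmat_pass2. These are used to create the return matrix: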
self.__class__((data,indices,indptr),shape=(M,N))
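
To make the two-pass structure concrete, here is a minimal pure-Python sketch of the same idea. It is not scipy's compiled implementation; the function name csr_matmat_two_pass and the use of a set and a dict per row are illustrative assumptions. Pass 1 only counts the stored entries per output row, the arrays are then allocated at exactly nnz elements, and pass 2 fills them.

```python
import numpy as np
from scipy import sparse


def csr_matmat_two_pass(A, B):
    """Illustrative two-pass product of two CSR matrices (hypothetical helper)."""
    M, N = A.shape[0], B.shape[1]

    # Pass 1: count the distinct output columns in each row to build indptr.
    indptr = np.zeros(M + 1, dtype=np.int64)
    for i in range(M):
        cols = set()
        for jj in range(A.indptr[i], A.indptr[i + 1]):
            k = A.indices[jj]
            cols.update(B.indices[B.indptr[k]:B.indptr[k + 1]].tolist())
        indptr[i + 1] = indptr[i] + len(cols)

    # Allocation: this is the step that raised MemoryError in the traceback.
    nnz = int(indptr[-1])
    indices = np.empty(nnz, dtype=np.int64)
    data = np.empty(nnz, dtype=np.result_type(A.dtype, B.dtype))

    # Pass 2: accumulate the values of each output row and store them.
    for i in range(M):
        row = {}
        for jj in range(A.indptr[i], A.indptr[i + 1]):
            k = A.indices[jj]
            a = A.data[jj]
            for kk in range(B.indptr[k], B.indptr[k + 1]):
                j = B.indices[kk]
                row[j] = row.get(j, 0) + a * B.data[kk]
        start = indptr[i]
        for offset, j in enumerate(sorted(row)):
            indices[start + offset] = j
            data[start + offset] = row[j]

    return sparse.csr_matrix((data, indices, indptr), shape=(M, N))


A = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 0.0, 3.0]]))
B = sparse.diags([10.0, 100.0, 1000.0]).tocsr()
result = csr_matmat_two_pass(A, B)
print(result.toarray())
```

With a diagonal B whose entries are all stored, the output has the same number of stored entries as A, so the result is sparse but needs about as much memory as a second copy of A.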

The error occurs when creating the data array:
data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))

The result simply has too many nonzero values to fit in memory. What is m, and what is similarities.nnz?

Is there enough memory to do similarities.copy()?
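
One rough way to answer those questions is to add up the sizes of the three arrays that back a csr or csc matrix. The sketch below assumes a hypothetical helper, csr_result_bytes, and a small random matrix standing in for similarities. Since multiplying by a diagonal matrix keeps the same sparsity pattern, the product needs roughly this much memory again on top of the original.

```python
import numpy as np
from scipy import sparse


def csr_result_bytes(nnz, n_major, data_dtype=np.float64, index_dtype=np.int32):
    """Approximate bytes used by the data, indices, and indptr arrays.

    n_major is the number of rows for csr (or columns for csc).
    """
    data_bytes = nnz * np.dtype(data_dtype).itemsize
    indices_bytes = nnz * np.dtype(index_dtype).itemsize
    indptr_bytes = (n_major + 1) * np.dtype(index_dtype).itemsize
    return data_bytes + indices_bytes + indptr_bytes


# Stand-in for the real matrix; the shape and density here are made up.
similarities = sparse.random(1000, 1000, density=0.01, format="csr", random_state=0)

estimate = csr_result_bytes(
    similarities.nnz,
    similarities.shape[0],
    similarities.dtype,
    similarities.indices.dtype,
)
print(f"nnz = {similarities.nnz}, about {estimate / 1e6:.2f} MB per copy")
```

The paths in the traceback point at a Python36-32 install, so the per-process address space limit of a 32-bit interpreter may matter more than the amount of physical RAM.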

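When you use similarities *= ..., it first has to compute similarities * other. The result then replaces self. It does not attempt an in-place multiplication.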
In-place iteration over columns
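There have been many questions about faster iteration by rows (or columns), for tasks such as sorting or finding the largest values in each row. Working directly with the csr attributes can speed this up considerably. I think the same idea applies here.

For example: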
In [275]: A = sparse.random(10,10,.2,'csc').astype(int)
In [276]: A.data[:] = np.arange(1,21)
In [277]: A.A
Out[277]: 
array([[ 0,  0,  4,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  3,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0, 10,  0,  0, 16, 18],
       [ 0,  0,  0,  0,  0, 11, 14,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  8,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  9, 12,  0,  0, 17,  0],
       [ 2,  0,  0,  0,  0, 13,  0,  0,  0,  0],
       [ 0,  0,  5,  7,  0,  0,  0, 15,  0, 19],
       [ 0,  0,  6,  0,  0,  0,  0,  0,  0, 20]])
In [280]: B = sparse.diags(np.arange(1,11),dtype=int)
In [281]: B
Out[281]: 
<10x10 sparse matrix of type '<class 'numpy.int64'>'
    with 10 stored elements (1 diagonals) in DIAgonal format>
In [282]: (A*B).A
Out[282]: 
array([[  0,   0,  12,   0,   0,   0,   0,   0,   0,   0],
       [  0,   6,   0,   0,   0,   0,   0,   0,   0,   0],
       [  1,   0,   0,   0,   0,  60,   0,   0, 144, 180],
       [  0,   0,   0,   0,   0,  66,  98,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  40,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  45,  72,   0,   0, 153,   0],
       [  2,   0,   0,   0,   0,  78,   0,   0,   0,   0],
       [  0,   0,  15,  28,   0,   0,   0, 120,   0, 190],
       [  0,   0,  18,   0,   0,   0,   0,   0,   0, 200]], dtype=int64)

Timing comparison:
In [287]: %%timeit A1=A.copy()
     ...: A1 *= B
     ...: 
375 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [288]: %%timeit A1 = A.copy()
     ...: for i,j,v in zip(A1.indptr[:-1],A1.indptr[1:],np.arange(1,11)):
     ...:     A1.data[i:j] *= v
     ...:     
79.9 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
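
The loop above relies on A being csc, where indptr delimits each column. The matrix in the question is csr, where indices holds the column of every stored value, so the same in-place scaling can be sketched without a Python loop. The helper name scale_columns_inplace is an assumption for illustration. It also assumes the scales can be cast to the matrix dtype (float scales on an integer matrix would make the in-place multiply fail).

```python
import numpy as np
from scipy import sparse


def scale_columns_inplace(mat, scales):
    """Multiply column i of a CSR or CSC matrix by scales[i], in place."""
    scales = np.asarray(scales)
    if mat.format == "csr":
        # In CSR, indices stores the column index of each stored value.
        mat.data *= scales[mat.indices]
    elif mat.format == "csc":
        # In CSC, consecutive indptr entries delimit one column's values.
        mat.data *= np.repeat(scales, np.diff(mat.indptr))
    else:
        raise ValueError("expected a CSR or CSC matrix")
    return mat


dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 3.0, 0.0],
                  [4.0, 0.0, 5.0]])
scales = np.array([10.0, 100.0, 1000.0])

csr = scale_columns_inplace(sparse.csr_matrix(dense), scales)
csc = scale_columns_inplace(sparse.csc_matrix(dense), scales)
print(csr.toarray())
```

The only temporary is one array of nnz scale factors. No second copy of the sparse structure is built, which is the allocation that failed in the traceback.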

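Are you sure it is allocating a dense array? Run similarities.count_nonzero() and tell us what it returns. In fact, the last line of the traceback suggests it is trying to allocate a sparse result; nnz stands for "number of nonzeros". What format is similarities in?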