Python SciPy sparse matrices (COO, CSR): wiping rows


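To create a SciPy sparse matrix, I have arrays of row and column indices, I and J, along with a data array V. I use these to construct a matrix in COO format and then convert it to CSR.

I have a set of row indices for which the only entry should be 1.0 on the diagonal. So far, I go through I, find all the indices that need to be wiped, and do the following: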

def find(lst, a):
    # From <http://stackoverflow.com/a/16685428/353337>
    return [i for i, x in enumerate(lst) if x in a]

# wipe_rows = [1, 55, 32, ...]  # something something

indices = find(I, wipe_rows)  # takes too long
I = numpy.delete(I, indices).tolist()
J = numpy.delete(J, indices).tolist()
V = numpy.delete(V, indices).tolist()

# Add entry 1.0 to the diagonal for each wipe row
I.extend(wipe_rows)
J.extend(wipe_rows)
V.extend(numpy.ones(len(wipe_rows)))

# construct matrix via coo


This works fine, but find tends to take a while.

Any hints on how to speed this up? (Maybe wiping the rows in COO or CSR format is a better idea.)
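For reference, the construction step described in the question might look like the sketch below. The triplet data and the size `n` are made up for illustration and are not from the original post.

```python
import numpy as np
from scipy import sparse

# Hypothetical triplet data for a 3x3 matrix (illustration only).
I = [0, 0, 1, 2, 2]              # row indices
J = [0, 2, 1, 0, 2]              # column indices
V = [4.0, 1.0, 5.0, 2.0, 6.0]    # values
n = 3

# Build in COO format from (data, (row, col)), then convert to CSR.
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n)).tocsr()
```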

If you want to wipe several rows at once, do the following:

def _wipe_rows_csr(matrix, rows):
    assert isinstance(matrix, sparse.csr_matrix)

    # delete rows
    for i in rows:
        matrix.data[matrix.indptr[i]:matrix.indptr[i+1]] = 0.0

    # Set the diagonal
    d = matrix.diagonal()
    d[rows] = 1.0
    matrix.setdiag(d)

    return

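This is by far the fastest method. It does not actually remove the entries; it sets all entries in those rows to zero and then adjusts the diagonal.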
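A small sketch of what "not actually removed" means in practice, assuming a dense-looking 4x4 test matrix (an invented example): the zeroed entries stay in the CSR storage and still count towards `nnz`. If that matters, SciPy's `eliminate_zeros()` is one way to drop them afterwards.

```python
import numpy as np
from scipy import sparse

# Test matrix with all 16 entries stored (values 1..16, illustration only).
A = sparse.csr_matrix(np.arange(1.0, 17.0).reshape(4, 4))
rows = [1, 2]

# Same idea as above: zero the stored values of each row in place.
for i in rows:
    A.data[A.indptr[i]:A.indptr[i + 1]] = 0.0

# Put 1.0 on the diagonal of the wiped rows.
d = A.diagonal()
d[rows] = 1.0
A.setdiag(d)

nnz_before = A.nnz     # explicit zeros are still stored entries
A.eliminate_zeros()    # drop explicit zeros from data/indices/indptr
nnz_after = A.nnz
```

Whether the extra pass is worth it depends on how the matrix is used afterwards; for a single solve the stored zeros are often harmless.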

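If you want to actually remove the entries, you have to do some array manipulation. This can be quite expensive, but if speed is not an issue, the following function does it: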

def _wipe_row_csr(A, i):
    '''Wipes a row of a matrix in CSR format and puts 1.0 on the diagonal.
    '''
    assert isinstance(A, sparse.csr_matrix)

    start, end = A.indptr[i], A.indptr[i+1]
    n = end - start

    assert n > 0

    # Number of stored entries after the wipe. Using an explicit end index
    # avoids the slice [:-n+1], which is empty when n == 1.
    nnz = len(A.data) - (n - 1)

    A.data[start+1:nnz] = A.data[end:]
    A.data[start] = 1.0
    A.data = A.data[:nnz]

    A.indices[start+1:nnz] = A.indices[end:]
    A.indices[start] = i
    A.indices = A.indices[:nnz]

    A.indptr[i+1:] -= n-1

    return

This replaces the given row i of the matrix with a single entry 1.0 on the diagonal.
np.in1d should be a faster way of finding the indices:
In [322]: I   # from a np.arange(12).reshape(4,3) matrix
Out[322]: array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int32)

In [323]: indices=[i for i, x in enumerate(I) if x in [1,2]]

In [324]: indices
Out[324]: [2, 3, 4, 5, 6, 7]

In [325]: ind1=np.in1d(I,[1,2])

In [326]: ind1
Out[326]: 
array([False, False,  True,  True,  True,  True,  True,  True, False,
       False, False], dtype=bool)

In [327]: np.where(ind1)   # same as indices
Out[327]: (array([2, 3, 4, 5, 6, 7], dtype=int32),)

In [328]: I[~ind1]  # same as the delete
Out[328]: array([0, 0, 3, 3, 3], dtype=int32)
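Putting the mask idea together for all three arrays, a sketch could look like this. It uses `np.isin`, the newer NumPy spelling of the same membership test as `np.in1d`; the 4x4 test matrix and the names `keep`, `I_new`, `J_new`, `V_new` are assumptions for illustration.

```python
import numpy as np
from scipy import sparse

# Illustrative input: COO triplets taken from a small dense matrix.
M = sparse.coo_matrix(np.arange(16, dtype=float).reshape(4, 4))
I, J, V = M.row, M.col, M.data
wipe_rows = np.array([1, 2])

# Boolean mask of the entries to keep (those not in a wiped row).
keep = ~np.isin(I, wipe_rows)

# Drop the wiped entries and append a 1.0 on the diagonal of each wiped row.
I_new = np.concatenate([I[keep], wipe_rows])
J_new = np.concatenate([J[keep], wipe_rows])
V_new = np.concatenate([V[keep], np.ones(len(wipe_rows))])

result = sparse.coo_matrix((V_new, (I_new, J_new)), shape=M.shape).tocsr()
```

This keeps everything as NumPy arrays, so the list-based `find` helper and the `tolist()`/`extend` round trip are not needed.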

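Manipulating the coo inputs directly like this is often a good approach. Another option is to take advantage of the math capabilities of csr. You should be able to construct one diagonal matrix that zeros out the right rows and then add another that puts the ones back on the diagonal.
Here is what I have in mind: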
In [357]: A=np.arange(16).reshape(4,4)
In [358]: M=sparse.coo_matrix(A)
In [359]: M.A
Out[359]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [360]: d1=sparse.diags([(1,0,0,1)],[0],(4,4))
In [361]: d2=sparse.diags([(0,1,1,0)],[0],(4,4))

In [362]: (d1*M+d2).A
Out[362]: 
array([[  0.,   1.,   2.,   3.],
       [  0.,   1.,   0.,   0.],
       [  0.,   0.,   1.,   0.],
       [ 12.,  13.,  14.,  15.]])

In [376]: x=np.ones((4,),bool);x[[1,2]]=False
In [378]: d1=sparse.diags([x],[0],(4,4),dtype=int)
In [379]: d2=sparse.diags([~x],[0],(4,4),dtype=int)
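As a self-contained sketch of the same idea, the two diagonal matrices can be built from a list of rows to wipe and applied in one expression. The name `wipe_rows` and the test matrix are assumptions; `@` is used for the sparse matrix product instead of `*`.

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(16, dtype=float).reshape(4, 4))
wipe_rows = [1, 2]

# keep[i] is False for every row that should be wiped.
keep = np.ones(M.shape[0], dtype=bool)
keep[wipe_rows] = False

d1 = sparse.diags(keep.astype(float))     # 1 on kept rows: zeroes the wiped rows
d2 = sparse.diags((~keep).astype(float))  # 1 on wiped rows: restores the diagonal

result = (d1 @ M + d2).tocsr()
```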

Doing this with the lil format looks simple:
In [593]: Ml=M.tolil()
In [594]: Ml.data[wipe]=[[1]]*len(wipe)
In [595]: Ml.rows[wipe]=[[i] for i in wipe]

In [596]: Ml.A
Out[596]: 
array([[ 0,  1,  2,  3],
       [ 0,  1,  0,  0],
       [ 0,  0,  1,  0],
       [12, 13, 14, 15]], dtype=int32)

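This is a bit like what you are doing with the csr format, but here it is easy to replace each row's lists with the appropriate [1] and [i] lists. Conversion times (tolil etc.) may hurt the run time, though.
This test is relatively fast, especially when the matrix is already csr.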
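A runnable sketch of the lil variant, assuming `wipe` is a plain list of row indices (it is not defined in the transcript above). It assigns row by row rather than with fancy indexing, which sidesteps any broadcasting of nested lists into the object arrays `rows` and `data`.

```python
import numpy as np
from scipy import sparse

M = sparse.coo_matrix(np.arange(16, dtype=float).reshape(4, 4))
wipe = [1, 2]  # assumed: the rows to wipe

Ml = M.tolil()
for i in wipe:
    Ml.rows[i] = [i]     # the row keeps a single column index: the diagonal
    Ml.data[i] = [1.0]   # with value 1.0

result = Ml.tocsr()
```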