How do I create an interaction sparse matrix in Python? - Python, Scipy, Sparse Matrix - Fatal编程技术网

Suppose I have two sparse matrices:

from scipy.sparse import random
from scipy import stats

S0 = random(5000,100, density=0.01)
S1 = random(5000,100,density=0.01)
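I want to create a sparse matrix S2 with shape (5000, 100*100). (In my real application, "5000" would be 20 million.) Each row is some kind of interaction between the two 100-dimensional row vectors: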

S2 = some_kind_of_tensor_multiplication(S0, S1)
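To illustrate: S2[i, j] = S0[i, k0] * S1[i, k1] with j = k0*100 + k1, and we iterate over all k0, k1 in [0, 99] to build the i-th row, which has length 10000. I couldn't find any efficient way to do this. Can anyone help?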
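A dense reference for tiny inputs may help pin down the intended result (it is only a checking aid, far too memory-hungry for 20 million rows): each row of S2 is the flattened outer product of row i of S0 and row i of S1, so S0[i, k0] * S1[i, k1] lands in column k0*n2 + k1. The helper name `interaction_dense` and the toy matrices are illustrative assumptions, not part of the question.

```python
import numpy as np
from scipy import sparse

def interaction_dense(S0, S1):
    """Hypothetical dense reference: S2[i, k0*n2 + k1] = S0[i, k0] * S1[i, k1]."""
    A = S0.toarray()
    B = S1.toarray()
    m, n1 = A.shape
    n2 = B.shape[1]
    # Per-row outer product with shape (m, n1, n2), then merge the last two axes (row-major)
    return np.einsum('ik,il->ikl', A, B).reshape(m, n1 * n2)

# Toy inputs: 3 rows, with 3 and 2 columns
S0 = sparse.csr_matrix(np.array([[1., 0., 2.],
                                 [0., 0., 0.],
                                 [0., 3., 0.]]))
S1 = sparse.csr_matrix(np.array([[0., 4.],
                                 [5., 6.],
                                 [7., 0.]]))
S2 = interaction_dense(S0, S1)
# Row 0 is outer([1, 0, 2], [0, 4]) flattened: [0, 4, 0, 0, 0, 8]
```

The einsum subscripts say exactly what the question describes: the row index i is shared, while k0 and k1 range independently and are then merged into one column axis.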

The inefficient approach looks like this, but I think it would be very slow:

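from scipy import sparse

# COO matrices (the default from sparse.random) don't support column slicing
A, B = S0.tocsc(), S1.tocsc()
result = []
for i in range(A.shape[1]):
    for j in range(B.shape[1]):
        # element-wise product of column i of S0 and column j of S1 -> column i*100 + j of S2
        result.append(A[:, i].multiply(B[:, j]))
result = sparse.hstack(result).tocsr()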
For a similar question, see:

I tried:

import numpy as np

from scipy.sparse import random
from scipy import stats
from scipy import sparse

S0 = random(20000000,100, density=0.01).tocsr()
S1 = random(20000000,100,density=0.01).tocsr()


def test_iter(A, B):
    m, n1 = A.shape
    n2 = B.shape[1]
    Cshape = (m, n1*n2)
    # One array per row, concatenated at the end
    data = np.empty((m,), dtype=object)
    col = np.empty((m,), dtype=object)
    row = np.empty((m,), dtype=object)
    # Iterating over a CSR matrix yields 1-row CSR matrices
    for i, (a, b) in enumerate(zip(A, B)):
        # All pairwise products of the row's nonzeros
        data[i] = np.outer(a.data, b.data).flatten()
        #col1 = a.indices * np.arange(1,a.nnz+1) # wrong when a isn't dense
        col1 = a.indices * n2   # correction: a[k0]*b[k1] goes to column k0*n2 + k1
        col[i] = (col1[:, None] + b.indices).flatten()
        row[i] = np.full((a.nnz*b.nnz,), i)
    data = np.concatenate(data)
    col = np.concatenate(col)
    row = np.concatenate(row)
    return sparse.coo_matrix((data, (row, col)), shape=Cshape)
Running it:

Wall time: 53 min 8 s. Is there a faster approach? Thanks.

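Here's a rewrite that works directly with the csr indptr. It saves time by slicing data and indices directly, instead of creating a whole new 1-row csr matrix for each row: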

def test_iter2(A, B):
    # A and B must be CSR (they need indptr, indices and data)
    m, n1 = A.shape
    n2 = B.shape[1]
    Cshape = (m, n1*n2)
    data = []
    col = []
    row = []
    for i in range(A.shape[0]):
        # The nonzeros of row i are data/indices[indptr[i]:indptr[i+1]]
        slc1 = slice(A.indptr[i], A.indptr[i+1])
        data1 = A.data[slc1]; ind1 = A.indices[slc1]
        slc2 = slice(B.indptr[i], B.indptr[i+1])
        data2 = B.data[slc2]; ind2 = B.indices[slc2]
        # All pairwise products, placed at output columns ind1*n2 + ind2
        data.append(np.outer(data1, data2).ravel())
        col.append(((ind1*n2)[:, None] + ind2).ravel())
        row.append(np.full(len(data1)*len(data2), i))
    data = np.concatenate(data)
    col = np.concatenate(col)
    row = np.concatenate(row)
    return sparse.coo_matrix((data, (row, col)), shape=Cshape)
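The `indptr` bookkeeping that `test_iter2` relies on can be seen on a toy matrix (values chosen only for illustration): in CSR format, row i's stored values are `data[indptr[i]:indptr[i+1]]` and their column numbers are the same slice of `indices`, so no 1-row matrix object has to be built. `row_slice` is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse

# Toy CSR matrix (illustrative values only)
A = sparse.csr_matrix(np.array([[1., 0., 2.],
                                [0., 0., 0.],
                                [0., 3., 0.]]))

def row_slice(M, i):
    """Hypothetical helper: (values, column indices) of row i of a CSR matrix M."""
    slc = slice(M.indptr[i], M.indptr[i + 1])
    return M.data[slc], M.indices[slc]

print(A.indptr)                 # [0 2 2 3]: row i owns positions indptr[i] .. indptr[i+1]-1
vals0, cols0 = row_slice(A, 0)  # values [1., 2.] at columns [0, 2]
vals1, cols1 = row_slice(A, 1)  # empty row: zero-length arrays
```

Slicing returns views into the existing arrays, which is why this is cheaper than indexing `A[i]`, which constructs a new sparse matrix on every call.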
For a smaller test case, test_iter2 is roughly 6 times faster than test_iter:

In [536]: S0=sparse.random(200,200, 0.01, format='csr')
In [537]: S1=sparse.random(200,200, 0.01, format='csr')
In [538]: timeit test_iter(S0,S1)
42.8 ms ± 1.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [539]: timeit test_iter2(S0,S1)
6.94 ms ± 27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
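The remaining per-row Python loop is likely the main cost in test_iter2: with 20 million rows it still runs 20 million iterations. One possible next step, which is not from the answer above and has not been benchmarked here, is to drop that loop: compute each row's output count `nnz_A[i] * nnz_B[i]`, then use `np.repeat` with integer division and modulo to enumerate every (nonzero of A's row, nonzero of B's row) pair at once. The name `interaction_vectorized` is hypothetical, and the sketch assumes inputs convertible to CSR with no duplicate entries.

```python
import numpy as np
from scipy import sparse

def interaction_vectorized(A, B):
    """Hypothetical loop-free sketch: row-wise outer product of A (m x n1) and B (m x n2).

    Returns an (m, n1*n2) CSR matrix with C[i, k0*n2 + k1] = A[i, k0] * B[i, k1].
    Assumes A and B contain no duplicate entries.
    """
    A = sparse.csr_matrix(A)
    B = sparse.csr_matrix(B)
    m, n1 = A.shape
    n2 = B.shape[1]
    na = np.diff(A.indptr).astype(np.int64)   # stored nonzeros per row of A
    nb = np.diff(B.indptr).astype(np.int64)   # stored nonzeros per row of B
    counts = na * nb                          # output nonzeros per row
    ends = np.cumsum(counts)
    rows = np.repeat(np.arange(m), counts)    # row of every output entry
    # Offset of each output entry inside its row's block of na[i]*nb[i] entries
    t = np.arange(counts.sum()) - np.repeat(ends - counts, counts)
    nb_r = nb[rows]                           # never 0 here: only rows with counts > 0 appear
    ia = A.indptr[rows] + t // nb_r           # positions in A.data / A.indices
    ib = B.indptr[rows] + t % nb_r            # positions in B.data / B.indices
    data = A.data[ia] * B.data[ib]
    cols = A.indices[ia].astype(np.int64) * n2 + B.indices[ib]
    # Entries are already grouped by row, so the CSR indptr is the running count
    indptr = np.concatenate(([0], ends))
    return sparse.csr_matrix((data, cols, indptr), shape=(m, n1 * n2))
```

Within each row the entries come out in the same order as np.outer(data1, data2).ravel() in test_iter2 (A's nonzero outer, B's inner). Peak memory is several arrays with one element per output nonzero, which is worth checking against available RAM before running it on the full 20-million-row matrices.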

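Show the inefficient method, preferably with a small example, so we can see the actual result (not just a description).

@hpaulj, you actually answered a similar question 5 years ago... but I'm not sure whether there are any newer solutions now... I started a bounty on that question.

Thanks a lot! It's about 6 times faster on my problem too. My bounty comes from the other question. Do you think this might be the fastest approach (given that I'm working with big data, where we usually have 1 million rows)? Thanks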