Python: cryptic scipy "could not convert integer scalar" error


I'm constructing a sparse vector using scipy.sparse.csr_matrix, like so:
csr_matrix((values, (np.zeros(len(indices)), indices)), shape = (1, max_index))

This works for most of my data, but occasionally I get ValueError: could not convert integer scalar.

This reproduces the problem:
In [145]: inds

Out[145]:
array([ 827969148,  996833913, 1968345558,  898183169, 1811744124,
        2101454109,  133039182,  898183170,  919293479,  133039089])

In [146]: vals

Out[146]:
array([ 1.,  1.,  1.,  1.,  1.,  2.,  1.,  1.,  1.,  1.])

In [147]: max_index

Out[147]:
2337713000

In [143]: csr_matrix((vals, (np.zeros(10), inds)), shape = (1, max_index+1))
...

    996         fn = _sparsetools.csr_sum_duplicates
    997         M,N = self._swap(self.shape)
--> 998         fn(M, N, self.indptr, self.indices, self.data)
    999 
    1000         self.prune()  # nnz may have changed

ValueError: could not convert integer scalar

inds is a np.int64 array and vals is a np.float64 array. The failure occurs in scipy's sum_duplicates code, at the csr_sum_duplicates call shown in the traceback above.
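As a quick sanity check on the numbers in the reproduction, the sketch below compares them with the largest signed 32-bit integer. It uses only plain Python. The names INT32_MAX, largest_index and n_cols are introduced here for illustration and do not come from scipy. It does not establish the cause of the error. It only shows that every stored column index fits in 32 bits while the requested number of columns does not.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

# Values copied from the reproduction above.
inds = [827969148, 996833913, 1968345558, 898183169, 1811744124,
        2101454109, 133039182, 898183170, 919293479, 133039089]
max_index = 2337713000

largest_index = max(inds)   # largest column index actually stored
n_cols = max_index + 1      # number of columns requested via shape

indices_fit_int32 = largest_index <= INT32_MAX
n_cols_fits_int32 = n_cols <= INT32_MAX

print("largest index:", largest_index, "fits in int32:", indices_fit_int32)
print("column count:", n_cols, "fits in int32:", n_cols_fits_int32)
```

If the library picked its index dtype from the stored indices alone, a dimension of this size would not be representable in that dtype. That is an assumption about the mechanism, not something the traceback confirms.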
Note that this works:
In [235]: csr_matrix(([1,1], ([0,0], [1,2])), shape = (1, 2**34))
Out[235]:

<1x17179869184 sparse matrix of type '<type 'numpy.int64'>'
    with 2 stored elements in Compressed Sparse Row format>


So the problem is not that one of the dimensions is > 2^31.

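Any thoughts on why these values should be causing a problem?

Might it be that max_index > 2**31? Try this, just to make sure:
csr_matrix((vals, (np.zeros(10), inds // 2)), shape = (1, max_index // 2))

The max index you are giving is less than the maximum index of the rows you are supplying. This:
sparse.csr_matrix((vals, (np.zeros(10), inds)), shape = (1, np.max(inds) + 1))
works fine for me, although calling .todense() on it results in a memory error because the matrix is so large.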
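Commenting out the sum_duplicates call leads to other errors. But the following fix also solves your problem. You can extend the version check to newer versions of scipy.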
import scipy 
import scipy.sparse  
if scipy.__version__ in ("0.14.0", "0.14.1", "0.15.1"): 
    _get_index_dtype = scipy.sparse.sputils.get_index_dtype 
    def _my_get_index_dtype(*a, **kw): 
        kw.pop('check_contents', None) 
        return _get_index_dtype(*a, **kw) 
    scipy.sparse.compressed.get_index_dtype = _my_get_index_dtype 
    scipy.sparse.csr.get_index_dtype = _my_get_index_dtype 
    scipy.sparse.bsr.get_index_dtype = _my_get_index_dtype 
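To make the mechanics of that patch easier to follow, here is a minimal stand-alone sketch of the same wrapping pattern. It does not touch scipy. get_index_dtype_stub is a hypothetical stand-in that only mimics the keyword arguments of the real function, and drop_check_contents is an illustrative helper name.

```python
def get_index_dtype_stub(arrays=(), maxval=None, check_contents=False):
    """Hypothetical stand-in: reports which arguments it received."""
    return ("called", maxval, check_contents)


def drop_check_contents(func):
    """Wrap func so that any check_contents keyword is discarded."""
    def wrapper(*args, **kwargs):
        # Remove the keyword if present; None avoids a KeyError if absent.
        kwargs.pop("check_contents", None)
        return func(*args, **kwargs)
    return wrapper


patched = drop_check_contents(get_index_dtype_stub)

# The caller asks for check_contents=True, but the wrapped function
# never sees it and falls back to its default of False.
print(patched(maxval=5, check_contents=True))
```

In the patch above, the wrapped function is then assigned over the module-level name in each scipy.sparse submodule, so that the existing call sites pick up the wrapper.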

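Yes, that was my first thought too, but it works with the same max_index for other similar data.

No, scipy.sparse.csr_matrix handles max_index > 2**31 just fine; see the edited question.

@Rok I actually get a different exception (using Python 2.7 + scipy 0.9.0). I can construct the matrix with 2**31-1 but not with 2**31. Which scipy version are you using?

@matiasg: scipy 0.15.1, installed with Continuum's Anaconda.

I installed Anaconda. They now use 64 bits for the indices, since I can build the matrix with 2**63-1 but not with 2**63. This is unrelated to your problem, but it seems a bit annoying.

Hmm, no: the maximum value in the index array is 2101454109, but max_index is 2337713001. When the dimension is too small, it throws a ValueError: column index exceeds matrix dimensions error. Using inds.max()+1 does work, though. The plot thickens.

Oops, I miscounted a zero. By the way, for me anything greater than 2**32-1 does not work (your example fails). It throws a strange exception: NotImplementedError: Wrong number or type of arguments for overloaded function 'coo_tocsr'. I am using the Enthought student distribution, scipy version '0.13.3'.

I guess your version uses 32-bit integers? Try 2**31 and 2**31-1 with the same example you posted.

Yes, the example data I posted works with 2**31-1, but not with 2**31.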
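The boundary test discussed in these comments can be sketched as below, assuming numpy and scipy are installed. try_width is an illustrative helper, not part of scipy. It builds a one-element matrix with a given number of columns and reports either "ok" or the name of the exception raised. No expected output is shown, because the comments above indicate that the result depends on the scipy version.

```python
import numpy as np
from scipy.sparse import csr_matrix


def try_width(n_cols):
    """Return "ok" if a 1 x n_cols CSR matrix can be built, else the exception name."""
    data = np.array([1.0])
    row = np.array([0])
    col = np.array([0])
    try:
        csr_matrix((data, (row, col)), shape=(1, n_cols))
        return "ok"
    except Exception as exc:
        # Broad on purpose: the thread reports several different exception types.
        return type(exc).__name__


results = {n: try_width(n) for n in (2**31 - 1, 2**31)}
for n, outcome in results.items():
    print(n, outcome)
```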