Python: row index exceeds matrix dimensions for scipy csr_matrix


I am new to Python and pandas, and I have the following problem.

I have a dataset:

df = pd.read_csv('/home/nikoscha/Documents/ThesisR/dataset.csv', names=['response_nn','event','user'])

I am trying to create a csr_matrix with the following code:

# Create lists of all events, users and responses
events = list(np.sort(df.event_id.unique()))
users = list(np.sort(df.user_id.unique()))
responses = list(df.responses)

# Get the rows and columns for our new matrix
rows = df.user_id.astype(float)
cols = df.event_id.astype(float)

# Construct a sparse matrix for our users and events containing the responses
data_sparse = sp.csr_matrix((responses, (rows, cols)), shape=(len(users), len(events)))

The code above works. But when I take a random subset as a training dataset:

mask = np.random.rand(len(df)) < 0.5
df = df[mask]
df = df.reset_index() 
df = df.drop(['index'], axis=1)

and then try to construct the sparse matrix, I get the following error:

ValueError: row index exceeds matrix dimensions

Can anyone explain why? Thanks in advance.

As stated in the scipy documentation, a csr_matrix can be initialized as:

csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])

In scipy.sparse.csr.py:

csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])  
        where `data`, `row_ind` and `col_ind` satisfy the  
        relationship `a[row_ind[k], col_ind[k]] = data[k]`.
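To make the docstring concrete, here is a minimal sketch of that relationship on a toy 3 x 3 matrix. The values and index arrays are made up for illustration and are not from the question's dataset.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative values only: three non-zero entries of a 3 x 3 matrix.
data = np.array([10, 20, 30])
row_ind = np.array([0, 0, 2])
col_ind = np.array([0, 2, 1])

a = csr_matrix((data, (row_ind, col_ind)), shape=(3, 3))
dense = a.toarray()

# Each triple (row_ind[k], col_ind[k], data[k]) fills exactly one cell.
for k in range(len(data)):
    assert dense[row_ind[k], col_ind[k]] == data[k]

print(dense)
```

Every value in row_ind has to be a valid row number of the declared shape, which is the condition that fails in the question.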

When a csr_matrix is initialized, row_ind.max() is checked against M.

Likewise, in scipy.sparse.coo.py:

if self.row.max() >= self.shape[0]:
    raise ValueError('row index exceeds matrix dimensions')
if self.col.max() >= self.shape[1]:
    raise ValueError('column index exceeds matrix dimensions')
if self.row.min() < 0:
    raise ValueError('negative row index found')
if self.col.min() < 0:
    raise ValueError('negative column index found')
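The effect of these checks can be reproduced with a small sketch. The arrays are toy values chosen for illustration, and the exact wording of the error message may differ between scipy versions, so the sketch only relies on a ValueError being raised.

```python
import numpy as np
from scipy.sparse import csr_matrix

data = np.ones(3)
cols = np.array([0, 1, 2])

# All row indices are below 3, so a 3 x 3 shape is accepted.
ok_rows = np.array([0, 1, 2])
ok = csr_matrix((data, (ok_rows, cols)), shape=(3, 3))

# Row index 9 does not fit into 3 rows, so construction fails.
bad_rows = np.array([0, 1, 9])
try:
    csr_matrix((data, (bad_rows, cols)), shape=(3, 3))
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Without an explicit shape, the dimensions are inferred from the largest index.
inferred = csr_matrix((data, (bad_rows, cols)))
print(inferred.shape)
```

Leaving out shape avoids the error, but the resulting matrix has one row per possible index value rather than one row per distinct user, which may not be what is wanted.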

With the line row[0]=9 commented out, the example below works fine. Hope that helps.

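You specify len(users) as the row dimension of the matrix, but evidently rows contains values larger than that. (It is not what causes this error, but rows and cols should be astype(int), not float.)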
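A likely explanation is that random sampling drops some users and events, so len(users) shrinks while the surviving IDs keep their original values. One possible fix, sketched here under the assumption that the IDs are arbitrary labels, is to map them to contiguous positions 0..n-1 with pd.factorize before building the matrix. The small DataFrame is a made-up stand-in for the sampled data.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy stand-in for the sampled DataFrame; the IDs are not contiguous.
df = pd.DataFrame({
    "user_id": [3, 7, 7, 42],
    "event_id": [100, 5, 100, 8],
    "responses": [1, 0, 1, 1],
})

# factorize returns integer codes in 0..n-1 plus the sorted unique labels.
rows, users = pd.factorize(df["user_id"], sort=True)
cols, events = pd.factorize(df["event_id"], sort=True)

data_sparse = csr_matrix(
    (df["responses"].to_numpy(), (rows, cols)),
    shape=(len(users), len(events)),
)

print(data_sparse.toarray())
# users[i] and events[j] give back the original IDs for row i and column j.
```

An alternative is to compute len(users) and len(events) on the full dataset before sampling, so that the training matrix keeps the original dimensions.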


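import numpy as np
from scipy.sparse import csr_matrix

# Integer indices in [0, 8), so every entry fits into an 8 x 8 matrix
a = np.random.randint(0, 8, size=(8, 2))
row = np.hstack((a[:, 0], a[:, 1]))
# row[0] = 9  # uncommenting this puts an index outside the 8 rows and raises ValueError
col = np.hstack((a[:, 1], a[:, 0]))
matrix = csr_matrix(([1] * row.shape[0], (row, col)), shape=(a.shape[0], a.shape[0]))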