Python Scipy error - ValueError: row index exceeds matrix dimensions


I use the code below to build the training and test matrices so that I can use them in my NN model.

from scipy.sparse import csr_matrix
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('data.csv', names=['x', 'y', 'z'])


x = df.x.unique().shape[0]
y = df.y.unique().shape[0]

train_data, test_data = train_test_split(df, test_size=0.2)
train_data = pd.DataFrame(train_data)
test_data = pd.DataFrame(test_data)

#Build train matrix
train_x = []
train_y = []
train_z = []

for line in train_data.itertuples():
    u = line[1] - 1
    i = line[2] - 1
    train_x.append(u)
    train_y.append(i)
    train_z.append(line[3])
train_matrix = csr_matrix((train_z, (train_x, train_y)), shape=(x, y))

#Build test matrix
test_x = []
test_y = []
test_z = []
for line in test_data.itertuples():
    test_x.append(line[1] - 1)
    test_y.append(line[2] - 1)
    test_z.append(line[3])
test_matrix = csr_matrix((test_z, (test_x, test_y)), shape=(x, y))

It works perfectly when I run it on a small dataset. However, when I use it on a somewhat larger dataset (600 MB), it fails with this error:

 File "C:\Users\Mus\Anaconda3\lib\site-packages\scipy\sparse\compressed.py", line 51, in __init__
    other = self.__class__(coo_matrix(arg1, shape=shape))
  File "C:\Users\Mus\Anaconda3\lib\site-packages\scipy\sparse\coo.py", line 192, in __init__
    self._check()
  File "C:\Users\Mus\Anaconda3\lib\site-packages\scipy\sparse\coo.py", line 272, in _check
    raise ValueError('row index exceeds matrix dimensions')
ValueError: row index exceeds matrix dimensions

When I tried the code below, it raised a different error on the same line:

train_data, test_data = train_test_split(csr_matrix(df[z].values, (df[x].values, df[y].values)), test_size=0.2)

Any help is much appreciated.

This code, suggested by @CJR, replaces all of the code that builds the train and test matrices:


train_matrix, test_matrix = train_test_split(csr_matrix((df['z'].values, (df['x'].values, df['y'].values))), test_size=0.2)
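One thing worth checking before adopting the one-liner: train_test_split splits a sparse matrix along its first axis, so whole rows land in either the train or the test matrix. That is different from the original code, which splits individual (x, y, z) entries and keeps both matrices at the same shape. The sketch below uses a hypothetical five-row toy frame with zero-based ids (not the asker's data) to show the resulting shapes.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy data with zero-based ids; not the asker's data.csv.
df = pd.DataFrame({
    'x': [0, 1, 2, 3, 4],
    'y': [0, 1, 2, 0, 1],
    'z': [5.0, 3.0, 1.0, 4.0, 2.0],
})

# Without an explicit shape, csr_matrix infers it from the largest indices.
full = csr_matrix((df['z'].values, (df['x'].values, df['y'].values)))

# The split is row-wise: each row of `full` goes to exactly one output.
train_matrix, test_matrix = train_test_split(full, test_size=0.2, random_state=0)

print(full.shape, train_matrix.shape, test_matrix.shape)
```

If both matrices need to keep the full shape, splitting the data frame first and building two matrices, as in the question, is the closer match.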

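Where in your code does the error occur? You didn't show the full traceback, so I can only guess that it is one of the csr_matrix lines. My guess is that, for some reason, the values in test_x are larger than x. I don't know why that would only happen with the larger dataset.

@hapaulj, as you said, it occurs in both csr_matrix lines. Is there a problem in the dataset? But I am sure there is no problem with it.

If the values in x are 1, 2, 3 this will work. If the values in x are 1, 2, 4 it will fail with the error you got. I don't know why you chose to do it this way instead of
train_data, test_data = train_test_split(csr_matrix(df[z].values, (df[x].values, df[y].values)), test_size=0.2)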
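The 1, 2, 3 versus 1, 2, 4 point can be checked directly. The sketch below is a minimal reproduction on a hypothetical three-row frame (not the asker's data): sizing the matrix by the number of unique ids breaks as soon as an id is skipped. It then shows two possible workarounds, sizing by the largest id or remapping ids to consecutive codes with pandas.factorize. Neither is confirmed in the thread as the asker's final fix.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy data: id 3 is missing in x, so the ids run 1, 2, 4.
df = pd.DataFrame({'x': [1, 2, 4], 'y': [1, 2, 3], 'z': [5.0, 3.0, 1.0]})

rows = df['x'].to_numpy() - 1   # zero-based row indices: 0, 1, 3
cols = df['y'].to_numpy() - 1   # zero-based column indices: 0, 1, 2
vals = df['z'].to_numpy()

# Three unique ids give three rows, but row index 3 needs four rows.
try:
    csr_matrix((vals, (rows, cols)),
               shape=(df['x'].nunique(), df['y'].nunique()))
    failed = False
except ValueError as exc:
    failed = True
    print('ValueError:', exc)

# Workaround 1: size the matrix by the largest id (leaves an empty row).
by_max = csr_matrix((vals, (rows, cols)),
                    shape=(df['x'].max(), df['y'].max()))

# Workaround 2: remap ids to consecutive codes 0..n-1.
row_codes, row_ids = pd.factorize(df['x'])
col_codes, col_ids = pd.factorize(df['y'])
compact = csr_matrix((vals, (row_codes, col_codes)),
                     shape=(len(row_ids), len(col_ids)))
```

With the second workaround, row_ids and col_ids hold the original ids, so a matrix position can be mapped back to the id it came from.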
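@CJR, I tried to do it with your code, but I don't understand the error it shows. I added that error message above.

The string 'x' is the column name in your data frame. The number 5 stored in the variable x is not a column name in the data frame. You should select data frame columns with strings, not with the variables.
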
File "C:\Users\Mus\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\Mus\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\Mus\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\Mus\Anaconda3\lib\site-packages\pandas\core\internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "C:\Users\Mus\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 5
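The KeyError in this traceback can be reproduced on a small frame. In the sketch below (hypothetical toy data), the variable x holds a count, as in the question, so df[x] looks for a column labelled with that number, while df['x'] selects the column named x. Here the count is 3 rather than 5, so the missing key is 3.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the question.
df = pd.DataFrame({'x': [1, 2, 4], 'y': [1, 2, 3], 'z': [5.0, 3.0, 1.0]})

# As in the question, x is the number of unique values, not a column name.
x = df['x'].unique().shape[0]

# Selecting by the string label works.
by_label = df['x'].values

# Selecting by the integer stored in x fails: no column is labelled 3.
try:
    df[x]
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]

print('missing column label:', missing_key)
```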