Python IndexError: index is out of bounds for axis 0


I have arrays x_train and targets_train. I want to shuffle the training data, split it into smaller batches, and use those batches as training data. My original data has 1000 rows, and I take 250 rows each time:

x_train = np.memmap('/home/usr/train', dtype='float32', mode='r', shape=(1000, 1, 784))
# print(x_train)
targets_train = np.memmap('/home/usr/train_label', dtype='int32', mode='r', shape=(1000, 1))
train_idxs = [i for i in range(x_train.shape[0])]
np.random.shuffle(train_idxs)


num_batches_train = 4
def next_batch(start, train, labels, batch_size=250):
    newstart = start + batch_size
    if newstart > train.shape[0]:
        newstart = 0
    idxs = train_idxs[start:start + batch_size]
    # print(idxs)
    return train[idxs, :], labels[idxs, :], newstart


# x_train_lab = x_train[:200]
# # x_train = np.array(targets_train)
# targets_train_lab = targets_train[:200]
for i in range(num_batches_train):
    x_train, targets_train, newstart = next_batch(i*batch_size, x_train, targets_train, batch_size=250)
The problem is that when I shuffle the training data and try to access the batches, I get an error:

    return train[idxs, :], labels[idxs, :], newstart
    IndexError: index 250 is out of bounds for axis 0 with size 250

Does anyone know what I am doing wrong?

The problem is this line in the function definition:

idxs = train_idxs[start:start + batch_size]
Change it to:

idxs = train_idxs[start: newstart]
and it should then work as expected.

Also, change the variable names in the for loop to something like this:

batch_size = 250
for i in range(num_batches_train):
    x_train_split, targets_train_split, newstart = next_batch(i*batch_size, 
                                                              x_train,
                                                              targets_train,
                                                              batch_size=250)
    print(x_train_split.shape, targets_train_split.shape, newstart)
Sample output:

(250, 1, 784) (250, 1) 250
(250, 1, 784) (250, 1) 500
(250, 1, 784) (250, 1) 750
(250, 1, 784) (250, 1) 1000
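Putting both suggested changes together, here is a minimal self-contained sketch. Random arrays stand in for the original memmap files, which are an assumption here; everything else mirrors the code above:

```python
import numpy as np

# Stand-ins for the original memmap arrays (same shapes as in the question)
x_train = np.random.rand(1000, 1, 784).astype('float32')
targets_train = np.random.randint(0, 10, size=(1000, 1)).astype('int32')

# Shuffle the indices once, for the full-size array
train_idxs = np.arange(x_train.shape[0])
np.random.shuffle(train_idxs)

def next_batch(start, train, labels, batch_size=250):
    newstart = start + batch_size
    if newstart > train.shape[0]:
        newstart = 0
    idxs = train_idxs[start:start + batch_size]
    return train[idxs, :], labels[idxs, :], newstart

batch_size = 250
num_batches_train = 4
for i in range(num_batches_train):
    # Assign to new names so x_train keeps its full 1000 rows
    x_batch, targets_batch, newstart = next_batch(
        i * batch_size, x_train, targets_train, batch_size)
    print(x_batch.shape, targets_batch.shape, newstart)
```

Because x_train is never overwritten, every batch indexes the full 1000-row array and the shapes match the sample output above.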
(Edit: first guess about newstart removed.)

Here:

x_train, targets_train, newstart = next_batch(i*batch_size, x_train, targets_train, batch_size=250)
each iteration changes the size of x_train, yet keeps using the train_idxs array that was built for the full-size array.

Pulling random batches of values from x_train is one thing, but you have to keep the selection array consistent with the array it indexes.
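The point can be shown directly; this small illustration (not from the original post) builds an index array for 1000 rows and then indexes a 250-row array with it:

```python
import numpy as np

full = np.arange(1000).reshape(1000, 1)
train_idxs = np.random.permutation(1000)  # index array built for 1000 rows

# Indexing the full-size array works for any slice of the shuffled indices
batch = full[train_idxs[:250]]
print(batch.shape)  # (250, 1)

# Once the 250-row batch replaces the full array, the same index array
# still holds values up to 999, so indexing with it has to fail
try:
    batch[train_idxs]
except IndexError as e:
    print(e)  # index ... is out of bounds for axis 0 with size 250
```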

For lack of a minimal, verifiable example, this question probably should be closed. Having to guess at the problem and build a small testable case just to reproduce it is frustrating.

If my current guess is wrong, a few intermediate print statements would clear things up.

========================

Simplifying the code to a basic case:

import numpy as np
x_train = np.arange(20).reshape(20,1)
train_idxs = np.arange(x_train.shape[0])
np.random.shuffle(train_idxs)

num_batches_train = 4
batch_size=5
def next_batch(start, train):
    idxs = train_idxs[start:start + batch_size]
    print(train.shape, idxs)
    return train[idxs, :]

for i in range(num_batches_train):
    x_train = next_batch(i*batch_size, x_train)
    print(x_train)
Running it produces:

1658:~/mypy$ python3 stack39919181.py 
(20, 1) [ 7 18  3  0  9]
[[ 7]
 [18]
 [ 3]
 [ 0]
 [ 9]]
(5, 1) [13  5  2 15  1]
Traceback (most recent call last):
  File "stack39919181.py", line 14, in <module>
    x_train = next_batch(i*batch_size, x_train)
  File "stack39919181.py", line 11, in next_batch
    return train[idxs, :]
IndexError: index 13 is out of bounds for axis 0 with size 5

Changing the iteration so that the batch goes into a new variable, instead of overwriting x_train, lets it run and produce 4 batches of 5 rows:

for i in range(num_batches_train):
    x_batch = next_batch(i*batch_size, x_train)
    print(x_batch)

Comments on this answer:

If the size is 250, the last index would be 249, since it starts at 0. Without shuffling, the first batch's indices would be 0 through 249, the next batch's 250 through 499, and so on. But if I shuffle the indices, the first batch might contain index number 619! The error I get is "IndexError: index 652 is out of bounds for axis 0 with size 250"... I mean, the problem is that it doesn't take the rows and reset the index!

What is in the variables x_train and targets_train that get passed to the function? I would print them and make sure there really are 1000 shuffled rows.

The problem is how you iterate, changing x_train each time. Indexing that worked the first time fails the second. The error is not on that line; idxs is only used as an index later. I don't think there is anything wrong with your calculation; the OP's problem is in how he iterates and changes x_train.

Out of curiosity, what is the logic behind naming the script "stack" followed by the unique question number?

I have answered enough questions that it is the simplest way of keeping script names unique, though I rarely have reason to revisit a script after a few days.
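For comparison, a common idiom (a sketch, not from the original answers) sidesteps the bookkeeping entirely: permute the indices once per epoch and slice the index array per batch, never reassigning the data array:

```python
import numpy as np

x_train = np.arange(20).reshape(20, 1)
batch_size = 5

idxs = np.random.permutation(x_train.shape[0])  # one shuffle per epoch
for start in range(0, x_train.shape[0], batch_size):
    # x_train itself is never touched; only the slice of idxs moves
    x_batch = x_train[idxs[start:start + batch_size]]
    print(x_batch.shape)  # (5, 1), four times
```

Each row appears in exactly one batch per pass, and there is no start/newstart state to track.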