Python ValueError in PyTorch when using enumerate (HDF5 data)

Whenever I load my custom dataset (CIFAR-10 reduced to 512 features) and try to train on it in the usual way, I get a ValueError. Since I received the data as an HDF5 file, I defined a custom PyTorch Dataset as follows:

import h5py
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from torch.utils.data import Dataset, random_split

class HDF5Dataset(Dataset):
    # load the dataset
    def __init__(self, path):
        # load the HDF5 datasets as dataframes
        features = pd.DataFrame(np.array(h5py.File(path, 'r')['features']))
        labels = pd.DataFrame(np.array(h5py.File(path, 'r')['labels']))
        
        # store the inputs and outputs
        self.X = features
        self.y = labels
        # ensure input data is floats
        self.X = self.X.astype('float32')
        # label encode target and ensure the values are floats
        self.y = LabelEncoder().fit_transform(self.y)
        self.y = self.y.astype('float32')
        self.y = self.y.reshape((len(self.y), 1))

    # number of rows in the dataset
    def __len__(self):
        return len(self.X)

    # get a row at an index
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

    # get indexes for train and test rows
    def get_splits(self, n_test=0.33):
        # determine sizes
        test_size = round(n_test * len(self.X))
        train_size = len(self.X) - test_size
        print(test_size)
        # calculate the split
        return random_split(self, [train_size, test_size])
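
As an aside, before wrapping the file in a Dataset it is worth checking the raw shapes directly. A minimal sketch using the standard h5py API (the shapes in the comments are my assumptions about this dataset, not verified output):

import h5py

with h5py.File("resnet50_dim512.hdf5", "r") as f:
    print(list(f.keys()))       # e.g. ['features', 'labels']
    print(f['features'].shape)  # expected: (num_samples, 512)
    print(f['labels'].shape)    # expected: (num_samples,) or (num_samples, 1)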
When I now try to run the training via

path = "resnet50_dim512.hdf5"
train_dl, test_dl = prepare_data(path)
print(len(train_dl.dataset), len(test_dl.dataset))
# define the network
model = MLP(34)
# train the model
train_model(train_dl, model)
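
prepare_data itself is not shown in the question. A plausible reconstruction, based on get_splits above and the batch sizes mentioned in the comments below (a sketch of a hypothetical helper, not the OP's actual code):

from torch.utils.data import DataLoader

def prepare_data(path):
    # build the dataset, split it, and wrap both splits in DataLoaders
    dataset = HDF5Dataset(path)
    train, test = dataset.get_splits()
    train_dl = DataLoader(train, batch_size=32, shuffle=True)
    test_dl = DataLoader(test, batch_size=1024, shuffle=False)
    return train_dl, test_dl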
I get the following error message:

ValueError                                Traceback (most recent call last)
~/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    349             try:
--> 350                 return self._range.index(new_key)
    351             except ValueError:

ValueError: 901 is not in range

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-37-fd90001f6a16> in <module>
    156 # train the model
    157 print(train_dl)
--> 158 train_model(train_dl, model)
    159 # evaluate the model
    160 acc = evaluate_model(test_dl, model)

<ipython-input-37-fd90001f6a16> in train_model(train_dl, model)
    106     for epoch in range(100):
    107         # enumerate mini batches
--> 108         for i, (inputs, targets) in enumerate(train_dl):
    109             # clear the gradients
    110             optimizer.zero_grad()
And here is the training function where the error occurs:

from torch.nn import BCELoss
from torch.optim import SGD

def train_model(train_dl, model):
    # define the optimization
    criterion = BCELoss()
    optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)

    # enumerate epochs
    for epoch in range(100):
        # enumerate mini batches
        for i, (inputs, targets) in enumerate(train_dl):
            # clear the gradients
            optimizer.zero_grad()
            # compute the model output
            yhat = model(inputs)
            # calculate loss
            loss = criterion(yhat, targets)
            # credit assignment
            loss.backward()
            # update model weights
            optimizer.step()
It would be great if someone could tell me why this error occurs and how to fix it, or whether I should avoid handling the HDF5 file this way entirely, e.g. whether converting it to another file format first would be a better way to simplify training.
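
One reading of the traceback (my assumption, not confirmed in the question): the ValueError comes from pandas, because self.X.astype('float32') in __init__ still returns a DataFrame, and self.X[idx] in __getitem__ then looks up a column labelled idx, not row idx. With 512 feature columns, any sample index above 511, such as 901, fails exactly as shown. A small standalone demonstration:

import numpy as np
import pandas as pd

# a DataFrame with 10 rows and 512 columns labelled 0..511,
# mimicking the features loaded in __init__ above
df = pd.DataFrame(np.zeros((10, 512)))

print(df[5].shape)       # (10,)  -- df[5] selects the column labelled 5
try:
    df[901]              # there is no column labelled 901
except KeyError as e:
    print('KeyError:', e)
print(df.iloc[5].shape)  # (512,) -- positional row access needs .iloc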


Comments:

Could you print self.X.shape and self.y.shape at the end of __init__ in HDF5Dataset?

train_dl has a batch size of 32 and test_dl a batch size of 1024. Change test_dl to 32.

@yudhiesh I don't see how that is related; batch_size has nothing to do with indexing into the dataset, since it belongs to the DataLoader. Besides, the error occurs while running the training. OP, please print self.X.shape and self.y.shape.
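If that diagnosis is right, the simplest fix is to store plain numpy arrays instead of DataFrames, so that integer indexing in __getitem__ selects rows. A sketch of the adjusted __init__, keeping the logic from the question and changing only the storage type:

import h5py
import numpy as np
from sklearn.preprocessing import LabelEncoder
from torch.utils.data import Dataset

class HDF5Dataset(Dataset):
    def __init__(self, path):
        # read both datasets from the HDF5 file and close it again
        with h5py.File(path, 'r') as f:
            features = np.array(f['features'])
            labels = np.array(f['labels'])

        # keep numpy arrays, not DataFrames: arr[idx] selects row idx
        self.X = features.astype('float32')
        # label encode the target and ensure the values are floats
        self.y = LabelEncoder().fit_transform(labels.ravel())
        self.y = self.y.astype('float32').reshape((len(self.y), 1))

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

Alternatively, keeping the DataFrames and returning self.X.iloc[idx].values in __getitem__ would avoid the column lookup as well.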