How does loss.backward() handle batching in PyTorch?


I am currently training a DDQN to play Connect Four. In each state, the network predicts the best action and the move is played accordingly. The code basically looks like this:

for epoch in range(num_epochs):
    preds, targets = torch.empty(0), torch.empty(0)  # reset each epoch
    for i in range(batch_size):
        while game is not finished:  # pseudocode for the game loop
            action = select_action(state)
            new_state = play_move(state, action)
            pred, target = get_pred_target(state, new_state, action)
            preds = torch.cat([preds, pred])
            targets = torch.cat([targets, target])
    loss = loss_fn(preds, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
During training the network improves a little, but not nearly as much as I expected. With that in mind, I am now wondering whether I have actually implemented the loss.backward() call correctly. The point is that I collect all the predictions and targets for every move in the tensors preds and targets. However, I do not keep track of the states that produced those predictions and targets. Isn't that information needed for backpropagation, or is it somehow stored already?
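A minimal sketch of the situation being asked about, using a hypothetical stand-in linear layer instead of the actual DDQN: each forward pass records its inputs in the autograd graph, and torch.cat preserves that graph, so backward() can reach the model's parameters even though the states are never stored explicitly.

```python
import torch

# Stand-in for the DQN (assumption: any nn.Module behaves the same way here).
model = torch.nn.Linear(4, 1)

preds = torch.empty(0)
for _ in range(3):
    state = torch.randn(1, 4)                    # a hypothetical game state
    pred = model(state)                          # forward pass builds the graph
    preds = torch.cat([preds, pred.squeeze(0)])  # cat keeps the graph intact

# preds still references each forward pass through its grad_fn, so the
# states are retained by autograd internally until backward() is called.
loss = preds.sum()
loss.backward()
assert model.weight.grad is not None  # gradients reached the parameters
```

In other words, autograd saves the tensors it needs (including each input state) as part of the computation graph attached to pred; nothing extra has to be tracked manually, as long as the graph is not detached.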

Thanks in advance!