How does loss.backward() handle batching in PyTorch?


I am currently training a DDQN to play Connect Four. In each state, the network predicts the best action and the move is played accordingly. The code basically looks like this:

for epoch in range(num_epochs):
    preds, targets = torch.empty(0), torch.empty(0)  # reset each epoch
    for i in range(batch_size):
        while game is not finished:  # pseudocode for the game loop
            action = select_action(state)
            new_state = play_move(state, action)
            pred, target = get_pred_target(state, new_state, action)
            preds = torch.cat([preds, pred])
            targets = torch.cat([targets, target])
    loss = loss_fn(preds, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
During training the network improves a little, but not nearly as much as I expected. With that in mind, I am now wondering whether I have actually implemented the loss.backward() call correctly. The point is that I collect all the predictions and targets for every move in the tensors preds and targets. However, I do not keep track of the states that produced those predictions and targets. Isn't that information needed for backpropagation, or is it somehow stored already?
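A minimal sketch of the situation being asked about, using a hypothetical stand-in linear layer instead of the actual DDQN: each forward pass records its inputs in the autograd graph, and torch.cat preserves that graph, so backward() can reach the model's parameters even though the states are never stored explicitly.

```python
import torch

# Stand-in for the DQN (assumption: any nn.Module behaves the same way here).
model = torch.nn.Linear(4, 1)

preds = torch.empty(0)
for _ in range(3):
    state = torch.randn(1, 4)                    # a hypothetical game state
    pred = model(state)                          # forward pass builds the graph
    preds = torch.cat([preds, pred.squeeze(0)])  # cat keeps the graph intact

# preds still references each forward pass through its grad_fn, so the
# states are retained by autograd internally until backward() is called.
loss = preds.sum()
loss.backward()
assert model.weight.grad is not None  # gradients reached the parameters
```

In other words, autograd saves the tensors it needs (including each input state) as part of the computation graph attached to pred; nothing extra has to be tracked manually, as long as the graph is not detached.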

Thanks in advance!