How does loss.backward() handle batching in PyTorch?
So I'm currently training a DDQN to play Connect Four. In each state, the network predicts the best action and the move is played accordingly. The code basically looks like this:
for epoch in range(num_epochs):
    for i in range(batch_size):
        while game is not finished:
            action = select_action(state)
            new_state = play_move(state, action)
            pred, target = get_pred_target(state, new_state, action)
            # collect predictions and targets across all moves
            preds = torch.cat([preds, pred])
            targets = torch.cat([targets, target])
    # one optimization step over the collected batch
    loss = loss_fn(preds, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
During training the network does get a little better, but not as much as I expected. With that in mind, I'm now wondering whether I actually implemented the loss.backward() call correctly. The point is: I store all the predictions and targets for every move in the tensors preds and targets. However, I never keep track of the states that led to those predictions and targets. Isn't that information required for backpropagation, or is it somehow saved implicitly?
Thanks a lot!
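One way to check this empirically: as long as each pred was produced by a forward pass through the network, it carries a grad_fn linking it back to the whole computation graph, including the input state it was computed from, and torch.cat preserves those links. The sketch below (the tiny Linear "network" and the state shapes are made up for illustration, not the actual DDQN) shows that gradients still reach the network's weights after concatenation:

```python
import torch

# Hypothetical tiny "network" standing in for the DDQN.
net = torch.nn.Linear(4, 1)

preds = torch.empty(0)  # accumulator, same pattern as in the question
for _ in range(3):
    state = torch.randn(4)           # a fresh "state" each move
    pred = net(state)                # pred's grad_fn references net AND state
    preds = torch.cat([preds, pred]) # cat keeps every branch of the graph alive

loss = preds.sum()
loss.backward()  # gradients flow back through cat to net's parameters

# The states were never stored explicitly, yet backprop worked:
assert net.weight.grad is not None
```

So the intermediate states don't need to be tracked by hand: autograd saves whatever the backward pass needs when the forward pass runs. (If the states were stored and later reused to recompute predictions outside the graph, e.g. after `.detach()` or under `torch.no_grad()`, that link would be broken.)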