PyTorch gather function to get the current Q values for Deep Q-learning
I am trying to compute the current Q values for deep Q-learning, but I run into the following error:
RuntimeError: invalid argument 4: Index tensor must have same dimensions as input tensor at C:/w/1/s/windows/pytorch/aten/src\THC/generic/THCTensorScatterGather.cu:16
The code that triggers this error is:
curr_Q = self.model.forward(states).gather(1, actions.unsqueeze(1))
The shape of self.model.forward(states) is [32, 640, 10], and the shape of actions is [32].
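The shapes above are exactly what the error message is complaining about: `gather` requires the index tensor to have the same number of dimensions as the input tensor, and `actions.unsqueeze(1)` is only 2-D while the model output is 3-D. A minimal sketch reproducing the mismatch with random tensors of the reported shapes:

```python
import torch

# Stand-ins with the shapes reported in the question
q = torch.randn(32, 640, 10)            # 3-D model output
actions = torch.randint(0, 10, (32,))   # actions: shape [32]
idx = actions.unsqueeze(1)              # shape [32, 1]: 2-D index vs 3-D input

try:
    q.gather(1, idx)
    mismatch_raised = False
except RuntimeError:
    # gather demands index.dim() == input.dim(), so this path is taken
    mismatch_raised = True
```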
The rest of the code in this function looks like this:
def compute_loss(self, batch):
    states, actions, rewards, next_states, dones = batch
    states = torch.FloatTensor(states).to(self.device)
    actions = torch.LongTensor(actions).to(self.device)
    rewards = torch.FloatTensor(rewards).to(self.device)
    next_states = torch.FloatTensor(next_states).to(self.device)
    dones = torch.FloatTensor(dones)

    curr_Q = self.model.forward(states).gather(1, actions.unsqueeze(1))
    curr_Q = curr_Q.squeeze(1)
    next_Q = self.model.forward(next_states)
    max_next_Q = torch.max(next_Q, 1)[0]
    expected_Q = rewards.squeeze(1) + self.gamma * max_next_Q

    loss = self.MSE_loss(curr_Q, expected_Q)
    return loss
Check whether the output dimensions of self.model.forward(states) match the dimensions of actions.unsqueeze(1).
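For `gather` to work here, the model output should be a 2-D [batch, num_actions] tensor so that the [batch, 1] index lines up with it. A minimal sketch of the usual DQN indexing pattern under that assumption (the batch size and action count below are illustrative):

```python
import torch

batch, num_actions = 32, 10
q_values = torch.randn(batch, num_actions)          # 2-D [batch, num_actions] output
actions = torch.randint(0, num_actions, (batch,))   # one chosen action per sample

# Index is [32, 1], same number of dims as the 2-D input, so gather succeeds;
# squeeze(1) brings the result back to shape [32].
curr_Q = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
```

Each entry of `curr_Q` is the Q-value of the action actually taken for that sample, which is what the loss in `compute_loss` compares against the bootstrapped target.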