PyTorch gather function to get the current Q values for Deep Q-learning
I am trying to compute the current Q values for deep Q-learning, but I run into the following error:
RuntimeError: invalid argument 4: Index tensor must have same dimensions as input tensor at C:/w/1/s/windows/pytorch/aten/src\THC/generic/THCTensorScatterGather.cu:16
The code that triggers this error is:
curr_Q = self.model.forward(states).gather(1, actions.unsqueeze(1))
The shape of self.model.forward(states) is [32, 640, 10], and the shape of actions is [32].
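The shapes above are exactly what the error message is complaining about: `gather` requires the index tensor to have the same number of dimensions as the input tensor, and `actions.unsqueeze(1)` is only 2-D while the model output is 3-D. A minimal sketch reproducing the mismatch with random tensors of the reported shapes:

```python
import torch

# Stand-ins with the shapes reported in the question
q = torch.randn(32, 640, 10)            # 3-D model output
actions = torch.randint(0, 10, (32,))   # actions: shape [32]
idx = actions.unsqueeze(1)              # shape [32, 1]: 2-D index vs 3-D input

try:
    q.gather(1, idx)
    mismatch_raised = False
except RuntimeError:
    # gather demands index.dim() == input.dim(), so this path is taken
    mismatch_raised = True
```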
The rest of the code in this function looks like this:
def compute_loss(self, batch):
    states, actions, rewards, next_states, dones = batch
    states = torch.FloatTensor(states).to(self.device)
    actions = torch.LongTensor(actions).to(self.device)
    rewards = torch.FloatTensor(rewards).to(self.device)
    next_states = torch.FloatTensor(next_states).to(self.device)
    dones = torch.FloatTensor(dones)

    curr_Q = self.model.forward(states).gather(1, actions.unsqueeze(1))
    curr_Q = curr_Q.squeeze(1)
    next_Q = self.model.forward(next_states)
    max_next_Q = torch.max(next_Q, 1)[0]
    expected_Q = rewards.squeeze(1) + self.gamma * max_next_Q

    loss = self.MSE_loss(curr_Q, expected_Q)
    return loss
Check whether the output dimensions of self.model.forward(states) match the dimensions of actions.unsqueeze(1).
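For `gather` to work here, the model output should be a 2-D [batch, num_actions] tensor so that the [batch, 1] index lines up with it. A minimal sketch of the usual DQN indexing pattern under that assumption (the batch size and action count below are illustrative):

```python
import torch

batch, num_actions = 32, 10
q_values = torch.randn(batch, num_actions)          # 2-D [batch, num_actions] output
actions = torch.randint(0, num_actions, (batch,))   # one chosen action per sample

# Index is [32, 1], same number of dims as the 2-D input, so gather succeeds;
# squeeze(1) brings the result back to shape [32].
curr_Q = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
```

Each entry of `curr_Q` is the Q-value of the action actually taken for that sample, which is what the loss in `compute_loss` compares against the bootstrapped target.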