PyTorch gather function for getting the current Q-values in deep Q-learning
Tags: python, pytorch, reinforcement-learning

I am trying to compute the current Q-values for deep Q-learning, but I get the following error:

RuntimeError: invalid argument 4: Index tensor must have same dimensions as input tensor at C:/w/1/s/windows/pytorch/aten/src\THC/generic/THCTensorScatterGather.cu:16
The code that triggers this error is:

curr_Q = self.model.forward(states).gather(1, actions.unsqueeze(1))

The output of self.model.forward(states) has shape [32, 640, 10], and actions has shape [32].

The rest of the code in this function is as follows:

def compute_loss(self, batch):
    states, actions, rewards, next_states, dones = batch
    states = torch.FloatTensor(states).to(self.device)
    actions = torch.LongTensor(actions).to(self.device)
    rewards = torch.FloatTensor(rewards).to(self.device)
    next_states = torch.FloatTensor(next_states).to(self.device)
    dones = torch.FloatTensor(dones)

    curr_Q = self.model.forward(states).gather(1, actions.unsqueeze(1))
    curr_Q = curr_Q.squeeze(1)
    next_Q = self.model.forward(next_states)
    max_next_Q = torch.max(next_Q, 1)[0]
    expected_Q = rewards.squeeze(1) + self.gamma * max_next_Q

    loss = self.MSE_loss(curr_Q, expected_Q)
    return loss

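Check whether the output dimensions of self.model.forward(states) match those of actions.unsqueeze(1). gather requires the index tensor to have the same number of dimensions as the input tensor, but here the model output is 3-D ([32, 640, 10]) while actions.unsqueeze(1) is 2-D ([32, 1]).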
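
The mismatch can be reproduced in isolation. The sketch below uses random tensors as hypothetical stand-ins for the model output and the sampled actions; the names q_3d, q_2d and index are illustrative and not from the original code. It assumes the network is meant to produce one Q-value per action, i.e. an output of shape [32, 10]. How to reduce the reported [32, 640, 10] output to that shape depends on what the 640 dimension represents, which the question does not say.

```python
import torch

batch_size, num_actions = 32, 10

# Hypothetical stand-ins for the tensors described in the question.
q_3d = torch.randn(batch_size, 640, num_actions)         # shape [32, 640, 10]
actions = torch.randint(0, num_actions, (batch_size,))   # shape [32]

index = actions.unsqueeze(1)  # shape [32, 1], a 2-D tensor

# gather requires index.dim() == input.dim(); a 2-D index against a
# 3-D input raises a RuntimeError like the one in the question.
try:
    q_3d.gather(1, index)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Assumption: the network should output one Q-value per action, shape [32, 10].
# With a 2-D input the same call selects Q(s, a) for each row of the batch.
q_2d = torch.randn(batch_size, num_actions)
curr_Q = q_2d.gather(1, index).squeeze(1)  # shape [32]
```

Printing self.model.forward(states).shape and actions.unsqueeze(1).shape right before the gather call is a quick way to confirm which of the two cases applies.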