Problem changing the input type in PyTorch reinforcement learning

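I am trying to run a piece of code I found on GitHub, but it keeps crashing in one part with the error TypeError: only size-1 arrays can be converted to Python scalars. I have been trying to fix this for two days. If I understand the problem correctly, I have incompatible tensor data types, but I don't know how to resolve it. Every time I try to change the tensor type to DoubleTensor or float64, I run into other errors. At this point I have no idea which part needs to change, or how.
This is the model I defined: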
class Policy(nn.Module):
    def __init__(self):      
        super(Policy, self).__init__()
        self.input_layer = nn.Linear(11, 128)
        self.hidden_1 = nn.Linear(128, 128)
        self.hidden_2 = nn.Linear(32,31)
        self.hidden_state = torch.tensor(torch.zeros(2,1,32)).cuda()
        self.rnn = nn.GRU(128, 32, 2)
        self.action_head = nn.Linear(31, 5)
        self.value_head = nn.Linear(31, 1)
        self.saved_actions = []
        self.rewards = []

    def reset_hidden(self):
        self.hidden_state = torch.tensor(torch.zeros(2,1,32)).cuda()
        
    def forward(self, x): 
        x = torch.tensor(x).cuda()
        x = torch.sigmoid(self.input_layer(x))
        x = torch.tanh(self.hidden_1(x))
        x, self.hidden_state = self.rnn(x.view(1,-1,128), self.hidden_state.data)
        x = F.relu(self.hidden_2(x.squeeze()))
        action_scores = self.action_head(x)
        state_values = self.value_head(x)
        return F.softmax(action_scores, dim=-1), state_values

        def forward(self, x):
          conv_out = self.conv(x).view(x.size()[0], -1)
          val = self.fc_val(conv_out)
          adv = self.fc_adv(conv_out)
          return val + (adv - adv.mean(dim=1, keepdim=True))
    
    def act(self, state):
        probs, state_value = self.forward(state)
        m = Categorical(probs)
        action = m.sample()
        if action == 1 and env.state[0] < 1: action = torch.LongTensor([2]).squeeze().cuda.DoubleTensor()
        if action == 4 and env.state[1] < 1: action = torch.LongTensor([2]).squeeze().cuda.DoubleTensor()
        if action == 6 and env.state[2] < 1: action = torch.LongTensor([2]).squeeze().cuda.DoubleTensor()
        self.saved_actions.append((m.log_prob(action), state_value))
        return action.item()

This is the error I get:
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:18: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-74-21b617d2e36f> in <module>()
     46     msg = None
     47     while not done:
---> 48         action = model.act(state)
     49         state, reward, done, msg = env.step(action)
     50         model.rewards.append(reward)

4 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1751     if has_torch_function_variadic(input, weight):
   1752         return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
-> 1753     return torch._C._nn.linear(input, weight, bias)
   1754 
   1755 

RuntimeError: expected scalar type Double but found Float

This is the GitHub repository I am using:

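Someone suggested that the problem occurs because, when I pass "state" as the argument, torch throws an error since it expects a numeric type. But I cannot convert state to float, because I then get another error saying that a list cannot be converted to float32.
I would appreciate it if you could show me what I am doing wrong.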
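First, are you using the same environment "env" and/or dataset?
Second, you added the line state = state.type(torch.float32) and it did not throw an error, so I assume state is already a tensor (which is a bit odd). If the type has to be changed to float32, you may have forgotten to change it inside the while loop that follows: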
while not done:
    action = model.act(state)  
    state, reward, done, msg = env.step(action)  
    state = state.float()  # To add as I think env.step(action) returns a long tensor for some reason
    model.rewards.append(reward)  
    if done:  
        break
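
For illustration, here is a minimal sketch of how this kind of dtype mismatch can arise and how a float32 cast avoids it. It assumes the state arrives as a NumPy float64 array, which is a guess and not something confirmed by the question; the names layer, state, x64 and x32 are made up for this example.

```python
import numpy as np
import torch
import torch.nn as nn

layer = nn.Linear(11, 128)      # nn.Linear parameters are float32 by default
state = np.zeros(11)            # NumPy arrays default to float64 (assumed input)

x64 = torch.tensor(state)       # torch.tensor keeps the NumPy dtype: float64
try:
    layer(x64)                  # float64 input against float32 weights
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True      # this is the Double/Float mismatch

# Casting the input to float32 makes it match the layer's parameters.
x32 = torch.as_tensor(state, dtype=torch.float32)
out = layer(x32)
```

Casting the input is usually preferable to converting the whole model to double, since float32 is the default everywhere else in PyTorch.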

Good luck.
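You have to convert the output of env.step(action) into a tensor, simply with torch.tensor(state).
Sorry for asking a silly question, but do you mean torch.tensor(state), reward, done, msg = env.step(action)? It throws the error "can't assign to function call", so I guess I got it wrong.
In your code, state has to be a tensor, because it is passed through the network. Make sure that is always the case. Also, keep in mind that you are resetting the environment on every iteration, which may not be what you want.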
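
The "can't assign to function call" error comes from putting torch.tensor(state) on the left-hand side of the assignment, where Python only accepts names. One way to do the conversion is to unpack first and convert on the next line, as in the sketch below. DummyEnv and to_state_tensor are hypothetical stand-ins written for this example; the real environment's step() may return something different.

```python
import torch

class DummyEnv:
    """Hypothetical stand-in for the real environment (assumption)."""
    def step(self, action):
        # Returns (state, reward, done, msg); state is a plain Python list here.
        return [0.0] * 11, 1.0, True, None

def to_state_tensor(state):
    # Accepts a list, a NumPy array, or a tensor and returns a float32 tensor.
    return torch.as_tensor(state, dtype=torch.float32)

env = DummyEnv()
state, reward, done, msg = env.step(2)   # unpack into plain names first
state = to_state_tensor(state)           # then convert on a separate line
```

Because torch.as_tensor also accepts an existing tensor, the helper can be called on every iteration without checking what type the environment returned.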