Is it possible to solve the "TypeError: argument 'input' (position 1) must be Tensor" error without retraining the model?

Tags: python, pytorch, typeerror, valueerror, openai-gym

I built a model in PyTorch for an OpenAI Gym environment. This is how I defined it:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

device = torch.device('cpu')

class Policy(nn.Module):
    def __init__(self, s_size=8, h_size=16, a_size=4):
        super(Policy, self).__init__()
        self.fc1 = nn.Linear(s_size, h_size)
        self.fc2 = nn.Linear(h_size, 32)
        self.fc3 = nn.Linear(32, 64)
        self.fc4 = nn.Linear(64, a_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return F.softmax(x, dim=1 )
    
    def act(self, state):
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        probs = self.forward(state).cpu()
        m = Categorical(probs)
        action = m.sample()
        return action.item(), m.log_prob(action)
I saved its state_dict to a checkpoint file and then used it as follows:

import gym
import cv2
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation
from PIL import Image, ImageDraw, ImageFont

STEPS = 1000  # number of steps to record (value not shown in the original code)

env = gym.make('LunarLander-v2')

policy = Policy().to(torch.device('cpu'))
policy.load_state_dict(torch.load('best_params_cloud.ckpt', map_location='cpu'))
policy.eval()
fig = plt.figure()
ims = []
rewards = []
state = env.reset()
for step in range(STEPS):
    img = env.render(mode='rgb_array')
    action, log_prob = policy(state)
    # print(action)
    state, reward, done, i_ = env.step(action)
    rewards.append(reward)
    # print(reward, done)
    cv2_im_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    pil_im = Image.fromarray(cv2_im_rgb)

    draw = ImageDraw.Draw(pil_im)

    # Choose a font
    font = ImageFont.truetype("Roboto-Regular.ttf", 20)

    # Draw the text
    draw.text((0, 0), f"Step: {step} Action : {action} Reward: {int(reward)} Total Rewards: {int(np.sum(rewards))} done: {done}", font=font,fill="#FDFEFE")

    # Save the image
    img = cv2.cvtColor(np.array(pil_im), cv2.COLOR_RGB2BGR)
    im = plt.imshow(img, animated=True)
    ims.append([im])
    if done:
        env.close()
        break

Writer = animation.writers['pillow']
writer = Writer(fps=15, metadata=dict(artist='Me'), bitrate=1800)
im_ani = animation.ArtistAnimation(fig, ims, interval=50, repeat_delay=3000,
                                    blit=True)
im_ani.save('ll_train1.gif', writer=writer)
But this returns the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-3-da32222edde2> in <module>
      9 for step in range(STEPS):
     10     img = env.render(mode='rgb_array')
---> 11     action,log_prob = policy(state)
     12         # print(action)
     13     state,reward,done,i_ = env.step(action)

~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

<ipython-input-2-66d42ebb791e> in forward(self, x)
     33 
     34     def forward(self, x):
---> 35         x = F.relu(self.fc1(x))
     36         x = F.relu(self.fc2(x))
     37         x = F.relu(self.fc3(x))

~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~\anaconda3\lib\site-packages\torch\nn\modules\linear.py in forward(self, input)
     92 
     93     def forward(self, input: Tensor) -> Tensor:
---> 94         return F.linear(input, self.weight, self.bias)
     95 
     96     def extra_repr(self) -> str:

~\anaconda3\lib\site-packages\torch\nn\functional.py in linear(input, weight, bias)
   1751     if has_torch_function_variadic(input, weight):
   1752         return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
-> 1753     return torch._C._nn.linear(input, weight, bias)
   1754 
   1755 

TypeError: linear(): argument 'input' (position 1) must be Tensor, not numpy.ndarray
I also tried converting the state to a tensor inside forward (the modified forward function is shown at the end of this post), but that raises a different error:

ValueError: not enough values to unpack (expected 2, got 1)


The policy took a long time to train and I would like to avoid retraining it. Is there a workaround that lets this run without retraining the model?

This error has nothing to do with your model. The forward function only returns the probability distribution over actions, but what your loop needs is the action and its corresponding log probability, which is exactly what Policy.act returns.
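As a minimal sketch of what is happening here (an illustration, not code from the post): env.reset() returns a NumPy array, and nn.Linear only accepts tensors, while Policy.act converts the state with torch.from_numpy before calling forward.

import numpy as np
import torch
import torch.nn as nn

layer = nn.Linear(8, 4)
state = np.zeros(8, dtype=np.float32)  # stand-in for the array returned by env.reset()

# Feeding the raw NumPy array reproduces the TypeError from the traceback.
try:
    layer(state)
except TypeError as e:
    print(e)  # e.g. linear(): argument 'input' (position 1) must be Tensor, not numpy.ndarray

# Converting to a tensor first (exactly what Policy.act does) works.
out = layer(torch.from_numpy(state).float().unsqueeze(0))
print(out.shape)  # torch.Size([1, 4])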

Change your code from:

for step in range(STEPS):
    img = env.render(mode='rgb_array')
    # This line causes the error.
    action, log_prob = policy(state)

to:

for step in range(STEPS):
    img = env.render(mode='rgb_array')
    # This line no longer raises the error.
    action, log_prob = policy.act(state)
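With this change the NumPy state is converted to a tensor inside act, and the returned action is a plain Python int (act returns action.item()), so env.step(action) and the f-string in the drawing code continue to work unchanged.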

Please also post the full traceback of the second (ValueError) error.
For reference, this is the modified forward function mentioned above, which raises the ValueError:

def forward(self, x):
    x = torch.tensor(x, dtype=torch.float32, device=DEVICE).unsqueeze(0)  # Added this line
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = F.relu(self.fc3(x))
    x = self.fc4(x)
    return F.softmax(x, dim=1)
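A short note on the second error (a sketch, not part of the original answer): the modified forward still returns a single tensor of probabilities, so unpacking it into two names iterates over its first dimension of size 1, which is exactly where the ValueError comes from. Policy.act already performs the tensor conversion and returns both the action and its log probability, so no retraining is needed.

import torch

probs = torch.tensor([[0.1, 0.2, 0.3, 0.4]])  # shape (1, 4), like the softmax output of forward

# Unpacking a single (1, 4) tensor into two variables iterates over dim 0,
# which has only one element, hence "not enough values to unpack (expected 2, got 1)".
try:
    action, log_prob = probs
except ValueError as e:
    print(e)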