Max pooling of an image in PyTorch

I am trying to apply maxpool2d (from torch.nn) to a single image (not as a maxpool layer inside a network). This is my code so far:

name = 'astronaut'
imshow(images[name], name)
img = images[name]
# pool of square window of size=3, stride=1
m = nn.MaxPool2d(3,stride = 1)
img_transform = torch.Tensor(images[name])
plt.imshow(m(img_transform).view((512,510)))

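The problem is that this code gives me a very green image. I am sure the problem lies in the dimensions passed to view, but I could not find out how to apply maxpool to a single image, so I cannot fix it. The image I am working with is 512x512. The arguments to view make no sense to me right now; they are simply the only numbers that produce a result.

For example, if I pass 512, 512 as the arguments to view, I get the following error: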

RuntimeError: shape '[512, 512]' is invalid for input of size 261120

I would appreciate it if someone could show me how to apply maxpool, avgpool, or minpool to an image and display the result.

Thanks (:

Assuming your image is a numpy.array when loaded (see the comments for an explanation of each step):
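A minimal sketch of those steps, assuming the image is a height x width x 3 uint8 array. The random array below is only a stand-in for images[name], and this is not necessarily the exact code the answer used. The name new_img matches the one used in the conversion steps further down.

```python
import numpy as np
import torch
from torch import nn

# Stand-in for the loaded image (assumption): [Height, Width, Channels], uint8 in [0, 255]
img = np.random.randint(0, 256, size=(512, 512, 3), dtype=np.uint8)

# Cast to float; pooling is not implemented for every integer dtype
image = torch.from_numpy(img).float()

# Move channels first: [H, W, C] -> [C, H, W]
channels_first = image.permute(2, 0, 1)

# Add the batch dimension: [C, H, W] -> [1, C, H, W]
batched = channels_first.unsqueeze(0)

# 3x3 window with stride 1: each spatial size shrinks from 512 to 512 - 3 + 1 = 510
m = nn.MaxPool2d(3, stride=1)
new_img = m(batched)
print(new_img.shape)  # torch.Size([1, 3, 510, 510])
```

This also explains the 261120 in the error above: without the permute, the [512, 512, 3] tensor is read as 512 channels of size 512x3, so pooling returns 512 x 510 x 1 = 261120 values.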

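If your image is black and white, you need the shape [1, 1, 512, 512] (a single channel). You cannot leave out or squeeze these dimensions; they must always be present for any torch.nn.Module!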

To convert the tensor back into an image, you can use similar steps:

# Cast to long and squeeze batch dimension
no_batch = new_img.long().squeeze(dim=0)

# Unpermute
width_height_channels = no_batch.permute(1, 2, 0)
width_height_channels.shape  # Shape: [510, 510, 3]

# Cast to numpy and you have your image
final_image = width_height_channels.numpy()
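
The question also asks about average and min pooling. The following is a rough sketch under the same shape assumptions ([1, C, H, W], float). nn.AvgPool2d works like nn.MaxPool2d. torch.nn has no MinPool2d layer, so the sketch negates the input, max-pools, and negates again; that workaround is a suggestion, not part of the original answer. The batched tensor here is a random placeholder.

```python
import torch
from torch import nn

# Hypothetical input: one 3-channel 512x512 image, already batched and float
batched = torch.rand(1, 3, 512, 512)

# Average pooling with the same 3x3 window and stride 1
avg_pooled = nn.AvgPool2d(3, stride=1)(batched)

# Min pooling via max pooling: min(x) == -max(-x) over each window
max_pool = nn.MaxPool2d(3, stride=1)
max_pooled = max_pool(batched)
min_pooled = -max_pool(-batched)

print(avg_pooled.shape, min_pooled.shape)
```

All three results have shape [1, 3, 510, 510] and can be converted back to an image with the steps above.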

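I ran the code (it worked!), but my result is a black image. That is because every entry becomes 0 after you cast it to long. I checked: new_img gives me real numbers, but no_batch gives zeros. Do we need to cast it to long? If so, how do I avoid all the zeros? Edit: I changed long to float and got the correct result. Just so I understand, why did you choose long in the first place? Thanks for all your help!

@tweepy_ques It depends on what your original img is. I assumed it was of int type with range [0, 255], since you did not provide that information. If it is a float in the range [0, 1], you should not do any casting. In the second step, it again depends on whether you want the image as integers in the range [0, 255] or as floats in the range [0, 1]. Also, you may need a different data format, e.g. [Channels, Width, Height] versus [Width, Height, Channels]. The second is more popular (it is used in tensorflow, for example), while the first is the one PyTorch uses.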
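
The casting issue from the comments can be reproduced with a small sketch. It assumes a pooled result with float values in [0, 1]; the random new_img tensor is a placeholder, not the asker's data. Both options at the end produce arrays that plt.imshow accepts for RGB data (floats in [0, 1] or integers in [0, 255]).

```python
import torch

# Hypothetical pooled result with float values in [0, 1), shape [1, C, H, W]
new_img = torch.rand(1, 3, 510, 510)

# .long() truncates toward zero, so every value below 1 becomes 0 (a black image)
as_long = new_img.long()
print(as_long.max())  # tensor(0)

# Option 1: keep floats in [0, 1], only drop the batch dimension and unpermute
float_image = new_img.squeeze(dim=0).permute(1, 2, 0).numpy()

# Option 2: rescale to [0, 255] first, then cast to an integer type
uint8_image = (new_img * 255).round().to(torch.uint8).squeeze(dim=0).permute(1, 2, 0).numpy()
```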