Neural network: Getting the output shape of a deconvolution layer using tf.nn.conv2d_transpose in TensorFlow

neural-network tensorflow deep-learning

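According to this, the output shape is N + H - 1, where N is the input height or width and H is the kernel height or width. This is the obvious inverse process of convolution. This gives a formula for calculating the output shape of a convolution, namely (W − F + 2P)/S + 1, where W is the input size, F is the filter size, P is the padding size, and S is the stride. However, there is a test case like the following: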

  strides = [1, 2, 2, 1]

  # Input, output: [batch, height, width, depth]
  x_shape = [2, 6, 4, 3]
  y_shape = [2, 12, 8, 2]

  # Filter: [kernel_height, kernel_width, output_depth, input_depth]
  f_shape = [3, 3, 2, 3]

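So we use y_shape, f_shape and x_shape with the formula (W − F + 2P)/S + 1 to calculate the padding size P. Here the deconvolution output y_shape plays the role of the convolution input W, and x_shape plays the role of the convolution output. From (12 - 3 + 2P)/2 + 1 = 6 we get P = 0.5, which is not an integer. How does deconvolution work in TensorFlow?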
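The arithmetic in the question can be reproduced with a few lines of plain Python. This is only a sketch of the symmetric-padding formula (W − F + 2P)/S + 1 solved for P; the helper name `symmetric_padding` is made up for illustration and is not part of TensorFlow.

```python
from fractions import Fraction


def symmetric_padding(out_size, in_size, kernel, stride):
    """Solve (W - F + 2P)/S + 1 = out_size for P.

    in_size is W (the convolution input, i.e. the deconvolution output),
    kernel is F and stride is S. Hypothetical helper, for illustration only.
    """
    # Rearranged: P = ((out_size - 1) * S - W + F) / 2
    return Fraction((out_size - 1) * stride - in_size + kernel, 2)


# Height: y_shape[1] = 12 maps back to x_shape[1] = 6
p_height = symmetric_padding(out_size=6, in_size=12, kernel=3, stride=2)
# Width: y_shape[2] = 8 maps back to x_shape[2] = 4
p_width = symmetric_padding(out_size=4, in_size=8, kernel=3, stride=2)

print(p_height, p_width)  # 1/2 1/2
```

Both dimensions give P = 1/2, so no integer amount of symmetric padding satisfies the formula for this test case.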
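The output size formula in the tutorial assumes that the padding P is the same before and after the image (left and right, or top and bottom).
With stride 1, the number of positions where the kernel can be placed is then:
W (image size) - F (kernel size) + P (additional padding before) + P (additional padding after) + 1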
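As a quick sanity check on the symmetric-padding count, the kernel positions can simply be enumerated. The sketch below assumes the same padding P on both sides; `kernel_positions` is a hypothetical helper, not a TensorFlow function.

```python
def kernel_positions(in_size, kernel, pad, stride=1):
    """Start offsets (in padded coordinates) at which the kernel fits entirely."""
    padded = in_size + 2 * pad  # same padding before and after
    return list(range(0, padded - kernel + 1, stride))


# Stride 1: W - F + 2P + 1 positions
assert len(kernel_positions(12, 3, 1)) == 12 - 3 + 2 * 1 + 1

# Stride 2: floor((W - F + 2P) / S) + 1 positions
assert len(kernel_positions(12, 3, 1, stride=2)) == (12 - 3 + 2 * 1) // 2 + 1

print(kernel_positions(12, 3, 1, stride=2))  # [0, 2, 4, 6, 8, 10]
```

With W = 12, F = 3, P = 1 and S = 2 the last padded column is never covered by the kernel, which hints at why padding only one side can be enough.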

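However, TensorFlow can also handle the case where more pixels need to be padded on one side than on the other so that the kernel fits correctly. The strategies for choosing the padding ("SAME" and "VALID") are documented separately. The test you are talking about uses the "VALID" method.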
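This discussion is really helpful. I am just adding some additional information.
padding='SAME' can also give the bottom and right sides one extra padded pixel. Given that, the test case below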
strides = [1, 2, 2, 1]
# Input, output: [batch, height, width, depth]
x_shape = [2, 6, 4, 3]
y_shape = [2, 12, 8, 2]

# Filter: [kernel_height, kernel_width, output_depth, input_depth]
f_shape = [3, 3, 2, 3]

is using padding='SAME'. We can interpret padding='SAME' as:
(W − F + pad_along_height)/S + 1 = out_height,
(W − F + pad_along_width)/S + 1 = out_width.

So from (12 - 3 + pad_along_height)/2 + 1 = 6 we get pad_along_height = 1. Then pad_top = pad_along_height / 2 = 1/2 = 0 (integer division) and pad_bottom = pad_along_height - pad_top = 1.
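The 'SAME' computation can be sketched in plain Python. The formulas below follow the padding rule as described in the TensorFlow convolution notes (total padding split with the extra pixel going to the bottom/right); the function name `same_padding` is an assumption made for this example.

```python
import math


def same_padding(in_size, kernel, stride):
    """Return (out_size, pad_before, pad_after) for padding='SAME'.

    Hypothetical helper; mirrors the documented rule, not TensorFlow's code.
    """
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
    pad_before = pad_total // 2          # top or left (integer division)
    pad_after = pad_total - pad_before   # bottom or right gets the extra pixel
    return out_size, pad_before, pad_after


# Height of the test case: 12 -> 6 with a 3x3 kernel and stride 2
print(same_padding(12, 3, 2))  # (6, 0, 1)
# Width of the test case: 8 -> 4
print(same_padding(8, 3, 2))   # (4, 0, 1)
```

In both dimensions the single padded pixel lands on the bottom or right side, which is exactly the asymmetric case that the symmetric formula cannot express.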

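As for padding='VALID', no padding is added (the number of padded pixels is 0): the kernel is only placed at positions where it fits entirely inside the original input image region. For example, for the test case below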
strides = [1, 2, 2, 1]

# Input, output: [batch, height, width, depth]
x_shape = [2, 6, 4, 3]
y_shape = [2, 13, 9, 2]

# Filter: [kernel_height, kernel_width, output_depth, input_depth]
f_shape = [3, 3, 2, 3]

the output shape of conv2d is
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
           = ceil(float(13 - 3 + 1) / float(2)) = ceil(11/2) = 6
           = (W−F)/S + 1.

Because (W − F)/S + 1 = (13 - 3)/2 + 1 = 6 is an integer, we do not need to add zero pixels around the border of the image, and pad_top = 1/2 and pad_left = 1/2 in the padding='VALID' section are both 0.
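The 'VALID' output size is easy to check numerically. This is a minimal sketch of the formula ceil((in - filter + 1) / stride) applied to the height and width of the test case; `valid_output_size` is a name chosen for illustration.

```python
import math


def valid_output_size(in_size, kernel, stride):
    """Output size of a convolution with padding='VALID' (no padding added)."""
    return math.ceil((in_size - kernel + 1) / stride)


# y_shape = [2, 13, 9, 2] is the convolution input, x_shape = [2, 6, 4, 3] its output
out_height = valid_output_size(13, 3, 2)  # ceil(11 / 2)
out_width = valid_output_size(9, 3, 2)    # ceil(7 / 2)

print(out_height, out_width)  # 6 4
```

The result matches the height and width in x_shape, so this test case is consistent with zero padding.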

For deconvolution:

output_size = strides * (input_size-1) + kernel_size - 2*padding

strides, input_size, kernel_size and padding are all integers.

The padding is zero for 'VALID'.
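The deconvolution formula can be checked against the 'VALID' test case, and both test cases can be checked in the forward direction, since the shape passed to conv2d_transpose has to map back to the input shape under the matching convolution. The helper names below are hypothetical; only the formulas come from the thread.

```python
import math


def transpose_output_size(in_size, kernel, stride, padding=0):
    """output_size = strides * (input_size - 1) + kernel_size - 2 * padding."""
    return stride * (in_size - 1) + kernel - 2 * padding


def forward_output_size(in_size, kernel, stride, mode):
    """Forward convolution output size for 'VALID' or 'SAME' padding."""
    if mode == "VALID":
        return math.ceil((in_size - kernel + 1) / stride)
    return math.ceil(in_size / stride)  # 'SAME'


# 'VALID' test case: x_shape height/width (6, 4) -> y_shape (13, 9)
height = transpose_output_size(6, 3, 2)
width = transpose_output_size(4, 3, 2)
print(height, width)  # 13 9

# Round trip: the forward convolution recovers the original sizes
assert forward_output_size(height, 3, 2, "VALID") == 6
assert forward_output_size(width, 3, 2, "VALID") == 4

# 'SAME' test case: y_shape height 12 and width 8 map back to 6 and 4
assert forward_output_size(12, 3, 2, "SAME") == 6
assert forward_output_size(8, 3, 2, "SAME") == 4
```

For the 'SAME' case the formula would need padding = 0.5 to give 12, which is the non-integer value from the question; the forward check shows the shape is still consistent once the padding is allowed to be asymmetric.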

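The answer is about tf.nn.conv2d. How do the padding modes apply to tf.nn.conv2d_transpose?

tf.nn.conv2d_transpose will make the output tensor larger than the input tensor.

I think the padding is zero for 'SAME' but some value for 'VALID'.

@VatsalAggarwal please verify your comment. 'SAME' keeps the size the same, so padding is added to maintain it.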