Python: Applying a mask to a Conv2D kernel in Keras

I want to apply a mask to the kernel of a Conv2D layer in Keras. I'm having some trouble understanding the kernel shape.

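With kernel_size=3 and filters=1, the kernel has shape (3, 3, 4, 1) => (kernel_size, kernel_size, ?, filters).

What does the third dimension of the kernel represent?

How do I take an NxN mask and multiply it with each kernel filter?

This is the code I have so far. I'm not sure it will work the way I expect, because I don't fully understand the kernel shape.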

import functools

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Conv2D

class MaskedConv2D(tf.keras.layers.Layer):
    def __init__(self, *args, **kwargs):
        super(MaskedConv2D, self).__init__()
        self.conv2d = Conv2D(*args, **kwargs)

    def build(self, input_shape):
        self.conv2d.build(input_shape[0])
        self._convolution_op = self.conv2d._convolution_op

    def masked_convolution_op(self, filters, kernel, mask):
        m = K.expand_dims(K.expand_dims(mask[0, ...], axis=2), axis=3) # (3, 3) => (3, 3, 1, 1)
        m = K.tile(m, (1, 1, kernel.shape[2], kernel.shape[3])) # (3, 3, 1, 1) => (3, 3, 4, 1)
        return self._convolution_op(filters, tf.math.multiply(kernel, m))

    def call(self, inputs):
        x, mask = inputs
        self.conv2d._convolution_op = functools.partial(self.masked_convolution_op, mask=mask)
        return self.conv2d.call(x)
First: the kernel shape

The kernel of a 2D convolution has the following shape:

[ height, width, input_filters, output_filters ]
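The size of the third dimension equals the number of input filters (the channels of the input). This is crucial: the 4 in your (3, 3, 4, 1) kernel means the layer was built on an input with 4 channels.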
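To see where the third dimension comes from, here is a minimal sketch that builds a Conv2D layer on a made-up input shape. The 8x8 spatial size is an arbitrary assumption for illustration; only the 4 input channels matter.

```python
import tensorflow as tf

# Hypothetical input shape: a batch of 8x8 images with 4 channels (input filters).
layer = tf.keras.layers.Conv2D(filters=1, kernel_size=3)
layer.build((None, 8, 8, 4))

# Kernel layout is [height, width, input_filters, output_filters].
kernel_shape = tuple(layer.kernel.shape)
print(kernel_shape)  # (3, 3, 4, 1)
```

Changing the last entry of the input shape changes only the third kernel dimension; `filters` controls the fourth.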

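Let's consider how to do a convolution by hand. These are the steps:

  • Take a patch from a batch of images, with shape [batch_size, height, width, filters]
  • Reshape it to [batch_size, height*width*filters]
  • Matrix-multiply it by the kernel reshaped to [height*width*filters, output_filters]
  • At this point the data has shape [batch_size, output_filters]
  • Add the bias of shape [output_filters], broadcasting it across the batch
The output is the set of output filters for each patch.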
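The steps above can be sketched in NumPy for a single patch position. The sizes and random values are arbitrary assumptions chosen for illustration, and the `einsum` line is only there as an independent check of the reshape-and-multiply result.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, height, width, in_filters, out_filters = 2, 3, 3, 4, 5

# One patch per image, the kernel, and the bias (all made-up values).
patch = rng.normal(size=(batch_size, height, width, in_filters))
kernel = rng.normal(size=(height, width, in_filters, out_filters))
bias = rng.normal(size=(out_filters,))

# Flatten the patch and the kernel so the convolution becomes a matrix product.
flat_patch = patch.reshape(batch_size, height * width * in_filters)
flat_kernel = kernel.reshape(height * width * in_filters, out_filters)

# [batch_size, out_filters]; the bias broadcasts across the batch.
out = flat_patch @ flat_kernel + bias

# Independent check: sum over height, width and input filters directly.
reference = np.einsum("bhwc,hwco->bo", patch, kernel) + bias
print(out.shape, np.allclose(out, reference))
```

A full convolution repeats this for every patch position in the image, which is why the kernel has to carry one weight per input filter at each spatial position.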

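How to apply an NxN mask to the filters

Given that the weights of a convolution are shaped like [height, width, input_filters, output_filters], and we want to apply a mask of shape [height, width], we can broadcast the mask like this: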

masked_weight = weight * mask.reshape([height,width,1,1])
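As a quick check of that broadcast, here is a small NumPy sketch. The sizes and the all-ones weights are assumptions for illustration; the mask is the same one used in the test script further down.

```python
import numpy as np

height, width, in_filters, out_filters = 3, 3, 4, 2
weight = np.ones((height, width, in_filters, out_filters), dtype=np.float32)
mask = np.array([[1, 1, 0],
                 [1, 1, 1],
                 [0, 1, 1]], dtype=np.float32)

# (3, 3) -> (3, 3, 1, 1): the two trailing axes broadcast over
# every input filter and every output filter.
masked_weight = weight * mask.reshape([height, width, 1, 1])

print(masked_weight.shape)     # (3, 3, 4, 2)
print(masked_weight[0, 2])     # all zeros: position (0, 2) is masked out
print(masked_weight[..., 0, 0])  # equals the mask itself
```

Every [height, width] slice of the kernel is multiplied by the same mask, so a masked position is zeroed for all input and output filters at once.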
Our TensorFlow Keras layer can then be written like this:

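import functools

import tensorflow as tf

class MaskedConv2D(tf.keras.layers.Layer):
    def __init__(self, *args, **kwargs):
        super(MaskedConv2D, self).__init__()
        self.conv2d = tf.keras.layers.Conv2D(*args, **kwargs)

    def build(self, input_shape):
        # inputs are (x, mask), so input_shape[0] is the shape of x
        self.conv2d.build(input_shape[0])
        # keep the original op; _convolution_op is a private Keras attribute,
        # so this relies on the TensorFlow version in use
        self._convolution_op = self.conv2d._convolution_op

    def masked_convolution_op(self, filters, kernel, mask):
        # (height, width) => (height, width, 1, 1), broadcast over input and output filters
        return self._convolution_op(filters, tf.math.multiply(kernel, tf.reshape(mask, mask.shape + [1,1] )))

    def call(self, inputs):
        x, mask = inputs
        # swap in the masked op for this call, binding the current mask
        self.conv2d._convolution_op = functools.partial(self.masked_convolution_op, mask=mask)
        return self.conv2d.call(x)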
We can test it with the following script:

import numpy as np

mcon = MaskedConv2D(filters=2,kernel_size=[3,3])

# hack: initialize it by running some data through it
mcon((np.ones([1,4,4,3], dtype=np.float32), tf.constant([[1,1,0],[1,1,1],[0,1,1]], dtype=tf.float32)))

# set all the weights to 1 for testing
mcon.set_weights([ np.ones([3,3,3,2]) , np.zeros([2]) ])

# pass in a matrix of 1s and mask out 2 elements for each input filter
mcon((np.ones([1,4,4,3], dtype=np.float32), tf.constant([[1,1,0],[1,1,1],[0,1,1]], dtype=tf.float32)))
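This gives the expected output. The mask keeps 7 of the 9 kernel positions, and with 3 input filters and all weights set to 1, each output value is 7 × 3 = 21: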

<tf.Tensor: shape=(1, 2, 2, 2), dtype=float32, numpy=
    array([[[[21., 21.],
             [21., 21.]],
            [[21., 21.],
             [21., 21.]]]], dtype=float32)>
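The same numbers can be reproduced without touching the layer's internals. This is only a sketch that applies the masked kernel through `tf.nn.conv2d` directly, assuming stride 1 and "VALID" padding to match the Conv2D defaults used above; it is not a replacement for the layer, since the weights here are constants rather than trainable variables.

```python
import tensorflow as tf

x = tf.ones([1, 4, 4, 3], dtype=tf.float32)       # same input as the test script
kernel = tf.ones([3, 3, 3, 2], dtype=tf.float32)  # same all-ones weights, zero bias
mask = tf.constant([[1, 1, 0],
                    [1, 1, 1],
                    [0, 1, 1]], dtype=tf.float32)

# (3, 3) -> (3, 3, 1, 1) so the mask broadcasts over input and output filters.
masked_kernel = kernel * mask[:, :, tf.newaxis, tf.newaxis]

y = tf.nn.conv2d(x, masked_kernel, strides=1, padding="VALID")
print(y.shape)                 # (1, 2, 2, 2)
print(float(tf.reduce_min(y)), float(tf.reduce_max(y)))  # 21.0 21.0
```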