Python: Understanding and implementing an element-wise attention module

python tensorflow machine-learning keras deep-learning


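Please add a brief comment with your thoughts so that I can improve my question. Thanks.

I'm trying to understand and implement a research work on an attention mechanism that consists of: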

- channel-wise attention  (a)
- element-wise attention  (b)
- scale-wise attention    (c)

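The mechanism was integrated experimentally into the DenseNet model. The overall model architecture is shown in a diagram (not reproduced here). The channel-wise attention module is simply a squeeze-and-excitation block, whose sigmoid output is then fed to the element-wise attention module. A more precise functional flow diagram of these modules (a, b, and c) accompanied the question.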
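To make the squeeze-and-excitation step concrete, here is a minimal NumPy sketch of what such a block computes on a single feature map. The weight names `w0` and `w1`, the reduced width `D_r`, and the toy shapes are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(x, w0, w1):
    """x: (H, W, D) feature map; w0: (D, D_r); w1: (D_r, D)."""
    z = x.mean(axis=(0, 1))        # squeeze: global average pool -> (D,)
    u = np.maximum(z @ w0, 0.0)    # bottleneck fully connected layer + ReLU -> (D_r,)
    gate = sigmoid(u @ w1)         # fully connected layer + sigmoid -> (D,)
    return x * gate, gate          # the gate is broadcast over H and W

rng = np.random.default_rng(0)
H, W, D, D_r = 4, 4, 8, 2          # hypothetical toy sizes
x = rng.normal(size=(H, W, D))
w0 = rng.normal(size=(D, D_r))
w1 = rng.normal(size=(D_r, D))

x_ca, gate = squeeze_excite(x, w0, w1)
print(x_ca.shape, gate.shape)
```

Each channel of the input is rescaled by one sigmoid score, which is why the module is described as channel-wise.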

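Theory

For the most part I was able to understand and implement it, but I'm a bit lost on the element-wise attention part (part b in the diagram above). This is where I need your help.

Here is a little theory on the topic to give you an overall picture. Note that the paper is no longer openly accessible, but early after publication it was freely available on the publisher's page, and I saved it at that time. To be fair, I am sharing it with you. Anyway, from the paper (section 4.3) we can see the following.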
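First of all, the f(att) function (the middle-left part, or b, of the first diagram) consists of three convolution layers: 512 kernels of 1x1, 512 kernels of 3x3, and C kernels of 1x1, where C is the number of classes. These are followed by a Softmax activation.

Next, f(att) is applied to the output of the channel-wise attention module, which as mentioned is a SENet module that produces sigmoid probability scores, i.e. X(CA). So through f(att) we get C softmax probability maps, each of which is multiplied with the sigmoid output to finally produce the feature map A (equation 4 in the diagram above).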
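Second, there are C linear classifiers, implemented as a convolution layer with C kernels of size 1x1. This layer is also applied to the output of the SENet module, i.e. X(CA), and acts on each feature vector pixel by pixel. It outputs the feature map S (equation 5 in the diagram).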
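The statement that a 1x1 convolution with C kernels acts as C linear classifiers on every pixel can be checked numerically. The following is a small NumPy sketch with made-up shapes; the names `kernel` and `bias` are illustrative and not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, D, C = 4, 4, 8, 3                 # hypothetical toy sizes
x_ca = rng.normal(size=(H, W, D))       # stand-in for the SENet output X(CA)
kernel = rng.normal(size=(D, C))        # a 1x1 conv kernel with its two spatial dims dropped
bias = rng.normal(size=(C,))

# 1x1 convolution: contract the channel axis at every spatial position at once.
s_conv = np.einsum("hwd,dc->hwc", x_ca, kernel) + bias

# The same computation written as C linear classifiers applied pixel by pixel.
s_loop = np.empty((H, W, C))
for i in range(H):
    for j in range(W):
        s_loop[i, j] = x_ca[i, j] @ kernel + bias

print(np.allclose(s_conv, s_loop))
```

Both paths give the same (H, W, C) map, so S holds one confidence score per class at each location.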
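Third, they multiply each confidence score (S) by the corresponding attention element A. This multiplication is deliberate: they do it to prevent unnecessary attention on the feature maps. To make it effective, they also use a weighted cross-entropy loss function to minimize the loss between the classification ground truth and the score vector.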
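As an illustration of what a weighted cross-entropy between a ground-truth vector and a score vector can look like, here is a minimal NumPy sketch. The weighting scheme (inverse frequency of positives and negatives per class within the batch) is an assumption chosen for the example; the exact weights used in the paper are not stated here.

```python
import numpy as np

def weighted_bce(y_true, y_prob, eps=1e-7):
    """Weighted binary cross-entropy for (batch, C) arrays.

    Assumed scheme: positives and negatives of each class are weighted by the
    inverse of their frequency in the batch.
    """
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    n = y_true.shape[0]
    pos = y_true.sum(axis=0)                 # positives per class
    neg = n - pos                            # negatives per class
    w_pos = n / np.maximum(pos, 1.0)         # rare positives get a large weight
    w_neg = n / np.maximum(neg, 1.0)
    loss = -(w_pos * y_true * np.log(y_prob)
             + w_neg * (1.0 - y_true) * np.log(1.0 - y_prob))
    return loss.mean()

y_true = np.array([[1, 0], [0, 0], [0, 1], [0, 0]], dtype=float)
good = np.where(y_true == 1.0, 0.9, 0.1)     # confident and correct scores
bad = 1.0 - good                             # confident and wrong scores

loss_good = weighted_bce(y_true, good)
loss_bad = weighted_bce(y_true, bad)
print(loss_good < loss_bad)
```

The score vector passed as `y_prob` would be the pooled product of S and A after a sigmoid or softmax.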

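My Query

Mainly, I don't properly understand the minimization strategy in the middle of the network. I hope someone can give me a proper understanding and a detailed implementation of the "element-wise attention mechanism" proposed in the paper (section 4.3).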

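Implementation

Below is the minimal code to get started. I think it should be enough. It is a superficial implementation and is still far from the original element-wise module; I don't know how to implement it properly. For now, I want it as a layer that can be plugged into any model. I tried it with MNIST and a simple Conv net.

In short, for MNIST we should have a network that contains the channel-wise and element-wise attention modules, followed by a final 10-unit softmax layer. For example: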
Net: Conv2D - Attentions-Module - GAP - Softmax(10)

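The attention module consists of two parts: the channel-wise part and the element-wise part. The element-wise part should also have a Softmax, and it should minimize the weighted CE loss between the ground truth and the score vector (according to the paper, as already described above). The module also passes the weighted feature maps on to the subsequent layers. To make it clearer, a simple schematic of what we're looking for accompanied the question.

OK, the channel-wise attention should give us a single probability score (sigmoid). For simplicity, let's use a fake layer for now: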
import tensorflow as tf

class FakeSE(tf.keras.layers.Layer):
    def __init__(self):
        # the original called super(Block, self), but no class Block exists
        super(FakeSE, self).__init__()
        # conv layer
        self.conv = tf.keras.layers.Conv2D(10, padding='same',
                                           kernel_size=3)
    def call(self, input_tensor, training=False):
        x = self.conv(input_tensor)
        return tf.math.sigmoid(x)

For the element-wise attention part, here is my failed attempt so far:
class ElementWiseAttention(tf.keras.layers.Layer):
    def __init__(self):
        # Keras requires super().__init__() before sub-layers are assigned
        super(ElementWiseAttention, self).__init__()

        # for simplicity the f(attn) function here has 2 convolutions instead of 3
        # self.conv1, and self.conv2
        self.conv1 = tf.keras.layers.Conv2D(16, 
                                            kernel_size=1, 
                                            strides=1, padding='same',
                                            use_bias=True, activation=tf.nn.silu)

        self.conv2 = tf.keras.layers.Conv2D(10, 
                                            kernel_size=1, 
                                            strides=1, padding='same',
                                            use_bias=False, activation=tf.keras.activations.softmax)
        
        # fake SENet or channel-wise attention module 
        self.cam = FakeSE()
        
        # a linear layer 
        self.linear = tf.keras.layers.Conv2D(10,
                                           kernel_size=1,
                                           strides=1, padding='same',
                                           use_bias=True, activation=None)
    
    def call(self, inputs):
        # 2 stacked conv layer (in paper, it's 3. we set 2 for simplicity)
        # this is the f(att)
        x = self.conv1(inputs)
        x = self.conv2(x)
        
        # this is the A = f(att)*X(CA)
        camx = self.cam(x)*x
        
        # this is S = X(CA)*Linear_Classifier
        linx = self.cam(self.linear(inputs))

        # element-wise multiply to prevent unnecessary attention
        # supposed to be minimized with a weighted cross-entropy loss 
        out = tf.multiply(camx, linx)
        
        return out

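The layer above is the layer of interest. If I understood the paper's wording correctly, this layer should not only minimize the weighted loss function between the gt and the score_vector, but also produce some weighted feature maps (2D).

Run

Here is the toy data: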

import numpy as np

(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = np.expand_dims(x_train, axis=-1)
x_train = x_train.astype('float32') / 255
x_train = tf.image.resize(x_train, [32,32]) # if we want to resize 
y_train = tf.keras.utils.to_categorical(y_train , num_classes=10) 

# Model 
input = tf.keras.Input(shape=(32,32,1))
efnet = tf.keras.applications.DenseNet121(weights=None,
                                             include_top = False, 
                                             input_tensor = input)
em =  ElementWiseAttention()(efnet.output)
# Now apply global max pooling.
gap = tf.keras.layers.GlobalMaxPooling2D()(em)

# classification layer.
output = tf.keras.layers.Dense(10, activation='softmax')(gap)

# bind all
func_model = tf.keras.Model(efnet.input, output)
func_model.compile(
          loss      = tf.keras.losses.CategoricalCrossentropy(),
          metrics   = tf.keras.metrics.CategoricalAccuracy(),
          optimizer = tf.keras.optimizers.Adam())
# fit 
func_model.fit(x_train, y_train, batch_size=32, epochs=3, verbose = 1)

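Understanding element-wise attention

When the paper introduces their method, they say:

"The attention modules aim to exploit the relationship between disease labels and (1) diagnosis-specific feature channels, (2) diagnosis-specific locations on images (i.e. the regions of thoracic abnormalities), and (3) diagnosis-specific scales of the feature maps."

(1), (2), and (3) correspond to channel-wise attention, element-wise attention, and scale-wise attention.

We can see that element-wise attention deals with disease location and weight information, i.e. at each location on the image, how likely it is that a disease is present there. The paper mentions this again when it introduces the element-wise attention.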
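To illustrate the "per location" reading, here is a small NumPy sketch with random stand-ins for the f(att) logits and the confidence map S. The softmax is taken over the class axis at every location, which mirrors the Keras softmax default used in the code that follows; whether the paper normalizes over classes or over spatial positions is not confirmed here, so treat that axis choice as an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # subtract the max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
H, W, C = 4, 4, 3                          # hypothetical toy sizes
att_logits = rng.normal(size=(H, W, C))    # stand-in for the last f(att) convolution output
s = rng.normal(size=(H, W, C))             # stand-in for the confidence map S

a = softmax(att_logits, axis=-1)           # at each location: a distribution over the C classes
m = s * a                                  # attention-weighted confidence map
score = m.mean(axis=(0, 1))                # global average pooling -> one score per class

print(a.shape, m.shape, score.shape)
```

The pooled `score` is the vector that gets compared with the ground truth during training.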
import tensorflow as tf
import numpy as np

ALPHA = 1/16
C = 10
D = 128

class ChannelWiseAttention(tf.keras.layers.Layer):
    def __init__(self):
        super(ChannelWiseAttention, self).__init__()

        # squeeze
        self.gap = tf.keras.layers.GlobalAveragePooling2D()
        
        # excitation
        self.fc0 = tf.keras.layers.Dense(int(ALPHA * D), use_bias=False, activation=tf.nn.relu)
        
        self.fc1 = tf.keras.layers.Dense(D, use_bias=False, activation=tf.nn.sigmoid)

        # reshape so we can do channel-wise multiplication
        self.rs = tf.keras.layers.Reshape((1, 1, D))

    def call(self, inputs):
        # calculate channel-wise attention vector
        z = self.gap(inputs)
        u = self.fc0(z)
        u = self.fc1(u)
        u = self.rs(u)
        return u * inputs

class ElementWiseAttention(tf.keras.layers.Layer):
    def __init__(self):
        super(ElementWiseAttention, self).__init__()

        # f(att)
        self.conv0 = tf.keras.layers.Conv2D(512, 
                                            kernel_size=1, 
                                            strides=1, padding='same',
                                            use_bias=True, activation=tf.nn.relu)
        
        self.conv1 = tf.keras.layers.Conv2D(512, 
                                            kernel_size=3, 
                                            strides=1, padding='same',
                                            use_bias=True, activation=tf.nn.relu)

        self.conv2 = tf.keras.layers.Conv2D(C, 
                                            kernel_size=1, 
                                            strides=1, padding='same',
                                            use_bias=False, activation=tf.keras.activations.softmax)
        
        # linear classifier
        self.linear = tf.keras.layers.Conv2D(C,
                                           kernel_size=1,
                                           strides=1, padding='same',
                                           use_bias=True, activation=None)
        
        # used to compute the score vector for training the element-wise attention module
        self.gap = tf.keras.layers.GlobalAveragePooling2D()
        self.sfm = tf.keras.layers.Softmax()

    def call(self, inputs):
        # f(att)
        a = self.conv0(inputs)
        a = self.conv1(a)
        a = self.conv2(a)
        
        # confidence score
        s = self.linear(inputs)

        # element-wise multiply to prevent unnecessary attention 
        m = s * a
        # pooled into a score vector, to be minimized with a weighted cross-entropy loss
        y_hat = self.gap(m)
        # a sigmoid could also be used here, as in the paper
        out = self.sfm(y_hat)

        return m, out

(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = np.expand_dims(x_train, axis=-1)
x_train = x_train.astype('float32') / 255
x_train = tf.image.resize(x_train, [32,32]) # if we want to resize 
y_train = tf.keras.utils.to_categorical(y_train , num_classes=10) 

# Model 
input = tf.keras.Input(shape=(32,32,1))
efnet = tf.keras.applications.DenseNet121(weights=None,
                                             include_top = False, 
                                             input_tensor = input)

xca = ChannelWiseAttention()(efnet.get_layer("conv3_block1_0_bn").output)
m, output =  ElementWiseAttention()(xca)

# bind all
func_model = tf.keras.Model(efnet.input, output)
func_model.compile(
          loss      = tf.keras.losses.CategoricalCrossentropy(),
          metrics   = tf.keras.metrics.CategoricalAccuracy(),
          optimizer = tf.keras.optimizers.Adam())
# fit 
func_model.fit(x_train, y_train, batch_size=64, epochs=3, verbose = 1)
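
In the model above only the score vector is used as an output, and the weighted map `m` is discarded. If the module should sit in the middle of a network, one possible wiring is a two-output Keras model in which the score vector gets its own auxiliary loss while `m` feeds the later layers. The sketch below uses a random tensor as a stand-in for X(CA); the toy shapes, the extra `Conv2D(32, ...)` layer, and the 1.0 / 0.5 loss weights are arbitrary assumptions for illustration, and plain categorical cross-entropy stands in for the weighted loss.

```python
import numpy as np
import tensorflow as tf

C = 10  # number of classes

inputs = tf.keras.Input(shape=(8, 8, 16))                    # stand-in for X(CA)
a = tf.keras.layers.Conv2D(C, 1, use_bias=False)(inputs)
a = tf.keras.layers.Softmax(axis=-1)(a)                      # attention map A
s = tf.keras.layers.Conv2D(C, 1)(inputs)                     # confidence map S
m = tf.keras.layers.Multiply()([s, a])                       # M = S * A

# Auxiliary head: score vector trained directly against the labels.
aux = tf.keras.layers.GlobalAveragePooling2D()(m)
aux = tf.keras.layers.Softmax(name="aux")(aux)

# Main head: later layers consume the weighted feature map.
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(m)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
main = tf.keras.layers.Dense(C, activation="softmax", name="main")(x)

model = tf.keras.Model(inputs, [main, aux])
model.compile(
    optimizer="adam",
    loss=["categorical_crossentropy", "categorical_crossentropy"],
    loss_weights=[1.0, 0.5],                                 # arbitrary illustration values
)

x_toy = np.random.rand(32, 8, 8, 16).astype("float32")
labels = np.random.randint(0, C, size=32)
y_toy = tf.keras.utils.to_categorical(labels, num_classes=C)
model.fit(x_toy, [y_toy, y_toy], batch_size=8, epochs=1, verbose=0)
```

Both losses are summed with the given weights, so the attention branch receives gradients from its own score vector as well as from the final classifier.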