Deep learning: network values become 0 after passing through a linear layer

deep-learning pytorch

I designed a graph attention network.
However, during the operations inside the layer, the feature values become equal.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    ## in_features = out_features = 1024
    def __init__(self, in_features, out_features, dropout):
        super(GraphAttentionLayer, self).__init__()
        self.dropout = dropout
        self.in_features = in_features
        self.out_features = out_features
   
        self.W = nn.Parameter(torch.zeros(size=(in_features, out_features)))
        self.a1 = nn.Parameter(torch.zeros(size=(out_features, 1)))
        self.a2 = nn.Parameter(torch.zeros(size=(out_features, 1)))
        nn.init.xavier_normal_(self.W.data, gain=1.414)
        nn.init.xavier_normal_(self.a1.data, gain=1.414)
        nn.init.xavier_normal_(self.a2.data, gain=1.414)
        self.leakyrelu = nn.LeakyReLU()

    def forward(self, input, adj):
        h = torch.mm(input, self.W)
        a_input1 = torch.mm(h, self.a1)
        a_input2 = torch.mm(h, self.a2)
        a_input = torch.mm(a_input1, a_input2.transpose(1, 0))
        e = self.leakyrelu(a_input)

        zero_vec = torch.zeros_like(e)
        attention = torch.where(adj > 0, e, zero_vec) # most values are close to 0
        attention = F.softmax(attention, dim=1) # all values are 0.0014, i.e. 1/707 (attention is 707 x 707)
        attention = F.dropout(attention, self.dropout)
        return attention

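The dimension of attention is (707 x 707), and I observed that the attention values are close to 0 before the softmax.
After the softmax, all values are 0.0014, i.e. 1/707.
I would like to know how to keep the values normalized and prevent this.

Thanks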

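Since you say this happens during training, I assume it is at the very start. With random initialization, you often get almost identical values at the end of the network at the beginning of training.

When all values are more or less equal, the softmax output is 1/num_elements for every element, so that they sum to 1 over the chosen dimension. In your case you get 1/707 for all values, which suggests to me that your weights have only just been initialized and the outputs are mostly random at this stage.

I would let it train for a while and see whether this changes.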
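To illustrate the point about near-equal inputs, here is a minimal sketch in plain Python (no PyTorch needed). The `softmax` helper and the made-up `row` of near-zero scores are assumptions for illustration only; they stand in for one row of `attention` before the softmax and are not taken from the asker's data.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

n = 707
# Hypothetical near-zero scores (magnitude <= 1e-4), one value per column.
row = [1e-4 * math.sin(i) for i in range(n)]
probs = softmax(row)

print(1 / n)                   # the uniform value, about 0.0014
print(min(probs), max(probs))  # both end up very close to 1/n
```

Because the inputs differ by at most about 2e-4, every output ends up within a tiny fraction of 1/707, which matches what the question reports.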

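When does this happen: do you have a final trained model that you are running, or is it during training?

@Nopileos It happens during training. I suspect that softmax is ineffective when the feature dimension is large. For example, when we use softmax for classification with dimension 2, the output looks like [0.001, 0.999]. But for features with a dimension above 1k, the values will be equal because of the exponential in the function, especially for small values (e^0.0001 ~ 1).

Hey, if this answered your question, please accept it or say what is missing. Thanks.

Oh, sorry :) I just forgot.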
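As a rough check of the comment about large dimensions, the sketch below compares two hypothetical score vectors of length 1000 that differ only in scale. The helper and the values are illustrative assumptions, not the asker's data. Under these assumptions, how uniform the output is depends on the spread of the inputs rather than on the dimension alone.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

n = 1000
small = [1e-4 * (i % 7) for i in range(n)]  # spread of about 6e-4
large = [10.0 * (i % 7) for i in range(n)]  # same pattern, scaled up

p_small = softmax(small)
p_large = softmax(large)

print(max(p_small) / min(p_small))  # close to 1: nearly uniform
print(max(p_large) / min(p_large))  # very large: clearly non-uniform
```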