How to share weights between modules in PyTorch?

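What is the correct way to share weights between two layers (modules) in PyTorch?
Based on what I found on the PyTorch discussion forum, there are several ways to do this.
For example, based on one of those discussions, I thought that simply assigning the transposed weight would work. That is: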

 self.decoder[0].weight = self.encoder[0].weight.t()

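However, this turned out to be wrong and raises an error. I then tried wrapping the right-hand side of that line in nn.Parameter():

 self.decoder[0].weight = nn.Parameter(self.encoder[0].weight.t())

This eliminates the error, but no sharing happens here. With this, I have just initialized a new tensor with the same values as encoder[0].weight.t().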
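The two attempts described above can be reproduced in isolation. The following is a minimal sketch, assuming two standalone nn.Linear layers with made-up sizes (8 and 3) in place of self.encoder[0] and self.decoder[0]. It only shows that the plain assignment is rejected and that the wrapped assignment registers a second Parameter object; it does not by itself say how the two behave during training.

```python
import torch
import torch.nn as nn

# Hypothetical layer sizes, chosen only for illustration.
encoder = nn.Linear(8, 3, bias=False)
decoder = nn.Linear(3, 8, bias=False)

# 1) Assigning a plain tensor to a registered parameter is rejected by nn.Module.
try:
    decoder.weight = encoder.weight.t()
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__
    print("assignment failed:", exc)

# 2) Wrapping in nn.Parameter is accepted, but this registers a second,
#    separate Parameter object (its own entry in parameters(), its own .grad)
#    instead of reusing encoder.weight.
decoder.weight = nn.Parameter(encoder.weight.t())
same_object = decoder.weight is encoder.weight
print("same Parameter object:", same_object)
```

The TypeError comes from nn.Module's attribute handling, which only accepts an nn.Parameter or None for a name that is already registered as a parameter.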

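I then found another post that describes different ways of sharing weights. However, I doubt that all of the methods given there are correct.
For example, one approach is shown like this: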

# tied autoencoder using off the shelf nn modules
class TiedAutoEncoderOffTheShelf(nn.Module):
    def __init__(self, inp, out, weight):
        super().__init__()
        self.encoder = nn.Linear(inp, out, bias=False)
        self.decoder = nn.Linear(out, inp, bias=False)

        # tie the weights
        self.encoder.weight.data = weight.clone()
        self.decoder.weight.data = self.encoder.weight.data.transpose(0,1)

    def forward(self, input):
        encoded_feats = self.encoder(input)
        reconstructed_output = self.decoder(encoded_feats)
        return encoded_feats, reconstructed_output

Basically, it creates a new weight tensor using nn.Parameter() and assigns it to each layer/module like this:

weights = nn.Parameter(torch.randn_like(self.encoder[0].weight))
self.encoder[0].weight.data = weights.clone()
self.decoder[0].weight.data = self.encoder[0].weight.data.transpose(0, 1)

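This really confuses me: how does this share the same variable between the two layers? Isn't it just cloning the "raw" data? When I used this approach and visualized the weights, I noticed that the visualizations were different, which made me even more sure that something is not right.
I am not sure whether the visualizations differ simply because one is the transpose of the other or, as I suspect, because they are optimized independently (i.e., the weights are not shared between the layers).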

[Figure: weights at initialization]

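AI questions in general tend to be misunderstood, and this one in particular. I will rephrase your question: can layer A of module M1 and layer B of module M2 share weights, WA = WB, or possibly even WA = WB transposed?

This is possible with PyTorch hooks: you would update the forward hook of A to alter WB, and possibly freeze WB in M2's autograd.

So just use hooks.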
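A rough sketch of what the hook suggestion might look like, under several assumptions: layer_a, layer_b and sync_b_from_a are hypothetical names, the sizes are made up, and the hook is a forward hook on A that copies A's transposed weight into B whenever A runs. This is one possible reading of the answer, not necessarily what its author had in mind.

```python
import torch
import torch.nn as nn

# Hypothetical layers standing in for "layer A of M1" and "layer B of M2".
layer_a = nn.Linear(4, 3, bias=False)
layer_b = nn.Linear(3, 4, bias=False)

# Freeze WB so that only WA is trained; WB is overwritten by the hook instead.
layer_b.weight.requires_grad_(False)

def sync_b_from_a(module, inputs, output):
    # Runs after every forward pass of layer_a: mirror WA^T into WB.
    with torch.no_grad():
        layer_b.weight.copy_(module.weight.t())

handle = layer_a.register_forward_hook(sync_b_from_a)

x = torch.randn(2, 4)
out = layer_b(layer_a(x))
out.sum().backward()

values_match = torch.equal(layer_b.weight, layer_a.weight.t())
print("WB equals WA^T after forward:", values_match)

handle.remove()  # detach the hook once it is no longer needed
```

Because WB is frozen and filled by copying, the gradient from layer B's use of the weight never reaches WA. The values stay in sync, but this is mirroring rather than full weight tying.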

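As it turns out, after further investigation (simply re-transposing the decoder's weight and visualizing it), they are indeed shared.
[Figure: visualization of the encoder and decoder weights]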
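The re-transposition check can also be done numerically rather than visually. The sketch below assumes two bare nn.Linear layers with made-up sizes, ties them through .data as in the question's snippet, takes one SGD step, and then compares the weights. As far as I understand, assigning to .data makes the parameter reuse the storage of the assigned tensor, and transpose() returns a view, which would explain why the values stay in sync.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, for illustration only.
encoder = nn.Linear(8, 3, bias=False)
decoder = nn.Linear(3, 8, bias=False)

# Tie through .data as in the question: the decoder weight becomes a
# transposed view of the encoder weight's memory.
decoder.weight.data = encoder.weight.data.transpose(0, 1)

params = list(encoder.parameters()) + list(decoder.parameters())
opt = torch.optim.SGD(params, lr=0.1)

# One training step on a reconstruction loss.
x = torch.randn(16, 8)
loss = ((decoder(encoder(x)) - x) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()

shares_memory = decoder.weight.data_ptr() == encoder.weight.data_ptr()
still_transposes = torch.equal(decoder.weight.t(), encoder.weight)
distinct_params = decoder.weight is not encoder.weight
print(shares_memory, still_transposes, distinct_params)
```

Even when the memory is shared, these are still two Parameter objects with separate .grad tensors and separate optimizer state. With stateful optimizers (momentum, Adam) the updates may therefore differ from those of a single tied parameter.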

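Interestingly, your first intuition was right, @Rika:

"This really confuses me: how does this share the same variable between the two layers? Isn't it just cloning the 'raw' data?"

In fact, many people get this wrong in blog posts or in their own repos.

Also,

 self.decoder[0].weight = nn.Parameter(self.encoder[0].weight.t())

will simply create a new weight matrix, as you wrote.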

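Thanks, is there a code example that demonstrates this? By the way, why would it make any difference in our case whether a layer (which is itself a module) is inside another module or not? I can't see a difference. Can you be more specific and elaborate a bit more?
I added the example, but you will have to rewrite it for your specific case.
Thanks a lot, I really appreciate your kind response. However, after further investigation I noticed that what I did does indeed share the weights between the modules! There is no need to go into such complexities.

The only viable approach seems to be to use the linear function that nn.Linear calls (torch.nn.functional.linear()):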

import math

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import init

# Real off-the-shelf tied linear module
class TiedLinear(nn.Module):
    def __init__(self, tied_to: nn.Linear, bias: bool = True):
        super().__init__()
        self.tied_to = tied_to
        if bias:
            # The tied layer maps back to the input size of the original layer
            self.bias = nn.Parameter(torch.empty(tied_to.in_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    # copied from nn.Linear (only the bias needs initializing here)
    def reset_parameters(self):
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.tied_to.weight.t())
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        # Use the transposed weight of the tied layer directly, so autograd
        # accumulates gradients into the one shared parameter
        return F.linear(input, self.tied_to.weight.t(), self.bias)

    # To keep module properties intuitive
    @property
    def weight(self) -> torch.Tensor:
        return self.tied_to.weight.t()

# Shared weights, different biases
# (example sizes; `in` is a reserved word in Python and cannot be a variable name)
in_features, out_features = 784, 128
encoder = nn.Linear(in_features, out_features)
decoder = TiedLinear(encoder)
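For the autoencoder from the question, the same idea can be written without a separate tied module. This is a minimal sketch, assuming a bias-free autoencoder; the class name TiedAutoEncoder and the sizes are hypothetical. The decoder step calls F.linear on the transposed encoder weight, so only one weight parameter exists.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedAutoEncoder(nn.Module):  # hypothetical name, for illustration
    def __init__(self, in_features: int, hidden_features: int):
        super().__init__()
        # The only weight in the model; the decoder reuses it transposed.
        self.encoder = nn.Linear(in_features, hidden_features, bias=False)

    def forward(self, x: torch.Tensor):
        encoded = self.encoder(x)
        # Decode with the transpose of the very same parameter.
        reconstructed = F.linear(encoded, self.encoder.weight.t())
        return encoded, reconstructed

model = TiedAutoEncoder(8, 3)
x = torch.randn(16, 8)
encoded, reconstructed = model(x)

loss = F.mse_loss(reconstructed, x)
loss.backward()

num_params = len(list(model.parameters()))
print("number of parameters:", num_params)
print("grad shape:", tuple(model.encoder.weight.grad.shape))
```

Since both the encoding and the decoding step use the same Parameter, autograd sums both gradient contributions into encoder.weight.grad. Note also that in the TiedLinear class above, storing tied_to as an attribute registers the encoder as a submodule, so decoder.parameters() yields the encoder's parameters as well.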