How to get correct gradients for a combined network in PyTorch?

I use a combined network consisting of the ESM transformer pre-trained by FAIR and my own classifier. I should mention that I replaced the first layer of the transformer with an identity layer, because I want to compute the embeddings with the embedding layer separated from the transformer, so that I can operate on them later. I want to compute the gradient with respect to the input; however, when I use torch autograd I get a wrong gradient. I checked it by manually computing the gradient at one position as (f(x+e) - f(x-e)) / 2e, where f is my combined network, x is a particular position of the input, and e is a small increment. I am not sure what exactly goes wrong with my autograd call. Here is my code:

import torch
import torch.nn as nn
from torch.autograd import Variable

# ======== split the transformer model in two, getting access to the embedding layer ========
splitted_model = []
for name, module in transformer.named_children():
    splitted_model.append(module)

# take the first layer (the token embedding layer)
embedding_layer = splitted_model[0]
embedding = embedding_layer(input_tokens)
# replace the embedding layer with an Identity layer
identity_layer = torch.nn.Identity()
transformer.embed_tokens = identity_layer
# set token dropout to False
transformer.args.token_dropout = False

# this will be the input to my combined network
input_embedding = embedding_layer(input_tokens)

# ======== combine the transformer and my classifier ========
class FullModel(nn.Module):
    def __init__(self, transformer, classifier_nn):
        super(FullModel, self).__init__()
        self.transformer = transformer
        self.classifier_nn = classifier_nn

    def forward(self, x):
        # representation matrix of shape (L, E): L - length of the input sequence,
        # E - length of the feature vector; token 0 is a start-of-sequence token,
        # so the first symbol of the input is token 1
        x1 = self.transformer(x, repr_layers=[34])["representations"][34][0, 1 : len(x[0]) + 1]
        # average over L to get a representation vector of length E
        x2 = torch.mean(x1, dim=0)
        # use it for the classification
        x3 = self.classifier_nn(x2)
        return x3

# combine
transformer_model = transformer
classifier_nn = my_classifier
final_model = FullModel(transformer_model, classifier_nn)

# ======== compute the gradient =========
# select the output class I want the gradient for (1st out of 3 possible)
external = torch.tensor([1., 0., 0.])

x = Variable(input_embedding, requires_grad=True)
pred = final_model(x)
pred.backward(gradient=external)
input_gradient = x.grad
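
For completeness, here is a minimal sketch of the central-difference check described above. It assumes `final_model` and `input_embedding` from the code, and uses an illustrative element index `(i, j)` and step size `e` (both chosen arbitrarily here, not taken from the original post):

# ======== manual central-difference gradient check (sketch) =========
# i, j and e are illustrative choices
i, j = 1, 0                      # which element of the embedding to perturb
e = 1e-3                         # small increment

with torch.no_grad():
    x_plus = input_embedding.clone()
    x_minus = input_embedding.clone()
    x_plus[0, i, j] += e
    x_minus[0, i, j] -= e
    # first output of the classifier, matching external = [1, 0, 0]
    numeric_grad = (final_model(x_plus)[0] - final_model(x_minus)[0]) / (2 * e)

print(numeric_grad)              # compare with input_gradient[0, i, j]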