Python: Fine-tuning takes a lot of time


python tensorflow keras


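I have a text classification task with 28 possible classes. I decided to load a pretrained BERT model (DistilBERT) and fine-tune it for my problem. The problem is that training is very slow (on a GPU), even though I made sure to freeze BERT's layers so that I only need to train a single dense layer at the end. Here is the code I use to create the model: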

from tensorflow.keras.layers import Input, GlobalMaxPool1D, Dropout, Dense
from tensorflow.keras import Model
from transformers import TFDistilBertModel, DistilBertConfig

distil_bert = 'distilbert-base-uncased'


def Bert(out_shape, max_seq_length):
    # Load pretrained DistilBERT with custom dropout rates
    config = DistilBertConfig(dropout=0.2, attention_dropout=0.2)
    config.output_hidden_states = False
    transformer_model = TFDistilBertModel.from_pretrained(distil_bert, config=config)

    # Token ids and attention mask, each of shape (batch, max_seq_length)
    input_ids_in = Input(shape=(max_seq_length,), name='input_token', dtype='int32')
    input_masks_in = Input(shape=(max_seq_length,), name='masked_token', dtype='int32')

    # [0] is the last hidden state: (batch, max_seq_length, hidden_size)
    embedding_layer = transformer_model(input_ids_in, attention_mask=input_masks_in)[0]
    #X = Bidirectional(LSTM(50, return_sequences=True))(embedding_layer)
    X = GlobalMaxPool1D()(embedding_layer)
    #X = Flatten()(embedding_layer)
    #X = Dropout(0.2)(X)
    #X = Dense(2*len(categories), activation='relu')(X)
    X = Dropout(0.2)(X)
    X = Dense(out_shape, activation='softmax')(X)

    model = Model(inputs=[input_ids_in, input_masks_in], outputs=X)

    # Freeze the two input layers and the DistilBERT layer,
    # so only the final Dense layer is trained
    for layer in model.layers[:3]:
        layer.trainable = False

    return model

# categories and MAX_SEQUENCE_LENGTH are defined elsewhere in my code
model = Bert(len(categories), MAX_SEQUENCE_LENGTH)
model.summary()

Here is the summary:

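As you can see, I only have about 22,000 parameters to learn, so I don't understand why each epoch takes so long (almost 10 minutes). Before using BERT, I used a classic bidirectional LSTM model with more than 1M parameters, and each epoch took only 15 seconds.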

Can anyone help me?

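Even if you freeze BERT, you still have to run the forward pass through it for every batch. Freezing only removes the gradient computation and weight updates for those layers. For a 66M-parameter model, the forward pass alone is usually far more expensive than for a 1M-parameter network.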
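
To put rough numbers on this, here is a back-of-the-envelope sketch in plain Python. The hidden size of 768 is that of distilbert-base-uncased; the sequence length of 128 is an assumption (the question does not give MAX_SEQUENCE_LENGTH), and the "2 FLOPs per parameter per token" figure is only a crude rule of thumb, not a measurement. The helper names are made up for illustration.

```python
# Back-of-the-envelope comparison of trainable parameters vs. forward-pass cost.
HIDDEN_SIZE = 768             # hidden width of distilbert-base-uncased
NUM_CLASSES = 28              # from the question
BACKBONE_PARAMS = 66_000_000  # frozen DistilBERT, figure quoted above
SEQ_LEN = 128                 # ASSUMED: the question does not state it


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def rough_forward_flops(n_params: int, n_tokens: int) -> int:
    """Crude rule of thumb (an assumption): ~2 FLOPs per parameter per token."""
    return 2 * n_params * n_tokens


# Trainable part: the final Dense layer on top of the pooled hidden state.
head = dense_params(HIDDEN_SIZE, NUM_CLASSES)

# Cost per example: the frozen backbone sees every token, while the head
# runs once on the pooled vector.
backbone_flops = rough_forward_flops(BACKBONE_PARAMS, SEQ_LEN)
head_flops = rough_forward_flops(head, 1)

print(f"trainable head parameters: {head:,}")
print(f"rough backbone forward FLOPs per example: {backbone_flops:,}")
print(f"rough head forward FLOPs per example: {head_flops:,}")
```

The head comes out at 21,532 parameters, which matches the roughly 22,000 trainable parameters in the question. The trainable count says nothing about the cost of the frozen layers, though: every training step still pushes each batch through all of DistilBERT before the dense layer sees anything.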