How to fine-tune the HuggingFace Transformer XL model for text generation and machine translation?
I have a machine translation task in which I have to translate English sentences into Hinglish sentences. I am trying to use a pretrained Transformer XL model and fine-tune it on a custom dataset. Here is my code:
import pandas as pd
import tensorflow as tf
from transformers import TransfoXLTokenizer
from transformers import TFTransfoXLModel
import numpy as np
from sklearn.model_selection import train_test_split
#Loading data
dataFrame = pd.read_csv("data.csv")
dataFrame.head(3)
#-----Output 1-----
#Splitting Dataset
X = dataFrame['English']
Y = dataFrame['Hinglish']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 42)
#Tokenization
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
tokenizer.pad_token = tokenizer.eos_token
XTrainEncodings = tokenizer(X_train.to_list(), max_length = 150, padding = True)
XTestEncodings = tokenizer(X_test.to_list(), max_length = 150, padding = True)
YTrainEncodings = tokenizer(Y_train.to_list(), max_length = 150, padding = True)
YTestEncodings = tokenizer(Y_test.to_list(), max_length = 150, padding = True)
print("XTrainEncodings : ", XTrainEncodings)
print("YTrainEncodings : ", YTrainEncodings)
#-----Output 2-----
#Converting to Tensors
X_train = tf.data.Dataset.from_tensor_slices((dict(XTrainEncodings), (dict(YTrainEncodings))))
X_test = tf.data.Dataset.from_tensor_slices((dict(XTestEncodings), (dict(YTestEncodings))))
print(X_train)
#-----Output 3-----
#Fine Tuning
model = TFTransfoXLModel.from_pretrained('transfo-xl-wt103')
optimizer = tf.keras.optimizers.Adam(learning_rate = 5e-5)
model.compile(optimizer = optimizer, loss = tf.losses.SparseCategoricalCrossentropy(), metrics = ['accuracy'])
history = model.fit(X_train.batch(1), epochs = 2, batch_size = 1, validation_data = X_test.batch(1))
Output:
-----Output 1-----
English          Hinglish
How are you ?    Tum kaise ho ?
I am fine.       Main theek hoon
......
-----Output 2-----
XTrainEncodings : {'input_ids': [[4241, 0, 0, 0, 0, 0], [4827, 37, 304, 788, 0, 0],....
YTrainEncodings : {'input_ids': [[13762, 0, 0, 0, 0], [71271, 24, 33289, 788, 0],....
-----Output 3-----
<TensorSliceDataset shapes: ({input_ids: (6,)}, {input_ids: (5,)}), types: ({input_ids: tf.int32}, {input_ids: tf.int32})>
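One thing visible in Output 2 and Output 3 is that the inputs are padded to length 6 while the targets are padded to length 5. With `padding = True`, each tokenizer call pads only to the longest sequence in that call, and `max_length` takes effect only together with `truncation = True` or `padding = 'max_length'`. Whether this contributes to the error is not established here. The sketch below uses plain Python with hypothetical helpers (`pad_to_longest`, `pad_to_length`) and made-up token ids, not the tokenizer's own API, to show the difference between the two padding strategies.

```python
from typing import List

PAD_ID = 0  # assumption: pad id 0, matching the zeros shown in Output 2


def pad_to_longest(batch: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Pad every sequence to the length of the longest one in this batch."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]


def pad_to_length(batch: List[List[int]], length: int, pad_id: int = PAD_ID) -> List[List[int]]:
    """Truncate or pad every sequence to exactly `length` tokens."""
    return [seq[:length] + [pad_id] * max(0, length - len(seq)) for seq in batch]


# Made-up token ids, for illustration only.
source = [[4241], [4827, 37, 304, 788], [11, 12, 13, 14, 15, 16]]
target = [[13762], [71271, 24, 33289, 788], [21, 22, 23, 24, 25]]

# Padding each side to its own longest sequence gives different widths.
src_longest = pad_to_longest(source)
tgt_longest = pad_to_longest(target)
print(len(src_longest[0]), len(tgt_longest[0]))  # 6 5

# Padding both sides to one fixed length gives matching widths.
src_fixed = pad_to_length(source, 8)
tgt_fixed = pad_to_length(target, 8)
print(len(src_fixed[0]), len(tgt_fixed[0]))  # 8 8
```

In the tokenizer calls above, the fixed-length behaviour would correspond to passing `padding = 'max_length'` together with `truncation = True`.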
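Please help me find the cause of the error below and fix it. I would also like to know whether I am following the right approach for my task, or whether there is a better one. I am new to deep learning, so I am not sure about this. Thanks.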
ValueError: in user code:
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:805 train_function *
return step_function(self, iterator)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:795 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
return fn(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:788 run_step **
outputs = model.train_step(data)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:758 train_step
self.compiled_metrics.update_state(y, y_pred, sample_weight)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/compile_utils.py:387 update_state
self.build(y_pred, y_true)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/compile_utils.py:318 build
self._metrics, y_true, y_pred)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py:1163 map_structure_up_to
**kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py:1245 map_structure_with_tuple_paths_up_to
expand_composites=expand_composites)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py:878 assert_shallow_structure
input_length=len(input_tree), shallow_length=len(shallow_tree)))
ValueError: The two structures don't have the same sequence length. Input structure has length 3, while shallow structure has length 2.
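The last line of the traceback comes from a structure comparison in `nest.py`: two nested structures are matched against each other, and the check fails because one has three top-level elements and the other has two. Which objects those are is not visible in the traceback, so the sketch below uses placeholder tuples. It is a toy version with a hypothetical helper `check_same_length`, not TensorFlow's implementation, and only shows the kind of condition that produces this message.

```python
from typing import Sequence


def check_same_length(input_tree: Sequence, shallow_tree: Sequence) -> None:
    """Toy check: raise if two top-level sequences differ in length."""
    if len(input_tree) != len(shallow_tree):
        raise ValueError(
            "The two structures don't have the same sequence length. "
            f"Input structure has length {len(input_tree)}, "
            f"while shallow structure has length {len(shallow_tree)}."
        )


# Placeholder structures; the real ones are not shown in the traceback.
three_parts = ("a", "b", "c")
two_parts = ("x", "y")

check_same_length(two_parts, ("p", "q"))  # same length, passes silently

try:
    check_same_length(three_parts, two_parts)
    message = ""
except ValueError as err:
    message = str(err)

print(message)
```

A useful first debugging step under this reading is to print the structure of what the model returns for one batch and compare it with the structure of the labels passed to `fit`.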