CUDA out of memory when training an mT5 model

python nlp pytorch

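I am trying to train an mT5 model on a dataset, following this guide.

I am using CUDA 11.1.

During training I get a CUDA out-of-memory error, even though I have tried reducing the batch size and the maximum sequence length. I also noticed that the GPU memory allocated to PyTorch is less than the GPU's full memory. How can I fix this?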

import logging
import pandas as pd
from simpletransformers.t5 import T5Model, T5Args
import torch
logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)
train_df = pd.read_csv("en-mr/train.tsv", sep="\t").astype(str)
eval_df = pd.read_csv("en-mr/eval.tsv", sep="\t").astype(str)

train_df["prefix"] = ""
eval_df["prefix"] = ""

train_df=train_df[0:10000]
eval_df=eval_df[0:1000]

model_args = T5Args()
model_args.max_seq_length = 25
model_args.train_batch_size = 1
model_args.eval_batch_size = 1
model_args.num_train_epochs = 1
model_args.evaluate_during_training = False
model_args.use_multiprocessing = False
model_args.fp16 = False
model_args.save_steps = -1
model_args.save_eval_checkpoints = False
model_args.no_cache = True
model_args.reprocess_input_data = True
model_args.overwrite_output_dir = True
model_args.preprocess_inputs = False
model_args.num_return_sequences = 1
model_args.wandb_project = "MT5 Marathi-English Translation"

model = T5Model("mt5", "google/mt5-base", args=model_args)

n = 100
PATH="en-mr/models/model.pth"

for i in range(1):
    model.train_model(train_df[int(len(train_df)*(n/100))*i:int(len(train_df)*(n/100))*(i+1)])
    torch.save(model, PATH)

Error (the full traceback is reproduced at the end):

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 4.35 GiB already allocated; 0 bytes free; 4.53 GiB reserved in total by PyTorch)
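The figures in the error message can be separated into memory that PyTorch holds and memory that it does not. The sketch below parses the message text from the question using only the standard library. The helper name `parse_oom_message` is hypothetical, and the interpretation in the comments is an assumption: the gap between "reserved" and "allocated" is taken to be blocks cached by PyTorch's allocator, and whatever is neither reserved nor free is taken to be held outside that allocator (for example by the CUDA context or by other programs using the GPU).

```python
import re

# Message text copied from the traceback in the question.
MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB "
    "(GPU 0; 6.00 GiB total capacity; 4.35 GiB already allocated; "
    "0 bytes free; 4.53 GiB reserved in total by PyTorch)"
)

# Size of each unit expressed in GiB, so every figure ends up in GiB.
UNIT_IN_GIB = {"bytes": 1 / 2**30, "KiB": 1 / 2**20, "MiB": 1 / 2**10, "GiB": 1.0}


def parse_oom_message(message):
    """Return the sizes named in a PyTorch CUDA OOM message, in GiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (\w+)",
        "total": r"([\d.]+) (\w+) total capacity",
        "allocated": r"([\d.]+) (\w+) already allocated",
        "free": r"([\d.]+) (\w+) free",
        "reserved": r"([\d.]+) (\w+) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"field {name!r} not found in message")
        value, unit = match.groups()
        sizes[name] = float(value) * UNIT_IN_GIB[unit]
    return sizes


sizes = parse_oom_message(MESSAGE)
# Reserved by PyTorch's caching allocator but not backing a live tensor.
cached_unused = sizes["reserved"] - sizes["allocated"]
# Device memory that PyTorch never reserved and that is not free either.
outside_pytorch = sizes["total"] - sizes["reserved"] - sizes["free"]
print(f"cached but unused: {cached_unused:.2f} GiB")
print(f"held outside PyTorch's reserve: {outside_pytorch:.2f} GiB")
```

For this message the cached-but-unused part is small (0.18 GiB), while about 1.47 GiB of the 6.00 GiB card is not available to PyTorch at all.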

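PyTorch not being able to use the full GPU RAM is an unfortunate side effect of memory fragmentation. More information on this is available here.

Note also that the traceback shows the error being raised in self.model.to(self.device), that is, while the model weights are being copied to the GPU and before any training batch is processed. That would explain why lowering train_batch_size and max_seq_length made no difference.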
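To check how much memory PyTorch has actually allocated and reserved in the current session, a small helper along the following lines may be useful. It is a minimal sketch: `gpu_memory_report` is a hypothetical name, the guarded import only lets the snippet run where PyTorch is missing, and the helper returns `None` when no CUDA device is available.

```python
try:
    import torch
except ImportError:  # allow the sketch to run where PyTorch is not installed
    torch = None

GIB = 2**30


def gpu_memory_report(device_index=0):
    """Return PyTorch's memory counters for one GPU in GiB, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    total = torch.cuda.get_device_properties(device_index).total_memory
    allocated = torch.cuda.memory_allocated(device_index)  # bytes in live tensors
    reserved = torch.cuda.memory_reserved(device_index)  # bytes held by the caching allocator
    return {
        "total_gib": total / GIB,
        "allocated_gib": allocated / GIB,
        "reserved_gib": reserved / GIB,
        "cached_unused_gib": (reserved - allocated) / GIB,
    }


print(gpu_memory_report())
if torch is not None and torch.cuda.is_available():
    # Hand cached-but-unused blocks back to the driver; live tensors are unaffected.
    torch.cuda.empty_cache()
    print(gpu_memory_report())
```

`torch.cuda.empty_cache()` only releases cached blocks. If a model created by an earlier notebook cell is still referenced, its memory stays allocated until the reference is deleted (for example with `del model`) or the kernel is restarted.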

The original traceback from the question:

RuntimeError                              Traceback (most recent call last)
<ipython-input-11-0aa9d87284ea> in <module>
      1 for i in range(1):
----> 2     model.train_model(train_df[int(len(train_df)*(n/100))*i:int(len(train_df)*(n/100))*(i+1)])
      3     torch.save(model, PATH)

~\anaconda3\envs\st\lib\site-packages\simpletransformers\t5\t5_model.py in train_model(self, train_data, output_dir, show_running_loss, args, eval_data, verbose, **kwargs)
    216             )
    217 
--> 218         self._move_model_to_device()
    219 
    220         train_dataset = self.load_and_cache_examples(train_data, verbose=verbose)

~\anaconda3\envs\st\lib\site-packages\simpletransformers\t5\t5_model.py in _move_model_to_device(self)
   1115 
   1116     def _move_model_to_device(self):
-> 1117         self.model.to(self.device)
   1118 
   1119     def _get_inputs_dict(self, batch):

~\anaconda3\envs\st\lib\site-packages\torch\nn\modules\module.py in to(self, *args, **kwargs)
    671             return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
    672 
--> 673         return self._apply(convert)
    674 
    675     def register_backward_hook(

~\anaconda3\envs\st\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    385     def _apply(self, fn):
    386         for module in self.children():
--> 387             module._apply(fn)
    388 
    389         def compute_should_use_set_data(tensor, tensor_applied):

~\anaconda3\envs\st\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    385     def _apply(self, fn):
    386         for module in self.children():
--> 387             module._apply(fn)
    388 
    389         def compute_should_use_set_data(tensor, tensor_applied):

~\anaconda3\envs\st\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    385     def _apply(self, fn):
    386         for module in self.children():
--> 387             module._apply(fn)
    388 
    389         def compute_should_use_set_data(tensor, tensor_applied):

~\anaconda3\envs\st\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    385     def _apply(self, fn):
    386         for module in self.children():
--> 387             module._apply(fn)
    388 
    389         def compute_should_use_set_data(tensor, tensor_applied):

~\anaconda3\envs\st\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    385     def _apply(self, fn):
    386         for module in self.children():
--> 387             module._apply(fn)
    388 
    389         def compute_should_use_set_data(tensor, tensor_applied):

~\anaconda3\envs\st\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    385     def _apply(self, fn):
    386         for module in self.children():
--> 387             module._apply(fn)
    388 
    389         def compute_should_use_set_data(tensor, tensor_applied):

~\anaconda3\envs\st\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    385     def _apply(self, fn):
    386         for module in self.children():
--> 387             module._apply(fn)
    388 
    389         def compute_should_use_set_data(tensor, tensor_applied):

~\anaconda3\envs\st\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    407                 # `with torch.no_grad():`
    408                 with torch.no_grad():
--> 409                     param_applied = fn(param)
    410                 should_use_set_data = compute_should_use_set_data(param, param_applied)
    411                 if should_use_set_data:

~\anaconda3\envs\st\lib\site-packages\torch\nn\modules\module.py in convert(t)
    669                 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
    670                             non_blocking, memory_format=convert_to_format)
--> 671             return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
    672 
    673         return self._apply(convert)

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 4.35 GiB already allocated; 0 bytes free; 4.53 GiB reserved in total by PyTorch)
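For a sense of scale, the following back-of-the-envelope sketch estimates the memory needed just for the weights, gradients, and optimizer state of a model, independent of batch size. Everything in it is an assumption made for illustration: the function name `training_memory_gib` is hypothetical, the parameter count of roughly 580 million for google/mt5-base is an approximate published figure rather than a measured one, values are taken to be 32-bit floats (the question sets fp16 = False), and the optimizer is assumed to be Adam-style with two state tensors per parameter.

```python
GIB = 2**30


def training_memory_gib(num_params, bytes_per_value=4, optimizer_states=2):
    """Rough lower bound, in GiB, for weights + gradients + optimizer state.

    Activations, the CUDA context and allocator overhead are NOT included.
    """
    weights = num_params * bytes_per_value
    gradients = num_params * bytes_per_value
    optimizer = num_params * bytes_per_value * optimizer_states
    return {
        "weights": weights / GIB,
        "gradients": gradients / GIB,
        "optimizer": optimizer / GIB,
        "total": (weights + gradients + optimizer) / GIB,
    }


# ASSUMPTION: approximate parameter count for google/mt5-base.
APPROX_MT5_BASE_PARAMS = 580_000_000

estimate = training_memory_gib(APPROX_MT5_BASE_PARAMS)
for name, gib in estimate.items():
    print(f"{name:>9}: {gib:.2f} GiB")
```

Under these assumptions the weights alone take about 2.2 GiB and full fine-tuning needs more than 8 GiB before any activations are counted, which exceeds the 6.00 GiB card in the traceback. If the estimate is in the right range, a smaller checkpoint or a more memory-efficient optimizer is more likely to help than further reducing the batch size.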