TensorFlow/PyTorch fails to allocate memory for a small tensor on both CPU and GPU, even on a node with more than 400 GB of RAM


I want to build a torch.nn.Embedding lookup using Python 3 on Databricks (the node is a p2.8xlarge).

My code:

  import numpy as np
  import torch
  from torch import nn

  num_embedding, num_dim = 14000, 300
  embedding = nn.Embedding(num_embedding, num_dim)
  row, col = 800000, 302
  t = [[x for x in range(col)] for _ in range(row)]  # 800000 rows, each holding indices 0..301
  
  t1 = torch.tensor(t)
  print(t1.shape) # torch.Size([800000, 302])
  
  t1.dtype, t1.nelement() # torch.int64, 241600000

  type(t1), t1.device, (t1.nelement() * t1.element_size())/(1024**3) # (torch.Tensor, device(type='cpu'), 1.8000602722167969)

  tt = embedding(t1) # error [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 288,000,000,000 bytes. Error code 12 (Cannot allocate memory)

  t2 = t1.cuda()
  t2.device, t2.shape, t2.grad, t2.nelement(), t2.element_size(), (t2.nelement() * t2.element_size())/(1024**3) # (device(type='cuda', index=0), torch.Size([800000, 302]), None, 241600000, 8, 1.8000602722167969)

  embedding_cuda = embedding.cuda()
  embedding_cuda(t2) #  CUDA out of memory. Tried to allocate 270.01 GiB (GPU 0; 11.17 GiB total capacity; 7.16 GiB already allocated; 2.01 GiB free; 8.88 GiB reserved in total by PyTorch)
I don't understand this: the given tensor is smaller than 2 GB (about 1.8 GiB), so why does the embedding lookup fail to allocate on both CPU and GPU? Why would the CPU and the GPU each need to allocate something as large as 270.01 GiB?
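
For reference, here is my back-of-the-envelope size arithmetic (just a sketch, under my own assumption that nn.Embedding keeps its default float32 weights and that embedding(t1) materializes a full [row, col, num_dim] output tensor):

  # Size arithmetic (assumption: the lookup materializes a full
  # [row, col, num_dim] float32 output tensor).
  row, col, num_dim = 800000, 302, 300

  input_bytes = row * col * 8             # int64 indices, 8 bytes each
  print(input_bytes / 1024**3)            # ~1.80 GiB, matches t1 above

  output_bytes = row * col * num_dim * 4  # float32 vectors, 4 bytes each
  print(output_bytes / 1024**3)           # ~270.01 GiB, matches the CUDA error

If that reading is correct, the output of embedding(t1) alone would be ~270 GiB, which seems to be what both allocators are asking for, but I'd like to confirm whether that is really the expected behavior.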

Am I missing something?

Thanks!