PyTorch allocates memory for a small tensor on CPU and GPU but fails with an out-of-memory error on a node with more than 400 GB
I want to build a torch.nn.Embedding tensor with Python 3 on Databricks (node type p2.8XL). My code:
import numpy as np
import torch
from torch import nn
num_embedding, num_dim = 14000, 300
embedding = nn.Embedding(num_embedding, num_dim)
row, col = 800000, 302
t = [[x for x in range(col)] for _ in range(row)]
t1 = torch.tensor(t)
print(t1.shape) # torch.Size([800000, 302])
t1.dtype, t1.nelement() # torch.int64, 241600000
type(t1), t1.device, (t1.nelement() * t1.element_size())/(1024**3) # (torch.Tensor, device(type='cpu'), 1.8000602722167969)
tt = embedding(t1) # error [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 288,000,000,000 bytes. Error code 12 (Cannot allocate memory)
t2 = t1.cuda()
t2.device, t2.shape, t2.grad, t2.nelement(), t2.element_size(), (t2.nelement() * t2.element_size())/(1024**3) # (device(type='cuda', index=0), torch.Size([800000, 302]), None, 241600000, 8, 1.8000602722167969)
embedding_cuda = embedding.cuda()
embedding_cuda(t2) # CUDA out of memory. Tried to allocate 270.01 GiB (GPU 0; 11.17 GiB total capacity; 7.16 GiB already allocated; 2.01 GiB free; 8.88 GiB reserved in total by PyTorch)
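I don't understand why a tensor smaller than 2 GB (1.8 GB) cannot be allocated on either the CPU or the GPU. Why do the CPU and GPU need to allocate as much as 270.01 GiB?
Am I missing something?
Thanks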
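A note on the arithmetic behind the question: the 1.8 GB figure is the size of the index tensor t1, but the failing allocation is for the result of the lookup. nn.Embedding returns one vector of length num_dim per index, so the output has shape (800000, 302, 300). The sketch below estimates that size in plain Python without allocating anything. It assumes the embedding weights use PyTorch's default float32 dtype (4 bytes per element); the helper names embedding_output_bytes and rows_per_chunk are made up for illustration and are not part of PyTorch.

```python
# Estimate the size of an nn.Embedding lookup result without allocating it.
# Assumption: the weights are float32 (PyTorch's default), i.e. 4 bytes each.
# Both helper functions are hypothetical names used only for this sketch.

GIB = 1024 ** 3


def embedding_output_bytes(rows, cols, embedding_dim, bytes_per_element=4):
    """Bytes needed for a lookup result of shape (rows, cols, embedding_dim)."""
    return rows * cols * embedding_dim * bytes_per_element


def rows_per_chunk(budget_bytes, cols, embedding_dim, bytes_per_element=4):
    """Largest number of index rows whose lookup result fits in budget_bytes."""
    return budget_bytes // (cols * embedding_dim * bytes_per_element)


# The int64 index tensor from the question: 800000 x 302 elements, 8 bytes each.
index_bytes = 800_000 * 302 * 8

# The float32 result of embedding(t1): 800000 x 302 x 300 elements.
output_bytes = embedding_output_bytes(800_000, 302, 300)

print(f"indices: {index_bytes / GIB:.2f} GiB")
print(f"output:  {output_bytes / GIB:.2f} GiB")
print("index rows per 1 GiB of output:", rows_per_chunk(GIB, 302, 300))
```

With the shapes from the question this gives 289,920,000,000 bytes, about 270.01 GiB, which matches the CUDA message. The 288,000,000,000 bytes in the CPU message would match the same formula with 300 columns instead of 302, so that run may have used a slightly different col value. If the full result is not needed at once, one option is to pass the indices through the embedding a few thousand rows at a time and consume each slice before moving on; rows_per_chunk gives a rough upper bound on the slice size for a given memory budget.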