Performance: Titan XP vs Quadro P400 GPU in PyTorch

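I tried both GPUs on my machine and expected the Titan XP to be faster than the Quadro P400. However, both gave almost the same execution time.

I need to know whether PyTorch dynamically chooses one GPU over the other, or whether I have to specify myself which one PyTorch should use at runtime.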

Here is the code snippet used in the test:

import torch
import time

def do_something(gpu_device):
    torch.cuda.set_device(gpu_device)  # make gpu_device the current CUDA device
    print("current GPU device ", torch.cuda.current_device())
    strt = time.time()
    a = torch.randn(100000000).cuda()   
    xx = time.time() - strt
    print("execution time, to create 1E8 random numbers, is ", xx)
    # print(a)
    # print(a + 2)

no_of_GPUs = torch.cuda.device_count()
print("how many GPUs are there:", no_of_GPUs)
for i in range(no_of_GPUs):
    print(i, "th GPU is", torch.cuda.get_device_name(i))
    do_something(i)

Sample output:

how many GPUs are there: 2
0 th GPU is TITAN Xp COLLECTORS EDITION
current GPU device  0
execution time, to create 1E8 random numbers, is  5.527713775634766

1 th GPU is Quadro P400
current GPU device  1
execution time, to create 1E8 random numbers, is  5.511776685714722
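On the second part of the question: as far as I know, PyTorch does not pick a GPU dynamically. A bare .cuda() copies to the current device, which is cuda:0 unless torch.cuda.set_device changed it, so the loop above does touch both cards. A minimal sketch of choosing the card explicitly through the device= argument instead; the helper name make_on_device and the CPU fallback are illustrative assumptions, not part of the original code.

```python
import torch


def make_on_device(index: int, n: int = 1000) -> torch.Tensor:
    """Create n normal samples on GPU `index`, or on the CPU if that GPU is absent.

    make_on_device is a hypothetical helper used only for illustration.
    """
    if torch.cuda.is_available() and 0 <= index < torch.cuda.device_count():
        device = torch.device("cuda", index)
    else:
        # Assumed fallback so the sketch also runs on machines without CUDA
        device = torch.device("cpu")
    # device= allocates (and samples) directly on the chosen device,
    # independent of whatever torch.cuda.current_device() returns
    return torch.randn(n, device=device)


for i in range(max(torch.cuda.device_count(), 1)):
    t = make_on_device(i)
    print(i, t.device)
```

Setting the CUDA_VISIBLE_DEVICES environment variable before starting Python is another way to limit which cards PyTorch can see at all.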

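Contrary to what you might believe, the lack of a performance difference you are seeing is because the random number generation runs on the host CPU, not on the GPU. If I modify your function like this: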
def do_something(gpu_device, ongpu=False, N=100000000):
    torch.cuda.set_device(gpu_device)
    print("current GPU device ", torch.cuda.current_device())
    strt = time.time()
    if ongpu:
        a = torch.cuda.FloatTensor(N).normal_()
    else:
        a = torch.randn(N).cuda()
    print("execution time, to create 1E8 random no, is ", time.time() - strt)
    return a
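The typed constructor torch.cuda.FloatTensor is generally treated as a legacy interface in current PyTorch; the same on-GPU sampling is usually written as torch.randn(N, device=...). A sketch contrasting the two paths discussed in this answer, assuming a hypothetical helper random_tensor and a CPU fallback when no GPU is present:

```python
import torch


def random_tensor(n: int, device: torch.device, on_device: bool) -> torch.Tensor:
    """Hypothetical helper: return n standard-normal samples that end up on `device`."""
    if on_device:
        # Sample straight into device memory (the GPU's own RNG when device is CUDA)
        return torch.randn(n, device=device)
    # Sample on the host CPU, then copy across: the path randn(N).cuda() takes
    return torch.randn(n).to(device)


device = torch.device("cuda", 0) if torch.cuda.is_available() else torch.device("cpu")
host_path = random_tensor(1000, device, on_device=False)
device_path = random_tensor(1000, device, on_device=True)
# Both tensors live on the same device; only where the sampling happened differs
print(host_path.device, device_path.device)
```

Checking tensor.device alone cannot tell the two apart, which is why the original snippet looked like it was using the GPU.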

Running it both ways, I get very different execution times:
In [4]: do_something(0)
current GPU device  0
execution time, to create 1E8 random no, is  7.736972808837891
Out[4]: 

-9.3955e-01
-1.9721e-01
-1.1502e+00
     ......     
-1.2428e+00
 3.1547e-01
-2.1870e+00
[torch.cuda.FloatTensor of size 100000000 (GPU 0)]

In [5]: do_something(0,True)
current GPU device  0
execution time, to create 1E8 random no, is  0.001735687255859375
Out[5]: 

 4.1403e+06
 5.7016e+06
 1.2710e+07
     ......     
 8.9790e+06
 1.3779e+07
 8.0731e+06
[torch.cuda.FloatTensor of size 100000000 (GPU 0)]

i.e. your version takes 7 seconds and mine takes 1.7 milliseconds. I think it is pretty obvious which one is running on the GPU...
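One caveat when comparing these numbers: CUDA kernel launches return control to Python before the GPU has finished, so reading time.time() right after normal_() may capture little more than the launch itself. A sketch of a fairer comparison that does a warm-up call first (to absorb lazy CUDA initialization) and synchronizes before stopping the clock; timed_randn, the repeat count, and the 1E7 size are illustrative choices, not from the original:

```python
import time

import torch


def timed_randn(n: int, device: torch.device, repeats: int = 3) -> float:
    """Hypothetical helper: best-of-`repeats` wall-clock seconds to sample n values on `device`."""

    def sync() -> None:
        # Block until all queued work on the GPU has completed
        if device.type == "cuda":
            torch.cuda.synchronize(device)

    torch.randn(n, device=device)  # warm-up: pays one-off CUDA setup costs
    sync()
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        torch.randn(n, device=device)
        sync()  # without this, only the kernel launch would be timed
        best = min(best, time.perf_counter() - start)
    return best


devices = [torch.device("cuda", i) for i in range(torch.cuda.device_count())]
for dev in devices or [torch.device("cpu")]:
    print(dev, f"{timed_randn(10_000_000, dev):.6f} s")
```

torch.cuda.Event(enable_timing=True) with record() and elapsed_time() is an alternative that timestamps on the GPU stream itself and reports milliseconds.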
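Are you sure your do_something function is actually creating those random numbers on the GPU, and not on the CPU and then transferring the result to the GPU?

@Talonmes. Isn't this line ... a = torch.randn(100000000).cuda() ... supposed to create the random numbers on the GPU?

I don't know much about torch, but the Python syntax alone says no. I read it as "create a tensor in the default memory space and copy that tensor to the GPU". I would guess the default is CPU memory!

This time it really is on the GPU. However, the Quadro is still faster than the Titan Xp, and I don't know why (by the way, I'm using CUDA 8.0)!
0th GPU is TITAN Xp COLLECTORS EDITION - execution time to create 1E8 random numbers is 6.3180922346191406E-05
1st GPU is Quadro P400 - execution time to create 1E8 random numbers is 4.482269287109375e-05

The CUDA API is asynchronous. Once again, you are probably not measuring the actual execution time. I don't know how PyTorch works internally, so I can't say for certain that this is what is happening. Try using a profiling tool such as nvprof to look at the execution times.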
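Besides nvprof, recent PyTorch releases ship their own profiler, torch.profiler, which reports per-operator CPU time and, when requested, CUDA kernel time from inside the script. A minimal sketch, assuming a CPU fallback when no GPU is present and an arbitrary 1E6 tensor size:

```python
import torch
from torch.profiler import ProfilerActivity, profile

device = torch.device("cuda", 0) if torch.cuda.is_available() else torch.device("cpu")

activities = [ProfilerActivity.CPU]
if device.type == "cuda":
    # Also record the time the GPU kernels actually take, not just their launch
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    a = torch.randn(1_000_000, device=device)

# Aggregate recorded events per operator and render them as a text table
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(report)
```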