Python CNTK: speed comparison of 1-bit SGD vs. normal SGD on 4 GPUs


I installed a build of CNTK on an Azure NC24 GPU virtual machine running Ubuntu (Python 3.4). The machine has 4 NVIDIA K80 GPUs. Build info:

            Build type: release
            Build target: GPU
            With 1bit-SGD: yes
            With ASGD: yes
            Math lib: mkl
            CUDA_PATH: /usr/local/cuda-8.0
            CUB_PATH: /usr/local/cub-1.4.1
            CUDNN_PATH: /usr/local
            Build Branch: HEAD
            Build SHA1: 8e8b5ff92eff4647be5d41a5a515956907567126
            Built by Source/CNTK/buildinfo.h$$0 on bbdadbf3455d
            Build Path: /home/philly/jenkins/workspace/CNTK-Build-Linux
I am running the CIFAR example in distributed mode:

mpiexec -n 4 python TrainResNet_CIFAR10_Distributed.py -n resnet20 -q 32

Finished Epoch [1]: [Training] loss = 1.675002 * 50176, metric = 62.5% * 50176 112.019s (447.9 samples per second)
Finished Epoch [1]: [Training] loss = 1.675002 * 50176, metric = 62.5% * 50176 112.019s (447.9 samples per second)
Finished Epoch [1]: [Training] loss = 1.675002 * 50176, metric = 62.5% * 50176 112.018s (447.9 samples per second)
Finished Epoch [1]: [Training] loss = 1.675002 * 50176, metric = 62.5% * 50176 112.019s (447.9 samples per second)
Finished Epoch [2]: [Training] loss = 1.247423 * 50176, metric = 45.4% * 50176 8.210s (6111.3 samples per second)
Finished Epoch [2]: [Training] loss = 1.247423 * 50176, metric = 45.4% * 50176 8.210s (6111.4 samples per second)
Finished Epoch [2]: [Training] loss = 1.247423 * 50176, metric = 45.4% * 50176 8.210s (6111.8 samples per second)
Finished Epoch [2]: [Training] loss = 1.247423 * 50176, metric = 45.4% * 50176 8.210s (6111.6 samples per second)
...
...
Finished Epoch [160]: [Training] loss = 0.037745 * 49664, metric = 1.2% * 49664 7.883s (6300.4 samples per second)
Finished Epoch [160]: [Training] loss = 0.037745 * 49664, metric = 1.2% * 49664 7.883s (6299.7 samples per second)
Finished Epoch [160]: [Training] loss = 0.037745 * 49664, metric = 1.2% * 49664 7.884s (6299.7 samples per second)
Finished Epoch [160]: [Training] loss = 0.037745 * 49664, metric = 1.2% * 49664 7.884s (6299.2 samples per second)
However, when I run it with 1-bit SGD, I get:

mpiexec -n 4 python TrainResNet_CIFAR10_Distributed.py -n resnet20 -q 1 -a 50000

...
Finished Epoch [160]: [Training] loss = 0.059290 * 49664, metric = 2.1% * 49664 10.055s (4939.1 samples per second)
Finished Epoch [160]: [Training] loss = 0.059290 * 49664, metric = 2.1% * 49664 10.056s (4938.9 samples per second)
Finished Epoch [160]: [Training] loss = 0.059290 * 49664, metric = 2.1% * 49664 10.056s (4938.9 samples per second)
Finished Epoch [160]: [Training] loss = 0.059290 * 49664, metric = 2.1% * 49664 10.056s (4938.9 samples per second)

As described, 1-bit SGD should be faster than its normal counterpart. Any help is greatly appreciated.
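
For reference, here is a minimal sketch of how the script's -q (quantization bits) and -a (distributed-after samples) flags map onto CNTK's data_parallel_distributed_learner API; the tiny network below is a toy placeholder, not the actual ResNet-20:

import cntk as C
from cntk.train.distributed import data_parallel_distributed_learner, Communicator

# Toy placeholder network; the real script builds ResNet-20.
features = C.input_variable((3, 32, 32))
model = C.layers.Dense(10)(features)

lr_schedule = C.learning_rate_schedule(0.1, C.UnitType.minibatch)
mm_schedule = C.momentum_schedule(0.9)
local_learner = C.momentum_sgd(model.parameters, lr_schedule, mm_schedule)

# -q 32: plain data-parallel SGD with full-precision (32-bit) gradients.
learner_q32 = data_parallel_distributed_learner(local_learner,
                                                num_quantization_bits=32)

# -q 1 -a 50000: 1-bit SGD, after a warm start of 50000 samples
# processed without quantization.
learner_q1 = data_parallel_distributed_learner(local_learner,
                                               num_quantization_bits=1,
                                               distributed_after=50000)

# Each MPI rank trains with one of these learners; finalize MPI at the end.
Communicator.finalize()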

1-bit SGD is an effective strategy when the time spent communicating gradients between GPUs is longer than the computation time of a minibatch.


The experiment above has two "problems": the model you are training has few parameters (so there is not much computation per minibatch), and the 4 GPUs are inside the same machine (so communication is cheap compared with going over a network). Moreover, within a machine CNTK uses NVIDIA NCCL, which is much better optimized than the generic MPI implementation that 1-bit SGD uses. Update: at the time this comment was written, NCCL was not used by default.
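
A back-of-envelope calculation makes the first point concrete. With the rough parameter counts below (assumed, not measured, and ignoring the error-feedback residual overhead of 1-bit SGD), ResNet-20 exchanges only about 1 MB of gradients per synchronization even at full precision, so there is little communication for quantization to save:

# Rough, assumed parameter counts for illustration only.
models = {"ResNet-20 (CIFAR-10)": 0.27e6, "VGG-16 (ImageNet)": 138e6}

for name, n_params in models.items():
    full_mb   = n_params * 4 / 1e6  # 32-bit gradients: 4 bytes per parameter
    onebit_mb = n_params / 8 / 1e6  # 1-bit gradients: 1 bit per parameter
    print("%s: ~%.2f MB at full precision, ~%.2f MB with 1-bit SGD"
          % (name, full_mb, onebit_mb))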

Thanks. So if I tackled a bigger problem (say, ImageNet with ResNet-152), should I see a speedup?

Yes, that could help, although ResNet models generally do not have as many parameters as some older networks such as VGG, which use fully connected layers at the end.
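
To judge whether a given network is communication-bound in the first place, counting its trainable parameters is a quick first check. This is a small sketch using CNTK's parameters property; the toy network stands in for whatever model you actually train:

import numpy as np
import cntk as C

def num_parameters(fn):
    # Total number of trainable scalars across all parameter tensors.
    return sum(int(np.prod(p.shape)) for p in fn.parameters)

# Toy stand-in network; substitute the real ResNet/VGG model function.
x = C.input_variable((3, 32, 32))
toy = C.layers.Dense(10)(C.layers.Convolution2D((3, 3), 16, pad=True)(x))
print(num_parameters(toy))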