TensorFlow training speed with multiple GPUs

tensorflow time

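I currently have a problem with the speed of training a new TensorFlow model. I had assumed that training on multiple GPUs would be significantly faster, but I found that this is not the case. After several tests locally and on Google Cloud, I am running out of ideas for how to speed things up significantly. Maybe someone can tell me how to accelerate the training. At the moment I am training on just over 10,000 images with an image size of 628 x 628.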

My local environment:

absl-py==0.11.0
astor==0.8.1
cycler==0.10.0
gast==0.4.0
grpcio==1.34.0
h5py==2.10.0
imageai==2.1.5
importlib-metadata==2.1.1
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
kiwisolver==1.1.0
Markdown==3.2.2
matplotlib==3.0.3
mock==3.0.5
numpy==1.18.5
opencv-python==4.2.0.32
Pillow==7.2.0
protobuf==3.14.0
pyparsing==2.4.7
python-dateutil==2.8.1
PyYAML==5.3.1
scipy==1.4.1
six==1.15.0
tensorboard==1.12.2
tensorflow-estimator==1.13.0
tensorflow-gpu==1.12.0
termcolor==1.1.0
Werkzeug==1.0.1
zipp==1.2.0

Ryzen 5 3600, Nvidia 1060 (6 GB), 50 GB RAM

My Google Cloud environment:

Everything runs in a Docker container.

16 vCPUs, 60 GB RAM, 4 x NVIDIA Tesla T4

My test results for the time needed per epoch:

1x Nvidia 1060 with a batch size of 4 = 2.97 hours
1x Tesla T4 with a batch size of 12 = 1.19 hours
2x Tesla T4 with a batch size of 12 = 3.37 hours
2x Tesla T4 with a batch size of 24 = 3.37 hours

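Why does training with two Tesla T4s take longer than with just one? Why doesn't training get faster with a larger batch size? Thank you very much for your advice.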
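To compare the runs above on equal footing, it can help to convert hours per epoch into seconds per training step. The snippet below is only a rough sketch: `NUM_IMAGES` is an assumed value (the question only says "just over 10,000"), `seconds_per_step` is a hypothetical helper, and the batch size is assumed to be the number of images consumed per step regardless of GPU count, which may not match how ImageAI splits batches across devices.

```python
import math

# Assumption: the question only states "just over 10,000 images".
NUM_IMAGES = 10000

# (label, batch size, reported hours per epoch), taken from the results above.
RUNS = [
    ("1x GTX 1060", 4, 2.97),
    ("1x Tesla T4", 12, 1.19),
    ("2x Tesla T4", 12, 3.37),
    ("2x Tesla T4", 24, 3.37),
]


def seconds_per_step(batch_size, hours_per_epoch, num_images=NUM_IMAGES):
    """Average wall-clock seconds spent on one training step."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return hours_per_epoch * 3600.0 / steps_per_epoch


for label, batch_size, hours in RUNS:
    print("{:12s} batch={:2d} {:6.2f} s/step".format(
        label, batch_size, seconds_per_step(batch_size, hours)))
```

Under these assumptions, a step takes roughly 5 s on one T4 and roughly 15 s on two T4s at the same batch size. Doubling the batch size on two T4s halves the number of steps but doubles the time per step, so the epoch time stays the same. That pattern suggests the extra GPU is not contributing useful work, although the numbers alone cannot show why.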

You haven't provided the architecture of your network, so your question can't be answered precisely.

If you remember Amdahl's law from your computer science classes, parallel processing introduces synchronization overhead.

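If your network is not complex enough, training on multiple GPUs will only slow it down, because the overhead of updating parameters across the GPUs outweighs the speedup gained from the additional processing power.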
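The argument above can be made concrete with a small numeric sketch. The first function is the standard Amdahl's law formula. The second adds a synchronization term that grows linearly with the number of extra workers; that term is an illustrative assumption, not a measured model of TensorFlow's behaviour, and the parameter values in the usage lines are made up.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Ideal speedup from Amdahl's law, ignoring communication cost."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


def speedup_with_sync(parallel_fraction, n_workers, sync_cost):
    """Amdahl's law plus a hypothetical synchronization penalty.

    sync_cost is the extra time per additional worker, expressed as a
    fraction of the single-GPU step time (an assumed, simplified model).
    """
    serial_fraction = 1.0 - parallel_fraction
    overhead = sync_cost * (n_workers - 1)
    return 1.0 / (serial_fraction + parallel_fraction / n_workers + overhead)


# Illustrative values only: 90% of a step parallelizes cleanly.
print(amdahl_speedup(0.9, 2))          # upper bound with 2 GPUs
print(speedup_with_sync(0.9, 2, 0.6))  # below 1.0 means slower than 1 GPU
```

With these made-up values, two GPUs can give at most about a 1.8x speedup, and once the assumed synchronization cost is large enough the result drops below 1.0, meaning the two-GPU run is slower than the single-GPU run.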

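I'm currently using yolov3 with imageai. What are the options for making training faster? I will test an Nvidia A100 in the cloud next, but even that will reach its limits quickly.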

If you want very efficient model training, I recommend using Google Colab (or better, Google Colab Pro).

On my local RTX 2080 machine, I had a model that trained at about 2.5 hours/epoch, but on Google Colab that dropped to about 30 minutes/epoch.

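Last time I checked the open-source GitHub repository, multi-GPU training was not supported, but it is a feature request that may be in progress.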
