
Multithreading: Running hyperparameter optimization on parallel GPUs with TensorFlow

Tags: multithreading, tensorflow, deep-learning

I have a training function that trains a TF model end to end (the design here is for illustration only):
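The code block for this function did not survive in this copy of the question; below is a minimal sketch of what such a function might look like, assuming it pins each trial to one GPU via CUDA_VISIBLE_DEVICES (as the second answer below suggests) and builds a fresh graph per trial. The body is illustrative only.

import os
import tensorflow as tf

def opt_fx(trial_params, gpu_num):
    # Hypothetical sketch only: pin this trial to a single GPU by id.
    # Note that this environment variable is process-wide, which is exactly
    # the pitfall discussed in the second answer below.
    os.environ['CUDA_VISIBLE_DEVICES'] = gpu_num

    graph = tf.Graph()
    with graph.as_default():
        # ... build the model from trial_params (learning rate, layer sizes, ...)
        pass

    with tf.Session(graph=graph) as sess:
        # ... run the end-to-end training loop for this trial ...
        pass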

I want to run hyperparameter optimization over 20 trials, using one model per GPU:

from threading import Thread
exp_trials = list(hyperparams.trials(num=20))
train_threads = []
for gpu_num, trial_params in zip(['0', '1', '2', '3']*5, exp_trials):
    t = Thread(target=opt_fx, args=(trial_params, gpu_num,))
    train_threads.append(t)

# Start the threads, and block on their completion.
for t in train_threads:
    t.start()

for t in train_threads:
    t.join()

However, this fails... What is the right way to do this?

I'm not sure whether this is the best approach, but what I ended up doing is defining one graph per device and training each one in a separate session; that can be parallelized. I tried reusing the same graph across devices, but it didn't work. Here is what my version looks like in code (complete example):
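The complete example from the original answer is missing in this copy; the sketch below is a reconstruction of the approach described, using a simple softmax classifier on MNIST as a stand-in for the real model (TF 1.x API, consistent with the answer). The device list, batch size, and learning rate are assumptions chosen to roughly match the output shown further down.

import threading
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

def train(device, mnist, epochs=5, batch_size=500, learning_rate=0.5):
    # Each call builds its own graph, so every tensor and op below,
    # including cost and optimize, belongs to this device's graph only.
    graph = tf.Graph()
    with graph.as_default(), tf.device(device):
        x = tf.placeholder(tf.float32, [None, 784])
        y = tf.placeholder(tf.float32, [None, 10])
        w = tf.Variable(tf.zeros([784, 10]))
        b = tf.Variable(tf.zeros([10]))
        logits = tf.matmul(x, w) + b
        cost = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
        optimize = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
        init = tf.global_variables_initializer()

    print('Start training on %s' % device)
    config = tf.ConfigProto(allow_soft_placement=True)
    with tf.Session(graph=graph, config=config) as sess:
        sess.run(init)
        for epoch in range(epochs):
            for step in range(mnist.train.num_examples // batch_size):
                batch_x, batch_y = mnist.train.next_batch(batch_size)
                _, c = sess.run([optimize, cost],
                                feed_dict={x: batch_x, y: batch_y})
                if step % 20 == 0:
                    print('Device %s: epoch #%d step=%d cost=%f'
                          % (device, epoch, step, c))

mnist = input_data.read_data_sets('/tmp/mnist', one_hot=True)
threads = [threading.Thread(target=train, args=(dev, mnist))
           for dev in ['/cpu:0', '/gpu:0']]
for t in threads:
    t.start()
for t in threads:
    t.join()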

Note that the train function uses the tensors and operations of the graph in its own context, i.e. each cost and optimize op is different.

This produces the following output, which shows the two models training in parallel:

Start training on /gpu
Start training on /cpu
Device /cpu: epoch #0 step=0 cost=2.302585
Device /cpu: epoch #0 step=20 cost=1.788247
Device /cpu: epoch #0 step=40 cost=1.400490
Device /cpu: epoch #0 step=60 cost=1.271820
Device /gpu: epoch #0 step=0 cost=2.302585
Device /cpu: epoch #0 step=80 cost=1.128214
Device /gpu: epoch #0 step=20 cost=2.105802
Device /cpu: epoch #0 step=100 cost=0.927004
Device /cpu: epoch #1 step=0 cost=0.905336
Device /gpu: epoch #0 step=40 cost=1.908744
Device /cpu: epoch #1 step=20 cost=0.865687
Device /gpu: epoch #0 step=60 cost=1.808407
Device /cpu: epoch #1 step=40 cost=0.754765
Device /gpu: epoch #0 step=80 cost=1.676024
Device /cpu: epoch #1 step=60 cost=0.794201
Device /gpu: epoch #0 step=100 cost=1.513800
Device /gpu: epoch #1 step=0 cost=1.451422
Device /cpu: epoch #1 step=80 cost=0.786958
Device /gpu: epoch #1 step=20 cost=1.415125
Device /cpu: epoch #1 step=100 cost=0.643715
Device /cpu: epoch #2 step=0 cost=0.674683
Device /gpu: epoch #1 step=40 cost=1.273473
Device /cpu: epoch #2 step=20 cost=0.658424
Device /gpu: epoch #1 step=60 cost=1.300150
Device /cpu: epoch #2 step=40 cost=0.593681
Device /gpu: epoch #1 step=80 cost=1.242193
Device /cpu: epoch #2 step=60 cost=0.640543
Device /gpu: epoch #1 step=100 cost=1.105950
Device /gpu: epoch #2 step=0 cost=1.089900
Device /cpu: epoch #2 step=80 cost=0.664947
Device /gpu: epoch #2 step=20 cost=1.088389
Device /cpu: epoch #2 step=100 cost=0.535446
Device /cpu: epoch #3 step=0 cost=0.580295
Device /gpu: epoch #2 step=40 cost=0.983053
Device /cpu: epoch #3 step=20 cost=0.566510
Device /gpu: epoch #2 step=60 cost=1.044966
Device /cpu: epoch #3 step=40 cost=0.518787
Device /gpu: epoch #2 step=80 cost=1.025607
Device /cpu: epoch #3 step=60 cost=0.562461
Device /gpu: epoch #2 step=100 cost=0.897545
Device /gpu: epoch #3 step=0 cost=0.907381
Device /cpu: epoch #3 step=80 cost=0.600475
Device /gpu: epoch #3 step=20 cost=0.911914
Device /cpu: epoch #3 step=100 cost=0.477412
Device /cpu: epoch #4 step=0 cost=0.527233
Device /gpu: epoch #3 step=40 cost=0.827964
Device /cpu: epoch #4 step=20 cost=0.513356
Device /gpu: epoch #3 step=60 cost=0.897128
Device /cpu: epoch #4 step=40 cost=0.474257
Device /gpu: epoch #3 step=80 cost=0.898960
Device /cpu: epoch #4 step=60 cost=0.514083
Device /gpu: epoch #3 step=100 cost=0.774140
Device /gpu: epoch #4 step=0 cost=0.799004
Device /cpu: epoch #4 step=80 cost=0.559898
Device /gpu: epoch #4 step=20 cost=0.802869
Device /cpu: epoch #4 step=100 cost=0.440813
Device /gpu: epoch #4 step=40 cost=0.732562
Device /gpu: epoch #4 step=60 cost=0.801020
Device /gpu: epoch #4 step=80 cost=0.815830
Device /gpu: epoch #4 step=100 cost=0.692840
You can try it yourself.


This isn't ideal if you have many hyperparameters to tune, but you should be able to build an outer loop that iterates over the possible hyperparameter tuples, assigns a specific graph to a device, and runs them as shown above, for example:
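A sketch of such an outer loop, assuming a train(device, trial_params) variant of the function above that builds its model from the given hyperparameters, and the hyperparams trial generator from the question:

import threading

devices = ['/gpu:0', '/gpu:1', '/gpu:2', '/gpu:3']
trials = list(hyperparams.trials(num=20))   # trial generator from the question

# Run the trials in waves of len(devices): one graph and one session per device.
for i in range(0, len(trials), len(devices)):
    wave = trials[i:i + len(devices)]
    threads = [threading.Thread(target=train, args=(dev, params))
               for dev, params in zip(devices, wave)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()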

For problems like this I usually use the multiprocessing library rather than threading, because compared with training a network the overhead of multiprocessing is tiny, and it eliminates any GIL issues. I think that is the main problem with your code: you are setting the 'CUDA_VISIBLE_DEVICES' environment variable for each thread, but since the threads live in the same process they all share the same environment.

So, with Tensorflow==2.1, what I usually do is pass the GPU id number to the worker process, and the worker can then run the following code to set its visible GPU:

gpus = tf.config.experimental.list_physical_devices('GPU')
my_gpu = gpus[gpu_id]
tf.config.set_visible_devices(my_gpu, 'GPU')
TensorFlow in that process will now run only on that one GPU.
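Putting it together, the spawn side might look something like the sketch below; the worker/driver names are illustrative, the trial generator is the one from the question, and each worker runs the snippet above before building its model.

import multiprocessing as mp

def worker(gpu_id, trial_params):
    # Importing TensorFlow inside the worker keeps CUDA initialization
    # out of the parent process.
    import tensorflow as tf
    gpus = tf.config.experimental.list_physical_devices('GPU')
    tf.config.set_visible_devices(gpus[gpu_id], 'GPU')
    # ... build and train one model for trial_params on this GPU ...

if __name__ == '__main__':
    trials = list(hyperparams.trials(num=20))   # trial generator from the question
    num_gpus = 4
    for i in range(0, len(trials), num_gpus):
        procs = [mp.Process(target=worker, args=(gpu_id, params))
                 for gpu_id, params in enumerate(trials[i:i + num_gpus])]
        for p in procs:
            p.start()
        for p in procs:
            p.join()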

Sometimes the network you're training is small enough that you can actually run several at once on a single GPU. To make sure multiple workers fit in GPU memory, you can set a memory limit for each worker you launch:

tf.config.set_logical_device_configuration(
    my_gpu,
    [tf.config.LogicalDeviceConfiguration(memory_limit=6000)]
)

But if you do set a memory limit, keep in mind that TensorFlow uses some additional memory beyond the limit for cuDNN and other things, so you need a bit of buffer memory for every session you run. Usually I just find out what fits by trial and error, so sorry I don't have better numbers.

Have you tried this -? Thanks. My question is more about running 4 different processes, each running on 1 GPU. That guide discusses the same model on multiple GPUs; I'm talking about different models, with different training routines, each on a different GPU. I'll take a look at your solution, thanks. I found this library can do it:
gpus = tf.config.experimental.list_physical_devices('GPU')
my_gpu = gpus[gpu_id]
tf.config.set_visible_devices(my_gpu, 'GPU')
tf.config.set_logical_device_configuration(
    my_gpu,
    [tf.config.LogicalDeviceConfiguration(memory_limit=6000)]
)