TensorFlow 2: speeding up multiple models on the same GPU


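I am trying to run a hyperparameter search for an LSTM on a small-ish dataset. I have saved the code I use for the hyperparameter search here. As you can see, the code is fairly vanilla. Following the documentation, I have set memory growth to True as follows: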

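import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at startup.
physical_devices = tf.config.list_physical_devices('GPU')
try:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
except (IndexError, RuntimeError, ValueError):
    # No GPU was found, or the GPU had already been initialized.
    print('Could not set GPU memory growth')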
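Since the search runs in several processes, the same setting can also be passed through the environment. The sketch below uses the `TF_FORCE_GPU_ALLOW_GROWTH` variable described in the TensorFlow GPU guide. The helper name `worker_env` and the script name `search_worker.py` are made up for illustration and are not part of my actual code.

```python
import os


def worker_env(base_env=None):
    """Return a copy of the environment with GPU memory growth enabled.

    `worker_env` is a hypothetical helper; TF_FORCE_GPU_ALLOW_GROWTH is the
    environment variable TensorFlow reads when it initializes the GPU.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


child_env = worker_env()

# A worker process could then be started with this environment, e.g.:
#   subprocess.Popen([sys.executable, "search_worker.py"], env=child_env)
# ("search_worker.py" is a placeholder name.)
```

The variable has to be set before TensorFlow initializes the GPU in the worker, which is why it is placed in the child's environment rather than set after import.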

The LSTM itself is very vanilla:

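from tensorflow import keras

# Input: sequences of 200 timesteps with 2 features each.
nn = input_layer = keras.layers.Input(shape=(200, 2))

# layers is a list of LSTM keyword-argument dicts, here [{'units': 16}]
for layer_kwargs in layers:
    nn = keras.layers.LSTM(**layer_kwargs)(nn)

output_layer = keras.layers.Dense(output_dim, activation=output_activation)(nn)

model = keras.Model(input_layer, output_layer)
# `learning_rate` is the current name of the deprecated `lr` argument.
opt = keras.optimizers.Adam(learning_rate=learning_rate)
model.compile(loss=loss, optimizer=opt)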
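To give a sense of how small this model is, here is a rough parameter count for the single LSTM layer. It assumes the default Keras LSTM (four gates, bias enabled), the `[{'units': 16}]` configuration and the 2 input features from the snippet above. The helper name `lstm_param_count` is made up for illustration.

```python
def lstm_param_count(units, input_dim):
    """Trainable parameters of a standard LSTM layer with bias.

    Each of the 4 gates has an input kernel (input_dim x units),
    a recurrent kernel (units x units) and a bias vector (units).
    """
    per_gate = input_dim * units + units * units + units
    return 4 * per_gate


lstm_params = lstm_param_count(units=16, input_dim=2)
print(f"LSTM(16) on 2 features: {lstm_params} parameters")
```

The Dense layer adds `16 * output_dim + output_dim` more, where `output_dim` depends on the search configuration.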

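If I run this code in a single process, I get about 300 µs/sample. If I run it in 10 processes, I get about 2 ms/sample, which means I get hardly any benefit from running in parallel. I am just wondering what kind of performance gains other people have seen.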
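To put numbers on that, here is the arithmetic as a small sketch. It assumes the 2 ms/sample figure is what each of the 10 processes reports individually, so the combined rate is 10 samples every 2 ms. The function name `aggregate_speedup` is only illustrative.

```python
def aggregate_speedup(single_us, parallel_us, n_procs):
    """Speedup of n_procs concurrent runs over one run.

    single_us:   time per sample with one process (microseconds)
    parallel_us: time per sample reported by each concurrent process
    """
    effective_us = parallel_us / n_procs  # combined time per sample
    return single_us / effective_us


n_procs = 10
speedup = aggregate_speedup(single_us=300.0, parallel_us=2000.0, n_procs=n_procs)
efficiency = speedup / n_procs  # fraction of the ideal n_procs-fold speedup

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

Under that assumption the 10 processes together are only about 1.5 times faster than one, against an ideal of 10 times.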

My specs: Ryzen 3700X, 64 GB RAM, RTX2060GB, Windows 10, Python 3.7, TensorFlow 2.2. According to Task Manager, CPU and memory usage are around 50%, while GPU usage is apparently only around 6-8%.

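[Edit] I forgot to mention that the input has already been preprocessed. So each process really just creates the model and then feeds the data to the GPU.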