Using Tensorflow/Keras in a Python multiprocessing pool

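I would like to train neural networks in Tensorflow/Keras, and I would prefer to use the Python multiprocessing module to make the most of the system's resources and save time. What I do is shown in the code below (I want to run it on systems with no GPU as well as on systems with one or more GPUs).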

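I can run this code on different computers and get results, but sometimes the system hangs (especially if I try to abort execution by pressing CTRL+C) or the program terminates with various errors. I suspect this is not the right way to combine Tensorflow/Keras with Python multiprocessing. What is the correct way to write this code?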
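For the CTRL+C hang specifically, one commonly used pattern is to let only the parent process react to the interrupt and have it terminate the pool. The sketch below is a minimal illustration of that pattern, not a confirmed fix for the Tensorflow case; `init_worker` and `run_pool` are hypothetical helper names that do not appear in the original code.

```python
import signal
from multiprocessing import Pool


def init_worker():
    """Make a pool worker ignore SIGINT so only the parent handles CTRL+C."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def run_pool(worker, jobs, processes=2):
    """Map `worker` over `jobs` and tear the pool down on any failure."""
    pool = Pool(processes=processes, initializer=init_worker)
    try:
        results = pool.map(worker, jobs)
    except BaseException:
        # Covers KeyboardInterrupt in the parent and errors raised by workers.
        pool.terminate()
        raise
    else:
        pool.close()
    finally:
        # join() is valid here because close() or terminate() was called above.
        pool.join()
    return results
```

Under these assumptions, `run_pool(doWork, (a1, a2))` inside the `if __name__ == '__main__':` guard would take the place of the direct `pool.map` call.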

I think Tensorflow is already optimized to use as many CPU cores as it can.
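If that is the case, two pool workers would compete for the same cores. A minimal sketch of one way to divide the cores between workers follows, assuming the TF 1.x API used in the question; `threads_per_worker` and `make_session_config` are hypothetical helpers, and the even split is only a starting point to tune.

```python
import os


def threads_per_worker(n_workers, total_cores=None):
    """Split the available CPU cores evenly between workers (at least 1 each)."""
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    return max(1, total_cores // n_workers)


def make_session_config(n_workers):
    """Build a TF 1.x session config limited to this worker's share of cores."""
    # Imported lazily so the helper above can be used without Tensorflow.
    import tensorflow as tf

    n = threads_per_worker(n_workers)
    config = tf.ConfigProto(
        intra_op_parallelism_threads=n,  # threads used inside a single op
        inter_op_parallelism_threads=n,  # threads used to run independent ops
    )
    config.gpu_options.allow_growth = True
    return config
```

Each worker would then create its own session with `tf.Session(config=make_session_config(2))` instead of sharing one session created in the parent.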

# ... import the other required modules ...
from multiprocessing import Pool
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
tf.keras.backend.set_session(sess)

# ... some tf and non-tf variable initializations ...
# ... some functions to facilitate reading tensorflow datasets in TFRecord format ...
# ... function defining keras model ...

# Main worker function
def doWork(args):
    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
    from tensorflow.keras.models import load_model

    train_data = read_datasets(...)
    val_data = read_datasets(...)
    test_data = read_datasets(...)

    if (NumGPUs > 1):
        strategy = tf.distribute.MirroredStrategy()
        with strategy.scope():
            model = keras_model(...)
            model.compile(...)
    else:
        model = keras_model(...)
        model.compile(...)

    # Further fit() arguments are elided in the original post
    model.fit(train_data, epochs=epochs, steps_per_epoch=train_steps)
    _, test_acc = model.evaluate(test_data, steps=test_steps)
    # ... log results ...

if __name__ == '__main__':
    pool = Pool(processes=2)
    a1 = ...  # set of parameters for the first run
    a2 = ...  # set of parameters for the second run
    pool.map(doWork, (a1, a2))
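One possible restructuring of the code above is sketched here, under several assumptions: TF 1.x as in the question, the "spawn" start method so that children do not inherit Tensorflow state from the parent, Tensorflow imported and the session created only inside the worker, and at most one GPU per run selected through `CUDA_VISIBLE_DEVICES`. The last point replaces the `MirroredStrategy` branch rather than reproducing it. The names `gpu_for_job`, `do_work` and `run_jobs` are hypothetical.

```python
import multiprocessing as mp
import os


def gpu_for_job(job_index, gpu_ids):
    """Pick a GPU id for a job in round-robin order; None means CPU only."""
    if not gpu_ids:
        return None
    return gpu_ids[job_index % len(gpu_ids)]


def do_work(job):
    """Run one training job; `job` is a tuple (index, params, gpu_ids)."""
    index, params, gpu_ids = job
    gpu = gpu_for_job(index, gpu_ids)
    # Must be set before Tensorflow is imported in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = "" if gpu is None else str(gpu)

    import tensorflow as tf  # imported only inside the worker process

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    tf.keras.backend.set_session(tf.Session(config=config))
    # ... build, compile, fit and evaluate the model here, as in doWork ...
    return index


def run_jobs(param_sets, gpu_ids=(), processes=2):
    """Run every parameter set in its own freshly spawned process."""
    jobs = [(i, p, list(gpu_ids)) for i, p in enumerate(param_sets)]
    ctx = mp.get_context("spawn")
    # maxtasksperchild=1 with chunksize=1 gives each job a new process,
    # so the environment variable is set before each Tensorflow import.
    with ctx.Pool(processes=processes, maxtasksperchild=1) as pool:
        return pool.map(do_work, jobs, chunksize=1)
```

`run_jobs` would be called from inside the `if __name__ == '__main__':` guard, for example `run_jobs([a1, a2], gpu_ids=[0, 1])` on a machine with two GPUs, or `run_jobs([a1, a2])` on a machine without any.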