Can TensorFlow add the memory of two graphics cards together to run a larger neural network?


If I have a graphics card with 24 GB of memory, can I add a second, identical card and double my memory to 48 GB?

I want to run a large 3D UNet, but I am blocked by the size of the volumes I am passing through. Would adding a second card allow me to process larger volumes?

**Update:** I am running on Linux (Red Hat Enterprise Linux 8), and my code is able to train on two GPUs.
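A quick sanity check that TensorFlow sees both cards (this uses the standard TF 2.x API and is not part of the original post):

import tensorflow as tf

# Two entries ('GPU:0' and 'GPU:1') mean both cards can be targeted with tf.device()
print(tf.config.list_physical_devices('GPU'))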

**Code update:**

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv3D, MaxPooling3D, Dropout, Conv3DTranspose, concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint

def get_model(optimizer, loss_metric, metrics, lr=1e-3):
    inputs = Input((sample_width, sample_height, sample_depth, 1))
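    # contracting (down-sampling) path of the U-Net, placed on the first GPU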
    with tf.device('/device:gpu:0'): 
        conv1 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(inputs)
        conv1 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(conv1)
        pool1 = MaxPooling3D(pool_size=(2, 2, 2))(conv1)
        drop1 = Dropout(0.5)(pool1)
        conv2 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(drop1)
        conv2 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(conv2)
        pool2 = MaxPooling3D(pool_size=(2, 2, 2))(conv2)
        drop2 = Dropout(0.5)(pool2)
        conv3 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(drop2)
        conv3 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(conv3)
        pool3 = MaxPooling3D(pool_size=(2, 2, 2))(conv3)
        drop3 = Dropout(0.3)(pool3)
        conv4 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(drop3)
        conv4 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(conv4)
        pool4 = MaxPooling3D(pool_size=(2, 2, 2))(conv4)
        drop4 = Dropout(0.3)(pool4)
        conv5 = Conv3D(512, (3, 3, 3), activation='relu', padding='same')(drop4)
        conv5 = Conv3D(512, (3, 3, 3), activation='relu', padding='same')(conv5)
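    # expanding (up-sampling) path of the U-Net, placed on the second GPU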
    with tf.device('/device:gpu:1'):
        up6 = concatenate([Conv3DTranspose(256, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv5), conv4], axis=4)
        conv6 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(up6)
        conv6 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(conv6)
        up7 = concatenate([Conv3DTranspose(128, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv6), conv3], axis=4)
        conv7 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(up7)
        conv7 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(conv7)
        up8 = concatenate([Conv3DTranspose(64, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv7), conv2], axis=4)
        conv8 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(up8)
        conv8 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(conv8)
        up9 = concatenate([Conv3DTranspose(32, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv8), conv1], axis=4)
        conv9 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(up9)
        conv9 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(conv9)
        conv10 = Conv3D(1, (1, 1, 1), activation='sigmoid')(conv9)
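    # Model() traces the full graph, so the two device-scoped halves are joined into one model automatically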
    model = Model(inputs=[inputs], outputs=[conv10])    
    model.compile(optimizer=optimizer(lr=lr), loss=loss_metric, metrics=metrics)    
    return model


model = get_model(optimizer=Adam, loss_metric=dice_coef_loss, metrics=[dice_coef], lr=1e-3)
model_checkpoint = ModelCheckpoint('save.model', monitor=observe_var, save_best_only=False, period = 1000)
model.fit(train_x, train_y, batch_size = 1, epochs= 2000, verbose=1, shuffle=True, validation_split=0.2, callbacks=[model_checkpoint])
model.save('final_save.model')

The short answer is yes, but in practice it depends on the software that is accessing the memory on your behalf. I don't know much about these operating systems myself, but I believe CUDA would be a good place to start looking.

I don't think it is currently possible to combine multiple GPUs into a single abstract GPU with the combined memory. However, you can do something similar: split a single model across multiple GPUs, which still makes it possible to run a model that is larger than any one GPU's memory.

The catch is that doing this requires manually specifying which parts of the model run on each device, which can be hard to do efficiently. I also don't know how it could be done with pre-built models.

The general pattern is as follows:

with tf.device('/gpu:0'):
    # create half the model

with tf.device('/gpu:1'):
    # create the other half of the model

# combine the two halves
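For example, a minimal sketch of that pattern using the Keras functional API (the Dense layers and sizes here are arbitrary placeholders, assuming two visible GPUs):

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input((128,))

with tf.device('/gpu:0'):
    # first half of the model lives on GPU 0
    x = Dense(256, activation='relu')(inputs)
    x = Dense(256, activation='relu')(x)

with tf.device('/gpu:1'):
    # second half lives on GPU 1; activations cross between the cards here
    x = Dense(256, activation='relu')(x)
    outputs = Dense(1, activation='sigmoid')(x)

# the functional Model traces the whole graph, stitching the two halves together
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='binary_crossentropy')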
Further reading:


I tried adding tf.device() as suggested, but I still get a ResourceExhaustedError. I tried placing the down-sampling part of the UNet code on gpu:0 and the up-sampling part on gpu:1; I have added my code to the original post. Do I need to add something at the end of the model to merge the two parts, or does that happen automatically?

@RicardoZaragoza I don't have any hands-on experience with splitting models, so I'm not sure. Are you certain the model is small enough that each half fits on its GPU? Google around and see what other people are doing, e.g.
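One way to sanity-check the split (standard TF 2.x debugging APIs, not something mentioned in the thread) is to log device placement and inspect per-GPU memory after a forward pass:

import tensorflow as tf

# Print the device every operation is assigned to; call this before building the
# model to verify the down- and up-sampling halves really land on gpu:0 and gpu:1.
tf.debugging.set_log_device_placement(True)

# After running a batch, peak memory per GPU shows whether each half actually fits.
# (get_memory_info is available in recent TF 2.x releases.)
print(tf.config.experimental.get_memory_info('GPU:0'))
print(tf.config.experimental.get_memory_info('GPU:1'))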