Why do I get a memory allocation error even with a batch size of 1?

python tensorflow machine-learning keras

I am (still) trying to implement a simple U-Net with Keras on the TensorFlow 2.0 backend.

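My templates and masks are 1536x1536 RGB images (the masks are black and white). According to a linked article, the amount of memory required can be estimated.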

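My model crashes with a memory allocation error on a tensor of shape [1,16,1536,1536]. Using the equation given in the article above, I calculated the amount of memory this tensor needs: 1*16*1536*1536*4 = 144 MB. I have a GTX 1080 Ti, of which about 9 GB is available to TensorFlow. What is wrong? Am I missing something?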
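The arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper name `tensor_bytes` is made up for illustration, and 4 bytes per element assumes float32, which matches the "type float" reported in the error message.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the memory needed for one dense tensor of the given shape."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element


# Shape taken from the error message: [batch, channels, height, width].
size = tensor_bytes((1, 16, 1536, 1536))
print(size, "bytes =", size / 2**20, "MiB")
```

The result agrees with the "144.00MiB (rounded to 150994944)" line in the log below, so the estimate for this single tensor is right; it just covers one tensor only.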

Here is an almost complete traceback:

2020-03-02 15:59:13.841967: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-03-02 15:59:16.083234: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-03-02 15:59:16.087240: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-03-02 15:59:16.210856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 15:59:16.210988: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 15:59:16.211429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 15:59:16.947775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-02 15:59:16.947868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-03-02 15:59:16.947922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-03-02 15:59:16.948594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1)
2020-03-02 15:59:16.994676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 15:59:16.994849: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 15:59:16.995291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 15:59:16.995793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 15:59:16.995908: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 15:59:16.996301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 15:59:16.996406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-02 15:59:16.996491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-03-02 15:59:16.996541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-03-02 15:59:16.996942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1)
2020-03-02 15:59:18.191834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 15:59:18.191964: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 15:59:18.192383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 15:59:18.192499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-02 15:59:18.192591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-03-02 15:59:18.192644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-03-02 15:59:18.193053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1)
Epoch 1/100
2020-03-02 15:59:18.421211: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-03-02 15:59:19.577897: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 512.00M (536870912 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.616600: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 460.80M (483183872 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.638395: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-03-02 15:59:19.644478: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.644601: W tensorflow/core/common_runtime/bfc_allocator.cc:305] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
2020-03-02 15:59:19.653644: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.653767: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 259.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-03-02 15:59:19.865828: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.874844: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:29.884662: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:29.893593: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:29.893792: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 144.00MiB (rounded to 150994944).  Current allocation summary follows.
2020-03-02 15:59:29.919126: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 1054574080 memory_limit_: 9210949796 available bytes: 8156375716 curr_region_allocation_bytes_: 1073741824
2020-03-02 15:59:29.919304: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats: 
Limit:                  9210949796
InUse:                  1010432000
MaxInUse:               1010432000
NumAllocs:                     594
MaxAllocSize:            283870720

2020-03-02 15:59:29.919520: W tensorflow/core/common_runtime/bfc_allocator.cc:424] *****__****************xxxxxxxxxx***************xxxxxxxxxx******************************xxxxxxxxxxxx
2020-03-02 15:59:29.919696: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at conv_ops.cc:947 : Resource exhausted: OOM when allocating tensor with shape[1,16,1536,1536] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "E:/Explorium/python/unet_trainer.py", line 82, in <module>
    results = model.fit_generator(train_generator, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH, validation_data=val_generator, validation_steps=VALIDATION_STEPS, callbacks=callbacks)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 1297, in fit_generator
    steps_name='steps_per_epoch')
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_generator.py", line 265, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 973, in train_on_batch
    class_weight=class_weight, reset_metrics=reset_metrics)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 264, in train_on_batch
    output_loss_metrics=model._output_loss_metrics)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 311, in train_on_batch
    output_loss_metrics=output_loss_metrics))
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 252, in _process_single_batch
    training=training))
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 127, in _model_loss
    outs = model(inputs, **kwargs)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 708, in call
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 860, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py", line 197, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 1134, in __call__
    return self.conv_op(inp, filter)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 639, in __call__
    return self.call(inp, filter)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 238, in __call__
    name=self.name)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 2010, in conv2d
    name=name)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1031, in conv2d
    data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1130, in conv2d_eager_fallback
    ctx=_ctx, name=name)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,16,1536,1536] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Conv2D]

Process finished with exit code 1

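Certainly, one tensor may take up that much memory, but you also have to hold all the variables of the network, as well as the values needed for backpropagation. This makes working out the memory requirements complicated (though not impossible). The working set of your network is quite large.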
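To make this concrete, here is a rough count of the layer outputs produced by one forward pass through the U-Net from the question. It is a sketch under several assumptions: the layer list is read off `get_unet` as posted (with `n_filters=16`), every layer output is assumed to be kept as a separate float32 tensor, the batch size is 1, and the function name `unet_activation_elements` is hypothetical. Gradients, cuDNN workspace, and any buffer reuse by TensorFlow are not modelled.

```python
def unet_activation_elements(width, n_filters=16, depth=4):
    """Count output elements of every layer for one square input image."""
    total = 0
    # Encoder: each level has 2 x (Conv2D + BatchNormalization), followed by
    # MaxPooling2D + Dropout at half the resolution.
    for level in range(depth):
        side = width // 2 ** level
        channels = n_filters * 2 ** level
        total += 4 * side * side * channels
        total += 2 * (side // 2) ** 2 * channels
    # Bottleneck: 2 x (Conv2D + BatchNormalization).
    side = width // 2 ** depth
    total += 4 * side * side * n_filters * 2 ** depth
    # Decoder: Conv2DTranspose (1x channels), concatenate and Dropout
    # (2x channels each), then 2 x (Conv2D + BatchNormalization).
    for level in reversed(range(depth)):
        side = width // 2 ** level
        channels = n_filters * 2 ** level
        total += (1 + 2 + 2 + 4) * side * side * channels
    # Final 1x1 convolution producing the single-channel mask.
    total += width * width
    return total


elements = unet_activation_elements(1536)
print(elements * 4 / 2**30, "GiB of float32 layer outputs")
```

Under these assumptions the forward outputs alone come to roughly 3.6 GiB at 1536x1536, about 25 times the single 144 MiB tensor, and that is before anything needed for the backward pass.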

Your problem is the size of the images.

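Contrary to what others say in the comments, it is not the size of the model but the input dimensions of the images that require more GPU memory to process.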

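In your case, the solution is to downsample the images by a factor of 2. Divide both the width and the height by exactly the same factor to preserve the aspect ratio; this lets the network learn on the smaller images without losing too much information or introducing distortion.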

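You will be able to train with a batch size of 1 at 768x768 on a GTX 1080 (I have a GTX 1080 Ti and have tested several segmentation networks with various input dimensions). If for some reason part of your GPU memory is taken up by other processes (such as YT or similar), reducing the input to 512x512 will definitely work (768x768 with a batch size of 1 should work as well).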

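Your model is too big to fit on the GPU. Try printing the summary and check the number of parameters; all of them have to be stored in memory about twice. Memory also has to hold all the intermediate activations.

There is not just one tensor in your network (in training especially, you also need to keep the intermediate activations in memory). It crashes while trying to allocate that particular tensor, but that does not mean the rest of the memory is free.

What BlackBear said: the parameters should fit, but the activation sizes depend on the image resolution (whereas, for example, the parameters of a convolution do not).

@BlackBear Here is my model.summary() result: Total params: 2164593, Trainable params: 2161649, Non-trainable params: 2944. Maybe the image resolution is too high. Would reducing the resolution help?

Your image resolution is too high.

Sorry, but "the network is too big" and the other comments are misleading; the problem lies in the input dimensions of the neural network, not in the basic U-Net model.

Of course, that is a better way of putting it. Edited the post.

I was able to train the model at 768x768 resolution with a batch size of 4.

Yes, I was able to train at 1024x512 with a batch size of 2 on a larger network, so that is expected.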
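For comparison, the parameter counts quoted in the comments translate into very little memory. The sketch below assumes float32 weights and, as a labelled assumption, that training with Adam keeps three more float32 copies of the trainable parameters (gradients plus two moment estimates); the variable names are made up for illustration.

```python
BYTES_PER_FLOAT32 = 4
total_params = 2164593      # "Total params" from the summary quoted above
trainable_params = 2161649  # "Trainable params" from the same summary


def mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / 2**20


weights = total_params * BYTES_PER_FLOAT32
# Assumption: gradients + Adam's two moment buffers, one float32 copy each.
training_state = weights + 3 * trainable_params * BYTES_PER_FLOAT32
print(round(mib(weights), 1), "MiB for weights")
print(round(mib(training_state), 1), "MiB including optimizer state")
```

Even with the optimizer state included this stays in the tens of MiB, which supports the point that the activations at 1536x1536, not the parameters, are what exhausts the GPU.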

import numpy as np
import os
import cv2
import random
# Import everything from the public tensorflow.keras API; mixing it with the
# private tensorflow.python.keras modules can lead to incompatible objects.
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, BatchNormalization, Activation, Dropout
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, MaxPooling2D, concatenate
import tensorflow as tf


config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)


def data_gen(templates_folder, masks_folder, image_width, batch_size):
    counter = 0
    images_list = os.listdir(templates_folder)
    random.shuffle(images_list)
    while True:
        templates_pack = np.zeros((batch_size, image_width, image_width, 3)).astype('float')
        masks_pack = np.zeros((batch_size, image_width, image_width, 1)).astype('float')
        for i in range(counter, counter + batch_size):
            template = cv2.imread(templates_folder + '/' + images_list[i]) / 255.
            templates_pack[i - counter] = template

            mask = cv2.imread(masks_folder + '/' + images_list[i], cv2.IMREAD_GRAYSCALE) / 255.
            mask = mask.reshape(image_width, image_width, 1) # Add extra dimension for parity with template size [1536 * 1536 * 3]
            masks_pack[i - counter] = mask

        counter += batch_size
        if counter + batch_size >= len(images_list):
            counter = 0
            random.shuffle(images_list)
        yield templates_pack, masks_pack


def get_unet(input_image, n_filters, kernel_size, dropout=0.5):
    conv_1 = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size), data_format="channels_last", activation='relu', kernel_initializer="he_normal", padding="same")(input_image)
    conv_1 = BatchNormalization()(conv_1)
    conv_2 = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_1)
    conv_2 = BatchNormalization()(conv_2)
    pool_1 = MaxPooling2D(pool_size=(2, 2))(conv_2)
    pool_1 = Dropout(dropout * 0.5)(pool_1)

    conv_3 = Conv2D(filters=n_filters * 2, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(pool_1)
    conv_3 = BatchNormalization()(conv_3)
    conv_4 = Conv2D(filters=n_filters * 2, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_3)
    conv_4 = BatchNormalization()(conv_4)
    pool_2 = MaxPooling2D(pool_size=(2, 2))(conv_4)
    pool_2 = Dropout(dropout)(pool_2)

    conv_5 = Conv2D(filters=n_filters * 4, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(pool_2)
    conv_5 = BatchNormalization()(conv_5)
    conv_6 = Conv2D(filters=n_filters * 4, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_5)
    conv_6 = BatchNormalization()(conv_6)
    pool_3 = MaxPooling2D(pool_size=(2, 2))(conv_6)
    pool_3 = Dropout(dropout)(pool_3)

    conv_7 = Conv2D(filters=n_filters * 8, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(pool_3)
    conv_7 = BatchNormalization()(conv_7)
    conv_8 = Conv2D(filters=n_filters * 8, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_7)
    conv_8 = BatchNormalization()(conv_8)
    pool_4 = MaxPooling2D(pool_size=(2, 2))(conv_8)
    pool_4 = Dropout(dropout)(pool_4)

    conv_9 = Conv2D(filters=n_filters * 16, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(pool_4)
    conv_9 = BatchNormalization()(conv_9)
    conv_10 = Conv2D(filters=n_filters * 16, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_9)
    conv_10 = BatchNormalization()(conv_10)

    upconv_1 = Conv2DTranspose(n_filters * 8, (kernel_size, kernel_size), strides=(2, 2), padding='same')(conv_10)
    concat_1 = concatenate([upconv_1, conv_8])
    concat_1 = Dropout(dropout)(concat_1)
    conv_11 = Conv2D(filters=n_filters * 8, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(concat_1)
    conv_11 = BatchNormalization()(conv_11)
    conv_12 = Conv2D(filters=n_filters * 8, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_11)
    conv_12 = BatchNormalization()(conv_12)

    upconv_2 = Conv2DTranspose(n_filters * 4, (kernel_size, kernel_size), strides=(2, 2), padding='same')(conv_12)
    concat_2 = concatenate([upconv_2, conv_6])
    concat_2 = Dropout(dropout)(concat_2)
    conv_13 = Conv2D(filters=n_filters * 4, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(concat_2)
    conv_13 = BatchNormalization()(conv_13)
    conv_14 = Conv2D(filters=n_filters * 4, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_13)
    conv_14 = BatchNormalization()(conv_14)

    upconv_3 = Conv2DTranspose(n_filters * 2, (kernel_size, kernel_size), strides=(2, 2), padding='same')(conv_14)
    concat_3 = concatenate([upconv_3, conv_4])
    concat_3 = Dropout(dropout)(concat_3)
    conv_15 = Conv2D(filters=n_filters * 2, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(concat_3)
    conv_15 = BatchNormalization()(conv_15)
    conv_16 = Conv2D(filters=n_filters * 2, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_15)
    conv_16 = BatchNormalization()(conv_16)

    upconv_4 = Conv2DTranspose(n_filters * 1, (kernel_size, kernel_size), strides=(2, 2), padding='same')(conv_16)
    concat_4 = concatenate([upconv_4, conv_2])
    concat_4 = Dropout(dropout)(concat_4)
    conv_17 = Conv2D(filters=n_filters * 1, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(concat_4)
    conv_17 = BatchNormalization()(conv_17)
    conv_18 = Conv2D(filters=n_filters * 1, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_17)
    conv_18 = BatchNormalization()(conv_18)

    conv_19 = Conv2D(1, (1, 1), activation='sigmoid')(conv_18)
    model = Model(inputs=input_image, outputs=conv_19)
    return model


callbacks = [EarlyStopping(patience=10, verbose=1),
             ReduceLROnPlateau(factor=0.1, patience=3, min_lr=0.00001, verbose=1),
             ModelCheckpoint("model-prototype.h5", verbose=1, save_best_only=True, save_weights_only=True)
             ]
train_templates_path = "E:/train/templates"
train_masks_path = "E:/train/masks"
valid_templates_path = "E:/valid/templates"
valid_masks_path = "E:/valid/masks"
TRAIN_SET_SIZE = len(os.listdir(train_templates_path))
VALID_SET_SIZE = len(os.listdir(valid_templates_path))
BATCH_SIZE = 1
EPOCHS = 100
# Step counts must be integers, so use floor division.
STEPS_PER_EPOCH = TRAIN_SET_SIZE // BATCH_SIZE
VALIDATION_STEPS = VALID_SET_SIZE // BATCH_SIZE
IMAGE_WIDTH = 1536

train_generator = data_gen(train_templates_path, train_masks_path, IMAGE_WIDTH, batch_size = BATCH_SIZE)
val_generator = data_gen(valid_templates_path, valid_masks_path, IMAGE_WIDTH, batch_size = BATCH_SIZE)

input_image = Input((IMAGE_WIDTH, IMAGE_WIDTH, 3), name='img')
model = get_unet(input_image, n_filters=16, kernel_size = 3, dropout=0.05)

model.compile(optimizer=Adam(learning_rate=0.001), loss="binary_crossentropy", metrics=["accuracy"])

results = model.fit_generator(train_generator, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH, validation_data=val_generator, validation_steps=VALIDATION_STEPS, callbacks=callbacks)