Adding CTC loss and CTC decoding to a Keras model

I am trying to solve a handwritten text recognition use case. I have built a network with CNN and LSTM layers, and its output needs to be fed into a CTC layer. I could find some code to do this in native TensorFlow. Is there an easier option in Keras?

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Lambda, Bidirectional, LSTM, Dense

model = Sequential()
model.add(Conv2D(64, kernel_size=(5,5),activation = 'relu', input_shape=(128,32,1), padding='same', data_format='channels_last'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(128, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(256, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(Conv2D(256, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,2),padding='same'))
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,2),padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,1)))
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(Lambda(lambda x: x[:, :, 0, :], output_shape=(31, 512)))
#model.add(Bidirectional(LSTM(256, return_sequences=True), input_shape=(31, 256)))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(Dense(75, activation = 'softmax'))


Any help on how to easily add the CTC loss and decoding layers would be greatly appreciated.

The CTC loss function needs four arguments to compute the loss: the predicted outputs, the ground-truth labels, the input sequence lengths to the LSTM, and the ground-truth label lengths. To supply all of these, we need to create a custom loss function and pass it to the model. To make this work with the model you defined, we build a model that takes these four inputs and outputs the loss. That model is used for training, while at test time we can use the model created earlier.
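
To make these four arguments concrete, here is a minimal, illustrative sketch of the shapes that K.ctc_batch_cost expects; the batch size, number of time steps, number of classes and maximum label length below are made-up values, not taken from the model above:

import numpy as np
from keras import backend as K

batch_size, time_steps, num_classes = 2, 31, 80   # illustrative values only
max_label_len = 10                                # assumed maximum transcription length

# y_pred: per-time-step softmax outputs, shape (batch, time_steps, num_classes)
y_pred = K.constant(np.random.rand(batch_size, time_steps, num_classes))
# labels: ground-truth character indices padded to max_label_len, shape (batch, max_label_len)
labels = K.constant(np.random.randint(0, num_classes - 1, (batch_size, max_label_len)))
# input_length / label_length: one entry per sample, shape (batch, 1)
input_length = K.constant(np.full((batch_size, 1), time_steps))
label_length = K.constant(np.full((batch_size, 1), max_label_len))

loss = K.ctc_batch_cost(labels, y_pred, input_length, label_length)  # -> tensor of shape (batch, 1)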

Let's build your Keras model in a slightly different way, using the functional API, so that we can create two versions of it: one to use at training time and one at test time.

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPool2D, BatchNormalization, Lambda, Bidirectional, LSTM, Dense
from keras import backend as K

# input with shape of height=32 and width=128
inputs = Input(shape=(32, 128, 1))

# convolution layer with kernel size (3,3)
conv_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
# pooling layer with kernel size (2,2)
pool_1 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_1)

conv_2 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool_1)
pool_2 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_2)

conv_3 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool_2)

conv_4 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv_3)
# pooling layer with kernel size (2,1)
pool_4 = MaxPool2D(pool_size=(2, 1))(conv_4)

conv_5 = Conv2D(512, (3, 3), activation='relu', padding='same')(pool_4)
# Batch normalization layer
batch_norm_5 = BatchNormalization()(conv_5)

conv_6 = Conv2D(512, (3, 3), activation='relu', padding='same')(batch_norm_5)
batch_norm_6 = BatchNormalization()(conv_6)
pool_6 = MaxPool2D(pool_size=(2, 1))(batch_norm_6)

conv_7 = Conv2D(512, (2, 2), activation='relu')(pool_6)

# drop the height axis (now of size 1): (batch, 1, 31, 512) -> (batch, 31, 512)
squeezed = Lambda(lambda x: K.squeeze(x, 1))(conv_7)

# bidirectional LSTM layers with units=128
blstm_1 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(squeezed)
blstm_2 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(blstm_1)

# output layer: one unit per character in char_list plus one for the CTC blank token
outputs = Dense(len(char_list) + 1, activation='softmax')(blstm_2)

# model to be used at test time
test_model = Model(inputs, outputs)
The CTC loss is only needed during training, so let's implement the CTC loss function and use it to build the training model:

# additional inputs required by the CTC loss (max_label_len is the maximum transcription length)
labels = Input(name='the_labels', shape=[max_label_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')


def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args

    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
  

loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
    [outputs, labels, input_length, label_length])

#model to be used at training time
training_model = Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)
Train this model and save the weights to a .h5 file.
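
A minimal training sketch could look like the following, under the assumption that your data has already been prepared as arrays named train_images, train_labels, train_input_length and train_label_length (placeholder names), and with made-up hyperparameters. Because the model itself already outputs the CTC loss, the loss passed to compile simply forwards y_pred and ignores the dummy targets:

import numpy as np

# the 'ctc' output of training_model is already the loss, so just pass it through
training_model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adam')

training_model.fit(
    x=[train_images, train_labels, train_input_length, train_label_length],
    y=np.zeros(len(train_images)),   # dummy targets, ignored by the pass-through loss
    batch_size=32,                   # assumed hyperparameters
    epochs=10)

training_model.save_weights('ctc_model_weights.h5')   # assumed file name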

Now, at test time, use test_model and load the saved weights of the training model with by_name=True, so that only the weights of the matching layers are loaded.
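
A sketch of the inference side, assuming the weights were saved as 'ctc_model_weights.h5' above and that test_images is a placeholder for a preprocessed batch of shape (num_samples, 32, 128, 1): load the weights into test_model with by_name=True, predict, and decode the per-time-step softmax outputs with K.ctc_decode (greedy best-path decoding here; beam search is available via greedy=False):

import numpy as np
from keras import backend as K

# load only the weights of layers whose names match
test_model.load_weights('ctc_model_weights.h5', by_name=True)

preds = test_model.predict(test_images)   # shape (num_samples, time_steps, len(char_list) + 1)

# greedy (best-path) CTC decoding
decoded, _ = K.ctc_decode(preds,
                          input_length=np.ones(preds.shape[0]) * preds.shape[1],
                          greedy=True)
decoded = K.get_value(decoded[0])

# map decoded indices back to characters; -1 marks padding in the decoded output
for seq in decoded:
    print(''.join(char_list[int(i)] for i in seq if int(i) != -1))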