How do I add the decode_batch_predictions() method to a Keras Captcha OCR model?


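The current model returns a CTC-encoded output that has to be decoded after inference.

To decode it, you need to run a decoding utility function as a separate step after inference: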

preds = prediction_model.predict(batch_images)
pred_texts = decode_batch_predictions(preds)

The decoding utility function uses keras.backend.ctc_decode, which in turn uses either a greedy or a beam search decoder.
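To make the greedy option concrete, here is a minimal pure-Python sketch of best-path CTC decoding: take the most likely class at each time step, merge consecutive repeats, then drop the blank label. The function name `greedy_ctc_decode` and the toy scores are made up for illustration; this is not the Keras implementation, only the idea behind it. It assumes the blank is the last class index, which is the convention TensorFlow's greedy CTC decoder uses.

```python
from itertools import groupby


def greedy_ctc_decode(frame_scores, blank_index):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the most probable class index at every time step.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove the blank label, which only serves as a separator.
    return [label for label in collapsed if label != blank_index]


# Toy example: 3 classes, where index 2 plays the role of the blank.
scores = [
    [0.8, 0.1, 0.1],  # class 0
    [0.7, 0.2, 0.1],  # class 0 again (merged with the previous frame)
    [0.1, 0.1, 0.8],  # blank
    [0.6, 0.3, 0.1],  # class 0 (kept, because a blank separates it)
    [0.2, 0.7, 0.1],  # class 1
]
decoded = greedy_ctc_decode(scores, blank_index=2)
print(decoded)  # [0, 0, 1]
```

Beam search differs in that it keeps several candidate label sequences per time step instead of committing to the single best class, which can help on harder inputs at a higher computational cost.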

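import numpy as np
import tensorflow as tf
from tensorflow import keras

# A utility function to decode the output of the network.
# max_length and num_to_char are assumed to be defined elsewhere, as in the tutorial.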
def decode_batch_predictions(pred):
    input_len = np.ones(pred.shape[0]) * pred.shape[1]
    # Use greedy search. For complex tasks, you can use beam search
    results = keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0][
        :, :max_length
    ]
    # Iterate over the results and get back the text
    output_text = []
    for res in results:
        res = tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8")
        output_text.append(res)
    return output_text

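I would like to use Keras to train a Captcha OCR model that returns the decoded CTC result as its output, without requiring an additional decoding step after inference.

How can I achieve this?

Your question can be interpreted in two ways. One is: I want a neural network that solves a problem where the CTC decoding step is already part of what the network learns. The other is that you want a model class that performs the decoding internally, without relying on an external function.

I don't know the answer to the first question, and I can't even tell whether it is feasible. In any case, it sounds like a difficult theoretical problem; if you have no luck here, you might want to try posting it in a more theory-focused community.

Now, if what you are trying to solve is the second, engineering version of the problem, that is something I can help with. The solution is as follows:

You need to subclass keras.models.Model with a class that has the method you want. I went through the tutorial in the link you posted and came up with the following class: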

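class ModifiedModel(keras.models.Model):
    # A utility function to decode the output of the network
    def decode_batch_predictions(self, pred):
        input_len = np.ones(pred.shape[0]) * pred.shape[1]
        # Use greedy search. For complex tasks, you can use beam search
        results = keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0][
            :, :max_length
        ]
        # Iterate over the results and get back the text
        output_text = []
        for res in results:
            res = tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8")
            output_text.append(res)
        return output_text

    def predict_texts(self, batch_images):
        preds = self.predict(batch_images)
        return self.decode_batch_predictions(preds)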

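You can name it whatever you like; the names here are just for illustration. With this class defined, you would replace the line

# Get the prediction model by extracting layers till the output layer
prediction_model = keras.models.Model(
    model.get_layer(name="image").input, model.get_layer(name="dense2").output
)

with

prediction_model = ModifiedModel(
    model.get_layer(name="image").input, model.get_layer(name="dense2").output
)

Then you can replace the lines

preds = prediction_model.predict(batch_images)
pred_texts = decode_batch_predictions(preds)

with

pred_texts = prediction_model.predict_texts(batch_images)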

The most robust way to achieve this is to add the decoding step as a function that is called as part of the model definition:

def CTCDecoder():
  def decoder(y_pred):
    input_shape = tf.keras.backend.shape(y_pred)
    input_length = tf.ones(shape=input_shape[0]) * tf.keras.backend.cast(
        input_shape[1], 'float32')
    unpadded = tf.keras.backend.ctc_decode(y_pred, input_length)[0][0]
    unpadded_shape = tf.keras.backend.shape(unpadded)
    padded = tf.pad(unpadded,
                    paddings=[[0, 0], [0, input_shape[1] - unpadded_shape[1]]],
                    constant_values=-1)
    return padded

  return tf.keras.layers.Lambda(decoder, name='decode')

Then define the model as follows:

prediction_model = keras.models.Model(inputs=inputs, outputs=CTCDecoder()(model.output))
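With the decoder inside the model, the prediction output is a batch of integer label rows padded with -1 rather than text, so a small mapping step back to strings is still needed. The following is a minimal pure-Python sketch of that step. The helper name `labels_to_texts` and the four-character `vocabulary` are hypothetical and only for illustration; in the tutorial code the index-to-character mapping would come from num_to_char instead.

```python
def labels_to_texts(padded_labels, vocabulary, pad_value=-1):
    """Turn rows of label indices padded with pad_value into strings."""
    texts = []
    for row in padded_labels:
        # Skip the padding entries and look up each remaining index.
        chars = [vocabulary[index] for index in row if index != pad_value]
        texts.append("".join(chars))
    return texts


# Hypothetical vocabulary and a fake batch of padded model outputs.
vocabulary = ["a", "b", "c", "d"]
batch = [
    [0, 1, 2, -1, -1],
    [3, 3, -1, -1, -1],
]
texts = labels_to_texts(batch, vocabulary)
print(texts)  # ['abc', 'dd']
```

Padding every row to the same length is what lets the decoded labels be returned as a single rectangular tensor from the model.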

The credit goes to me.

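This implementation supports exporting to TFLite, but only in float32. Quantized (int8) TFLite export still throws an error, which is an open issue with the TF team.

This is a good approach for exporting with Keras. Unfortunately, it cannot be exported to a TFLite model, because it relies on the Keras predict method.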