Why is inference with tf.keras 75x slower than with TFLite?


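I am running a simple CNN that makes predictions on audio data.

With tf.keras.Model.predict the average execution time is 0.17 s; with tf.lite.Interpreter it is 0.002 s, roughly 75x faster! I tried this on my desktop (Ubuntu 18.04, TF 2.1) and on a Raspberry Pi 3B+ (Raspbian Buster, same code) and saw the same difference.

Why is the difference so large?

Update: I set batch_size=1 in tf.keras.Model.predict, and it is now 65x slower than TFLite.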

test_tflite.py

import os
import pathlib
import tensorflow as tf
from tensorflow.keras.models import model_from_json
import numpy as np
import time


# disable GPU
tf.config.set_visible_devices([], 'GPU')


parent = pathlib.Path(__file__).parent.absolute()

# path to Tensorflow model and weights
MODEL_PATH = os.path.join(parent, 'models/vd_model.json')
WEIGHTS_PATH = os.path.join(parent, 'models/model.30-0.97.h5')
INPUT_SHAPE = (1, 43, 40, 1)

NUM_RUN = 100


def predict_tflite(interpreter, input_details, output_details, data):
    interpreter.set_tensor(input_details[0]['index'], data)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]['index'])
    return output_data


def run():

    # Load Tensorflow model
    with open(MODEL_PATH, 'r') as f:
        model = model_from_json(f.read())
    model.load_weights(WEIGHTS_PATH)

    # Show model
    model.summary()

    # Convert to TFLite
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()
    interpreter = tf.lite.Interpreter(model_content=tflite_model)
    interpreter.allocate_tensors() 
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    predictions = []
    for i in range(NUM_RUN):

        # fake input data
        data = np.random.rand(*INPUT_SHAPE).astype(np.float32)

        # TensorFlow (perf_counter is a monotonic, high-resolution timer)
        start_time = time.perf_counter()
        prediction = model.predict(data, batch_size=1)
        elapsed = time.perf_counter() - start_time

        # TensorFlow Lite
        start_time = time.perf_counter()
        prediction_tflite = predict_tflite(interpreter, input_details, output_details, data)
        elapsed_tflite = time.perf_counter() - start_time

        predictions.append(((elapsed, prediction), (elapsed_tflite, prediction_tflite)))

    # Make sure predictions are close
    for pred_tf, pred_tflite in predictions:
        if not np.allclose(pred_tf[1], pred_tflite[1]):
            print('Predictions are not close')

    # Compute average execution times
    tf_avg = np.mean([p[0] for p, _ in predictions])
    tflite_avg = np.mean([p[0] for _, p in predictions])

    print(f'TF average prediction time: {tf_avg:.6f}s')
    print(f'TFLite average prediction time: {tflite_avg:.6f}s')


if __name__ == "__main__":
    run()

Execution (Raspberry Pi):

pi@raspberrypi:~/src/audio_monitoring/audio_monitoring/tests $ python3 test_tflite.py 
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 43, 40, 16)        160       
_________________________________________________________________
batch_normalization (BatchNo (None, 43, 40, 16)        64        
_________________________________________________________________
activation (Activation)      (None, 43, 40, 16)        0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 22, 20, 16)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 22, 20, 32)        4640      
_________________________________________________________________
batch_normalization_1 (Batch (None, 22, 20, 32)        128       
_________________________________________________________________
activation_1 (Activation)    (None, 22, 20, 32)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 1, 1, 32)          0         
_________________________________________________________________
dropout (Dropout)            (None, 1, 1, 32)          0         
_________________________________________________________________
flatten (Flatten)            (None, 32)                0         
_________________________________________________________________
dense (Dense)                (None, 4)                 132       
=================================================================
Total params: 5,124
Trainable params: 5,028
Non-trainable params: 96
_________________________________________________________________

TF average prediction time: 0.168310s
TFLite average prediction time: 0.002269s

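There can be many reasons behind this performance difference, but in summary:

During conversion to a TFLite model, several graph optimizations are applied (constant folding, op fusion, and so on).
A static execution plan is determined ahead of time, at conversion.
Even on CPU, TFLite often ships kernel implementations optimized for the specific CPU architecture (for example, NEON on ARM).

That said, not every TensorFlow model can be converted to TFLite, because TFLite supports only a subset of the ops that TensorFlow supports.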
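As one concrete illustration of the op fusion mentioned above: at inference time a batch-normalization layer is just a fixed scale and shift, so it can be folded into the weights and bias of the convolution that precedes it. The sketch below shows the arithmetic on a single scalar channel in plain Python. The function names and parameter values are hypothetical, and this is not the converter's actual code, only the idea behind this kind of fusion.

```python
import math


def conv_then_batchnorm(x, w, b, gamma, beta, mean, var, eps=1e-3):
    """Unfused path: a single-weight 'convolution' followed by batch norm."""
    z = w * x + b
    return gamma * (z - mean) / math.sqrt(var + eps) + beta


def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-3):
    """Fold the inference-time batch-norm constants into the weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta


# Hypothetical parameter values, chosen only for illustration.
params = dict(w=0.8, b=0.1, gamma=1.5, beta=-0.2, mean=0.3, var=0.25)
w_folded, b_folded = fold_batchnorm(**params)

for x in (-1.0, 0.0, 2.5):
    unfused = conv_then_batchnorm(x, **params)
    fused = w_folded * x + b_folded  # one multiply-add instead of two ops
    assert math.isclose(unfused, fused, rel_tol=1e-9, abs_tol=1e-12)
```

In the model from the question, each Conv2D is followed directly by a BatchNormalization layer, which is the pattern this kind of folding targets.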

I think you will find this tech talk interesting; take a look.

Calling model(x) directly is likely to be faster and more stable than model.predict(x).
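To compare model.predict(x), a direct model(x) call, and the TFLite interpreter under the same conditions, it may help to time each one with a small helper that discards a few warm-up calls first, since the first calls can include one-time setup cost. The helper below is a minimal sketch using only the standard library; `benchmark` and its parameters are hypothetical names, and the workload is a stand-in so the snippet runs without TensorFlow installed.

```python
import statistics
import time


def benchmark(fn, num_runs=100, warmup=5):
    """Return (mean, median) wall-clock time of fn() in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    timings = []
    for _ in range(num_runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.median(timings)


# Stand-in workload; replace with the call you want to measure.
mean_s, median_s = benchmark(lambda: sum(range(1000)), num_runs=20, warmup=2)
print(f'mean: {mean_s:.6f}s, median: {median_s:.6f}s')
```

With the objects from the question's script, the candidates to pass in would be `lambda: model.predict(data, batch_size=1)`, `lambda: model(data, training=False)`, and `lambda: predict_tflite(interpreter, input_details, output_details, data)`. Reporting the median alongside the mean makes occasional slow outliers easier to spot.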