TensorFlow: latency not reduced after post-training quantization

tensorflow machine-learning deep-learning computer-vision

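I am using EfficientNet to classify images. I have trained the model successfully and want to quantize it with TF Lite. I tried every post-training quantization method TF Lite offers and compared accuracy, size, and latency. As the documentation says, the size dropped by 4x and the accuracy barely changed. My problem is that latency increased substantially, although according to the documentation it should decrease as well.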
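
For context, the 4x size reduction and the small accuracy change both follow from the affine mapping described in the TF Lite quantization specification, `real_value = (int8_value - zero_point) * scale`. The snippet below is a minimal sketch of that mapping in plain Python, not the converter's actual implementation; the `weights` values and the helper names `quantize` and `dequantize` are made up for illustration.

```python
from array import array

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to int8 using q = round(x / scale) + zero_point, clamped."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(q_values, scale, zero_point):
    """Recover approximate floats using x ~= scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]

# Hypothetical weights, chosen only for illustration.
weights = [-1.0, -0.25, 0.0, 0.5, 3.0]

# Choose scale and zero_point so that [min, max] maps onto [-128, 127].
w_min, w_max = min(weights), max(weights)
scale = (w_max - w_min) / 255.0
zero_point = round(-128 - w_min / scale)

q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)

float_bytes = len(weights) * array("f").itemsize  # 4 bytes per float32 value
int8_bytes = len(weights) * array("b").itemsize   # 1 byte per int8 value
size_ratio = float_bytes / int8_bytes
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("size ratio:", size_ratio)
print("max round-trip error:", max_error, "with scale", scale)
```

Each stored value shrinks from 4 bytes to 1, and the round-trip error stays below one quantization step (`scale`). Nothing in this mapping says anything about speed: whether integer arithmetic runs faster depends on the kernels available for the hardware executing the model.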

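I am using Google Colab and tried both the CPU and GPU runtimes with TF 2.3.0. Running inference on the test images takes about 10 seconds with the original model, but on the same test images (about 193 images in the test set) the quantized model takes 300 seconds. I tried different batches of test data; loading the data takes about 5 seconds.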

Here is my code:

import cv2
import numpy as np
import tensorflow as tf

def rep_data_gen():
  # Yield 100 preprocessed images, one per step, as calibration data.
  a = []
  for image_path in test_image_list[:100]:
    image = cv2.imread(image_path, cv2.IMREAD_COLOR)  # OpenCV loads images as BGR
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)    # swap channels to get RGB
    image = cv2.resize(image, (IMG_SIZE, IMG_SIZE))
    image_ = image.astype(np.float32)
    # image_ = tf.convert_to_tensor(image, dtype=tf.uint8)

    a.append(image_)
  a = np.array(a)
  for i in tf.data.Dataset.from_tensor_slices(a).batch(1).take(100):
    yield [i]

converter = tf.lite.TFLiteConverter.from_saved_model(model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Calibration data used to estimate activation ranges
converter.representative_dataset = tf.lite.RepresentativeDataset(rep_data_gen)

# Use uint8 tensors for the model's input and output
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

# Restrict conversion to int8 builtin ops (full integer quantization)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

tflite_model = converter.convert()


import pathlib
tflite_models_dir = pathlib.Path(quantized_dir)
tflite_models_dir.mkdir(exist_ok=True, parents=True)
tflite_model_file = tflite_models_dir/"model_fullInt_cpu_2_v03.tflite"
tflite_model_file.write_bytes(tflite_model)

Inference (the test data comes from a generator):

import time

test_batch = test_data.cache().batch(1).prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
interpreter = tf.lite.Interpreter(model_path=str(tflite_model_file))
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
prediction = []
test_labels = []
interpreter.allocate_tensors()
start_time = time.time()
for img, label in test_batch.take(193):
  interpreter.set_tensor(input_details[0]['index'], img)
  interpreter.invoke()
  test_pred = tf.argmax(interpreter.get_tensor(output_details[0]['index']), axis=1)
  # output = interpreter.tensor(output_details)
  # print(test_pred)
  # break
  prediction.extend(test_pred)
  test_labels.extend(label)
print('---%s--sec--'%(time.time()-start_time))

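I cannot track down the cause. I raised this in a GitHub issue on the TensorFlow repo, and they said that this is only optimized for mobile CPUs. Is that so? The docs do not mention anything like that. I suspected my data loader was adding the latency, but I checked, and loading the different batches of test data takes about 5 seconds.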
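
One way to test the data-loader suspicion directly is to time the two stages separately inside the same loop. The snippet below is a minimal sketch of that idea: `time_stages`, `slow_loader`, and `fake_inference` are hypothetical names, and the stand-ins simply sleep so the example runs without TensorFlow.

```python
import time

def time_stages(batches, run_inference):
    """Return (load_seconds, inference_seconds) accumulated over all batches.

    `batches` is any iterable and `run_inference` is any callable that takes
    one batch; both are placeholders for the real pipeline and model call.
    """
    load_s = 0.0
    infer_s = 0.0
    iterator = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(iterator)  # time spent waiting for the next batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        run_inference(batch)        # time spent running the model
        t2 = time.perf_counter()
        load_s += t1 - t0
        infer_s += t2 - t1
    return load_s, infer_s

# Stand-ins that only sleep, used here in place of real data and a real model.
def slow_loader(n, delay=0.001):
    for i in range(n):
        time.sleep(delay)
        yield i

def fake_inference(batch, delay=0.02):
    time.sleep(delay)

load_s, infer_s = time_stages(slow_loader(10), fake_inference)
print("load: %.3f s, inference: %.3f s" % (load_s, infer_s))
```

With the code in the question, `batches` would be `test_batch.take(193)` and `run_inference` a small function wrapping the `set_tensor`, `invoke`, and `get_tensor` calls. Because the pipeline uses `prefetch`, the load figure measures only how long the loop waits for data, which is the part that can actually slow the loop down. If nearly all of the 300 seconds lands in the inference figure, the data loader can be ruled out.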