Why is a model served by TensorFlow Serving slower than the same stock Keras model run directly in TensorFlow?


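I have been trying to deploy a machine learning solution that uses TensorFlow on an embedded device (Jetson Xavier [ARMv8]).

One of the models the solution uses is a stock Xception network, generated as follows: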

import tensorflow as tf

xception = tf.keras.applications.Xception(include_top=False, input_shape=(299, 299, 3), pooling=None)
xception.save("./saved_xception_model/1", save_format="tf")

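Running the Xception model directly on the device gives reasonable performance: about 0.1s per prediction, ignoring all pre- and post-processing: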

import cv2
import numpy as np
import tensorflow as tf

xception = tf.keras.models.load_model("saved_xception_model/1")
image = get_some_image() # image is numpy.ndarray
image = image.astype("float32") # astype returns a copy, so reassign it
image /= 255
image = cv2.resize(image, (299, 299))
# Tensorflow predict takes ~0.1s
xception.predict(np.expand_dims(image, axis=0)) # add the batch dimension

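However, once the model runs in a TensorFlow Serving GPU container via Nvidia Docker, it is much slower: around 3 seconds per prediction.

I have been trying to find the cause of the poor performance, but I have run out of ideas.

So far, I have tested: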

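Tuning TF Serving's batching parameters purely for latency (batch_timeout_micros: 0, max_batch_size: 1), which gave a modest improvement of about 0.5s

Optimizing the model with TensorRT via saved_model_cli

Running the Xception model in isolation, as the only model served by TF Serving

Doubling the memory allocated to each TF process

Enabling and disabling batching entirely

Enabling and disabling model warmup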
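For reference, the latency-oriented batching settings mentioned in the list can be written out as a batching parameters file. The sketch below is only illustrative: `render_batching_config` is a hypothetical helper, and it assumes the text-format layout that the TF Serving batching guide uses, where each parameter wraps its number in a `value` field. The original batching_config.pb is not shown in the question, so this is not a reproduction of it.

```python
def render_batching_config(max_batch_size: int, batch_timeout_micros: int) -> str:
    """Render a text-format batching parameters file (hypothetical helper).

    Assumes the `name { value: N }` layout used in TF Serving's batching guide.
    """
    fields = {
        "max_batch_size": max_batch_size,
        "batch_timeout_micros": batch_timeout_micros,
    }
    return "".join(f"{name} {{ value: {value} }}\n" for name, value in fields.items())


# The two values tried in the question: no waiting, one request per batch.
latency_config = render_batching_config(max_batch_size=1, batch_timeout_micros=0)
print(latency_config)
```

With a timeout of 0 and a batch size of 1 the batching layer should add almost no queueing delay, so if latency stays high with these values, batching is unlikely to be the main cause.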

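I expected TF Serving to deliver more or less the same prediction time as TF (allowing for gRPC encoding and decoding), as it does for the other models I am running. None of my efforts has brought TF Serving anywhere near the ~0.1s I expected.

My TensorFlow installation is Nvidia's build of TF 2.0. My TF Serving container is self-built from the TF Serving 2.0 source with GPU support.

I start the TensorFlow Serving container as follows: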
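To compare the two paths on equal terms, it can help to time both with the same harness, discarding the first few calls so that one-off setup cost does not skew the numbers. The following is a minimal sketch, not the asker's measurement code: `measure_latency` and `summarize` are hypothetical helpers, and the lambda is a stand-in for either the local predict call or a request to TF Serving.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(call: Callable[[], object], warmup: int = 3, runs: int = 20) -> List[float]:
    """Return per-call wall-clock latencies in seconds, excluding warm-up calls."""
    for _ in range(warmup):
        call()  # discarded: early calls often include one-off initialization
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Report the median and the worst observed latency."""
    return {"median_s": statistics.median(latencies), "max_s": max(latencies)}


# Stand-in workload; swap in the real prediction call being measured.
stats = summarize(measure_latency(lambda: sum(range(10_000))))
print(stats)
```

Reporting the median alongside the maximum makes it easier to tell a consistently slow path from one that is slow only on occasional requests.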

tf_serving_cmd = "docker run --runtime=nvidia -d"
tf_serving_cmd += " --name my-container"
tf_serving_cmd += " -p=8500:8500 -p=8501:8501"
tf_serving_cmd += " --mount=type=bind,source=/home/xception_model,target=/models/xception_model"
tf_serving_cmd += " --mount=type=bind,source=/home/model_config.pb,target=/models/model_config.pb"
tf_serving_cmd += " --mount=type=bind,source=/home/batching_config.pb,target=/models/batching_config.pb"

# Self built TF serving image for Jetson Xavier, ARMv8.
tf_serving_cmd += " ${MY_ORG}/serving" 
# I have tried 0.5 with no performance difference.
# TF-Serving does not complain it wants more memory in either case.
tf_serving_cmd += " --per_process_gpu_memory_fraction=0.25"
tf_serving_cmd += " --model_config_file=/models/model_config.pb"
tf_serving_cmd += " --flush_filesystem_caches=true"
tf_serving_cmd += " --enable_model_warmup=true"
tf_serving_cmd += " --enable_batching=true"
tf_serving_cmd += " --batching_parameters_file=/models/batching_config.pb"
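The same command can also be assembled as an argument list, which makes it easier to check that Docker options come before the image name and model server flags come after it. This is a sketch under assumptions: `build_serving_command` is a hypothetical helper, "my-org/serving" stands in for the ${MY_ORG}/serving image, and every server flag is assumed to take the `--name=value` form.

```python
import shlex
from typing import List


def build_serving_command(image: str, gpu_memory_fraction: float = 0.25) -> List[str]:
    """Assemble the docker invocation from the question as an argument list."""
    # Host path -> container path, as in the original command.
    mounts = {
        "/home/xception_model": "/models/xception_model",
        "/home/model_config.pb": "/models/model_config.pb",
        "/home/batching_config.pb": "/models/batching_config.pb",
    }
    cmd = ["docker", "run", "--runtime=nvidia", "-d", "--name", "my-container",
           "-p", "8500:8500", "-p", "8501:8501"]
    for source, target in mounts.items():
        cmd.append(f"--mount=type=bind,source={source},target={target}")
    cmd.append(image)  # everything after the image is passed to the model server
    cmd += [
        f"--per_process_gpu_memory_fraction={gpu_memory_fraction}",
        "--model_config_file=/models/model_config.pb",
        "--flush_filesystem_caches=true",
        "--enable_model_warmup=true",
        "--enable_batching=true",
        "--batching_parameters_file=/models/batching_config.pb",
    ]
    return cmd


command = build_serving_command("my-org/serving")  # placeholder image name
print(" ".join(shlex.quote(part) for part in command))
```

The list can be passed to `subprocess.run` as is, which avoids shell quoting issues when a path or value contains spaces.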

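I am starting to wonder whether this is a bug in TF Serving, though I do not know where it would be (yes, I know it is never a bug, always the user...).

Can anyone suggest why TF Serving might perform worse than TF?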

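I know this is not an answer, but can you share your Dockerfile (the one that creates the TF Serving container)? I can't even get this working on the TX2 right now, but I will likely run into the same performance issue later.

@omartin2010 Unfortunately our Dockerfiles are proprietary, so I'm not at liberty to share them. You need to compile a Bazel container first, base your image on a Linux for Tegra container, call nvidia-docker build… to expose the host libraries, hack all the missing headers and libraries from the build into /etc/nvidia-container-runtime/host-files-for-container.d/ (trial and error, one at a time...), adjust your swapfile size and the Bazel build resources to avoid out-of-memory issues, and then fix up all the missing and un-added apt and pip dependencies in the Dockerfile. Trial and error.. :/

@omartin2010 It took us two weeks and 1000 lines of shell script. I'm sorry I can't help you more. Edit: oh yes, you will also have to change all the CUDA and TensorRT variables to match whatever versions are installed on your host.

For now I'm building my own TF serving, which is basically a Flask app. My case isn't critical enough to justify really having TF Serving, but thanks for the answers anyway; they may help if I go down that route!

As @omartin2010 mentioned above, it is hard to debug this problem because I can't reproduce it. If you find a minimal example, please share it.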