Flask app keeps loading during prediction (TensorRT)
This is a continuation of an earlier question. The above is the solution, but when I run the Flask app it keeps loading and never displays the video. Code:
Your worker_thread creates the context that do_inference needs. You should call the do_inference method inside callback().
Why not run the inference in the callback? Wouldn't that mean I create a context for every request?
def callback():
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    onnx_model_path = './some.onnx'
    fp16_mode = False
    int8_mode = False
    trt_engine_path = './model_fp16_{}_int8_{}.trt'.format(fp16_mode, int8_mode)
    max_batch_size = 1
    engine = get_engine(max_batch_size, onnx_model_path, trt_engine_path, fp16_mode, int8_mode)
    context = engine.create_execution_context()
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    ctx.pop()
    ## callback function ends

worker_thread = threading.Thread(target=callback)  # pass the function, do not call it
worker_thread.start()
# BUG: context, bindings, inputs, outputs and stream are locals of callback(),
# and the CUDA context lives on (and was popped by) the worker thread, so this
# call cannot work from here.
trt_outputs = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    print("start in do_inference")
    # Transfer input data from the host (CPU) to the device (GPU).
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    print("before run inference in do_inference")
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    print("before output in do_inference")
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    print("before stream synchronize in do_inference")
    # Synchronize the stream so the async copies have completed.
    stream.synchronize()
    # Return only the host outputs.
    print("before return in do_inference")
    return [out.host for out in outputs]
def callback():
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    onnx_model_path = './some.onnx'
    fp16_mode = False
    int8_mode = False
    trt_engine_path = './model_fp16_{}_int8_{}.trt'.format(fp16_mode, int8_mode)
    max_batch_size = 1
    engine = get_engine(max_batch_size, onnx_model_path, trt_engine_path, fp16_mode, int8_mode)
    context = engine.create_execution_context()
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    # Inference now runs on the same thread that owns the CUDA context.
    trt_outputs = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
    # post-process the trt_outputs
    ctx.pop()
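On the per-request concern above: the context does not need to be recreated per request. One common pattern is a single long-lived worker thread that builds the engine, context and buffers once and then pulls work from a queue, so Flask routes only enqueue a frame and wait for the result. Below is a minimal, hedged sketch in plain Python of that queue pattern; `run_inference` is a stand-in for the real TensorRT `do_inference` call, and all names here are illustrative:

```python
import queue
import threading

def run_inference(frame):
    # Hypothetical stand-in for the real TensorRT do_inference call.
    return frame * 2

jobs = queue.Queue()

def worker():
    # In the real app: cuda.init(), make_context(), build the engine and
    # allocate buffers HERE, once, on this thread.
    while True:
        frame, result_box = jobs.get()
        if frame is None:  # shutdown sentinel
            break
        result_box.put(run_inference(frame))
    # In the real app: ctx.pop() here before the thread exits.

worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

def predict(frame):
    # Called from a Flask route: enqueue the frame, block for the result.
    # Every call reuses the one context owned by the worker thread.
    result_box = queue.Queue(maxsize=1)
    jobs.put((frame, result_box))
    return result_box.get()

print(predict(21))  # prints 42
```

Because the CUDA context is created, used and popped on the same thread, the thread-affinity problem in the first code block disappears, and only one context exists for the process no matter how many requests arrive.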