Python: TensorFlow creates a new device multiple times


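I am trying to run the tensorflow-deeplab-v3 model on a server to segment images that I send to it. Everything works, but every time I send an image the model searches for GPUs and creates a new GPU device, and creating the device takes about 10 seconds for each image I send. How can I stop the model from creating a device every time and make it reuse the device it already created?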

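I tried setting CUDA_VISIBLE_DEVICES, but the result was the same. I also tried creating a device and running my code under that device, again with the same result.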
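For reference, here is a minimal sketch of how CUDA_VISIBLE_DEVICES is typically set from Python. The GPU index "0" is an assumption that matches the single Tesla K80 on this instance. The variable only restricts which GPUs the process can see, and it has to be set before CUDA is initialized (in practice, before the first `import tensorflow`). As far as I understand, it does not by itself control when or how often TensorFlow creates its device, which would be consistent with it making no difference here.

```python
import os

# Assumption: GPU index "0" (the only GPU on a p2.xlarge).
# Set this before TensorFlow is imported so CUDA sees it at initialization.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Parse the value back to confirm what this process will expose to CUDA.
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print("GPUs visible to CUDA:", visible)
```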

I am running my server on an Amazon p2.xlarge EC2 instance. The OS information is:

Distributor ID: Ubuntu
Description:    Ubuntu 16.04.6 LTS
Release:    16.04
Codename:   xenial
nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
nvcc --version output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
Python version: 3.5.2, pip version: 19.1.1. pip list output:

Package              Version        
-------------------- ---------------
absl-py              0.7.1          
astor                0.8.0          
bottle               0.12.16        
certifi              2019.3.9       
chardet              3.0.4          
cycler               0.10.0         
gast                 0.2.2          
get                  2019.4.13      
google-pasta         0.1.7          
grpcio               1.21.1         
h5py                 2.9.0          
idna                 2.8            
Keras-Applications   1.0.8          
Keras-Preprocessing  1.1.0          
kiwisolver           1.1.0          
Markdown             3.1.1          
matplotlib           3.0.3          
mock                 3.0.5          
numpy                1.16.4         
opencv-python        4.1.0.25       
Pillow               6.0.0          
pip                  19.1.1         
post                 2019.4.13      
protobuf             3.8.0          
public               2019.4.13      
pyparsing            2.4.0          
python-dateutil      2.8.0          
query-string         2019.4.13      
request              2019.4.13      
requests             2.22.0         
setuptools           41.0.1         
six                  1.12.0         
tb-nightly           1.14.0a20190614
tensorboard          1.14.0         
tensorflow-estimator 1.14.0         
tensorflow-gpu       1.14.0         
termcolor            1.1.0          
urllib3              1.25.3         
Werkzeug             0.15.4         
wheel                0.33.4         
wrapt                1.11.2 
Output for a request made after the first request:

78.181.181.107 - - [23/Jun/2019 11:18:20] "GET / HTTP/1.1" 200 0
Request arrived.
...
Writing output masks...
2019-06-23 11:22:42.036040: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.036423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
2019-06-23 11:22:42.036502: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-23 11:22:42.036540: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-06-23 11:22:42.036572: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-06-23 11:22:42.036604: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-06-23 11:22:42.036637: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-06-23 11:22:42.036669: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-06-23 11:22:42.036702: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-23 11:22:42.036776: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-06-23 11:22:42.037430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-23 11:22:42.037448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-06-23 11:22:42.037465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-06-23 11:22:42.037643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.038233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Preparing paths...
Paths ready. (2.3365020751953125e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Generated. (9.5367431640625e-07)
Prediction took: 11.09858751296997
Cropping /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Cropped and wrote to file. (0.06068730354309082)
Preparing paths...
Paths ready. (2.4557113647460938e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Generated. (0.0004572868347167969)
Prediction took: 0.47649669647216797
Cropping /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Cropped and wrote to file. (0.06105923652648926)
Collecting trashes...
All clear! (0.000209808349609375)
Evaluation complete. (11.765886068344116)
Measuring...
Measuring complete. (1.4767637252807617)
78.181.181.107 - - [23/Jun/2019 11:22:48] "GET / HTTP/1.1" 200 0
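In the log above, the device is created just before the first prediction, which takes about 11 seconds, while the second image in the same request takes under half a second. That pattern would fit a setup cost paid once per prediction call rather than once per image. As I understand it, `tf.estimator.Estimator.predict` builds a new graph and session on each call. The sketch below illustrates one general way to avoid repeating such a cost: keep a single long-lived prediction generator and feed it through a queue. It uses only the standard library. `PersistentPredictor` and `fake_predict` are hypothetical names, and `fake_predict` merely stands in for an expensive predict call, so this is not the DeepLab or TensorFlow API.

```python
import queue
import threading

setup_calls = []  # records how often the "expensive" setup runs


def fake_predict(input_gen_fn):
    """Hypothetical stand-in for an expensive predict call that yields results."""
    setup_calls.append(1)  # pretend this is graph/session/device creation
    for item in input_gen_fn():
        yield item.upper()


class PersistentPredictor:
    """Keeps one prediction generator alive and feeds it items through a queue."""

    def __init__(self, predict_fn):
        self._inputs = queue.Queue()
        self._lock = threading.Lock()
        # The generator is created once; its setup runs on the first next().
        self._results = predict_fn(self._input_generator)

    def _input_generator(self):
        # Blocks until a new item arrives; None is the shutdown signal.
        while True:
            item = self._inputs.get()
            if item is None:
                return
            yield item

    def predict(self, item):
        # One request at a time: enqueue the item, then pull its result.
        with self._lock:
            self._inputs.put(item)
            return next(self._results)

    def close(self):
        self._inputs.put(None)


predictor = PersistentPredictor(fake_predict)
first = predictor.predict("front")
second = predictor.predict("side")
print(first, second, "setup ran", len(setup_calls), "time(s)")
```

With a real Estimator the input function would have to return a dataset built from such a generator, and buffering or prefetching could change the one-in, one-out behavior assumed here, so treat this as an outline of the idea only.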
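I embedded the inference script into the script I use to run the server, as shown below (for testing, I am not loading the images from the source here; the script is not fully finished yet). It creates the GPU device at line 161, on entering the 'for pred_dict, image_path in zipped:' loop: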

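from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import time
import argparse
import os
import glob
from io import BytesIO
import tensorflow as tf
import cv2
import DeepLab.tensorflow_deeplab_v3_plus.deeplab_model as deeplab_model
from DeepLab.tensorflow_deeplab_v3_plus.utils import preprocessing
from DeepLab.tensorflow_deeplab_v3_plus.utils import dataset_util
from PIL import Image
# import matplotlib.pyplot as plt
from tensorflow.python import debug as tf_debug
from bottle import run, post, request, route
import requests
import cropper
import measure
...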
# Using the Winograd non-fused algorithms provides a small performance boost.
os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'
pred_hooks = None
if FLAGS.debug:
    debug_hook = tf_debug.LocalCLIDebugHook()
    pred_hooks = [debug_hook]
print("Searching for GPUs...")
start = time.time()
gpus = tf.config.experimental.list_physical_devices('GPU')
end = time.time()
print("Found all GPUs. (" + str(end - start) + ")")
print("Generating model...")
start = time.time()
model = tf.estimator.Estimator(
    model_fn=deeplab_model.deeplabv3_plus_model_fn,
    model_dir=FLAGS.model_dir,
    params={
        'output_stride': FLAGS.output_stride,
        'batch_size': 1,  # Batch size must be 1 because the images' size may differ
        'base_architecture': FLAGS.base_architecture,
        'pre_trained_model': None,
        'batch_norm_decay': None,
        'num_classes': _num_classes,
    })
end = time.time()
print("Model ready. (" + str(end - start) + ")")
# print("Generating tensorflow session...")
# start = time.time()
# config = tf.ConfigProto()
# sess = tf.Session(config=config)
# end = time.time()
# print("Session created. (" + str(end - start) + ")")
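def evaluateModel(imageListDir, inferencePath, dataPath, modelPath, modelOutputPath):
    print("Preparing list...")
    start = time.time()
    # This part looks at the data folder and writes the names of all files in it to sample_images_list.txt
    imageList = open(imageListDir, "w")
    for file in os.listdir(dataPath):
        imageList.write(str(file) + "\n")
    imageList.close()
    end = time.time()
    print("List generated. (" + str(end - start) + ")")
    print("Loading images...")
    start = time.time()
    # This part runs the model for the current data
    examples = dataset_util.read_examples_list(FLAGS.infer_data_list)
    image_files = [os.path.join(FLAGS.data_dir, filename) for filename in examples]
    end = time.time()
    print("Images loaded. (" + str(end - start) + ")")
    with tf.device("/job:localhost/replica:0/task:0/device:GPU:0"):
        print("Inside device")
        print("Predicting...")
        start = time.time()
        predictions = model.predict(
            input_fn=lambda: preprocessing.eval_input_fn(image_files),
            hooks=pred_hooks)
        end = time.time()
        print("Prediction complete. (" + str(end - start) + ")")
        output_dir = FLAGS.output_dir
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)
        print("Calling zip function...")
        start = time.time()
        zipped = zip(predictions, image_files)
        end = time.time()
        print("Zip() complete. (" + str(end - start) + ")")
        print("Zipped: " + str(zipped))
        print("Writing output masks...")
        predictionTimeStart = time.time()
        for pred_dict, image_path in zipped:
            # print("pred_dict is: " + str(pred_dict))
            print("Preparing paths...")
            start = time.time()
            image_basename = os.path.splitext(os.path.basename(image_path))[0]
            output_filename = image_basename + '_mask.png'
            path_to_output = os.path.join(output_dir, output_filename)
            end = time.time()
            print("Paths ready. (" + str(end - start) + ")")
            print("generating:", path_to_output)
            start = time.time()
            mask = pred_dict['decoded_label']
            end = time.time()
            print("Generated. (" + str(end - start) + ")")
            # Use this part to also save the mask
            # tmp = Image.fromarray(mask)
            # plt.axis('off')
            # plt.imshow(tmp)
            # plt.savefig(path_to_output, bbox_inches='tight')
            predictionTimeEnd = time.time()
            print("Prediction took: " + str(predictionTimeEnd - predictionTimeStart))
            print("Cropping " + path_to_output)
            start = time.time()
            cropper.evaluate(path_to_output, cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY))
            end = time.time()
            print("Cropped and wrote to file. (" + str(end - start) + ")")
            predictionTimeStart = time.time()
        print("Collecting trashes...")
        start = time.time()
        for file in g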