TensorFlow creates a new device multiple times

python c++ tensorflow deep-learning

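I am trying to run the tensorflow-deeplab-v3 model on a server to segment images that I send to it. Everything works, but every time I send an image, the model searches for GPUs and creates a new GPU device, and creating the device takes about 10 seconds for each image I send. How can I prevent the model from creating a device every time and have it reuse the previously created one instead?

I tried setting CUDA_VISIBLE_DEVICES, but the result was the same. I also tried creating a device and running my code with that device, with the same result.

I am running my server on an Amazon p2.xlarge EC2 instance. The OS information is: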
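
For reference, a minimal sketch of how CUDA_VISIBLE_DEVICES is usually set from Python, assuming it has to be in the environment before TensorFlow is imported. The helper name `restrict_visible_gpus` is hypothetical and only used for illustration; it is not part of the server script. The variable only controls which GPUs the process can see, so it would not be expected to change how often a device is created.

```python
import os


def restrict_visible_gpus(device_ids):
    """Expose only the given GPU indices to CUDA in this process.

    Hypothetical helper for illustration. Call it before TensorFlow is
    imported so that the CUDA runtime sees the restricted list.
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    # Make only physical GPU 0 (the single Tesla K80 here) visible.
    print(restrict_visible_gpus([0]))
```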

Distributor ID: Ubuntu
Description:    Ubuntu 16.04.6 LTS
Release:    16.04
Codename:   xenial

nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvcc --version output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Python version: 3.5.2. pip version: 19.1.1. pip list output:

Package              Version        
-------------------- ---------------
absl-py              0.7.1          
astor                0.8.0          
bottle               0.12.16        
certifi              2019.3.9       
chardet              3.0.4          
cycler               0.10.0         
gast                 0.2.2          
get                  2019.4.13      
google-pasta         0.1.7          
grpcio               1.21.1         
h5py                 2.9.0          
idna                 2.8            
Keras-Applications   1.0.8          
Keras-Preprocessing  1.1.0          
kiwisolver           1.1.0          
Markdown             3.1.1          
matplotlib           3.0.3          
mock                 3.0.5          
numpy                1.16.4         
opencv-python        4.1.0.25       
Pillow               6.0.0          
pip                  19.1.1         
post                 2019.4.13      
protobuf             3.8.0          
public               2019.4.13      
pyparsing            2.4.0          
python-dateutil      2.8.0          
query-string         2019.4.13      
request              2019.4.13      
requests             2.22.0         
setuptools           41.0.1         
six                  1.12.0         
tb-nightly           1.14.0a20190614
tensorboard          1.14.0         
tensorflow-estimator 1.14.0         
tensorflow-gpu       1.14.0         
termcolor            1.1.0          
urllib3              1.25.3         
Werkzeug             0.15.4         
wheel                0.33.4         
wrapt                1.11.2

Output for a request made after the first request:

78.181.181.107 - - [23/Jun/2019 11:18:20] "GET / HTTP/1.1" 200 0
Request arrived.
...
Writing output masks...
2019-06-23 11:22:42.036040: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.036423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
2019-06-23 11:22:42.036502: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-23 11:22:42.036540: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-06-23 11:22:42.036572: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-06-23 11:22:42.036604: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-06-23 11:22:42.036637: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-06-23 11:22:42.036669: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-06-23 11:22:42.036702: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-23 11:22:42.036776: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-06-23 11:22:42.037430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-23 11:22:42.037448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-06-23 11:22:42.037465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-06-23 11:22:42.037643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.038233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Preparing paths...
Paths ready. (2.3365020751953125e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Generated. (9.5367431640625e-07)
Prediction took: 11.09858751296997
Cropping /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Cropped and wrote to file. (0.06068730354309082)
Preparing paths...
Paths ready. (2.4557113647460938e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Generated. (0.0004572868347167969)
Prediction took: 0.47649669647216797
Cropping /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Cropped and wrote to file. (0.06105923652648926)
Collecting trashes...
All clear! (0.000209808349609375)
Evaluation complete. (11.765886068344116)
Measuring...
Measuring complete. (1.4767637252807617)
78.181.181.107 - - [23/Jun/2019 11:22:48] "GET / HTTP/1.1" 200 0

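I embedded the inference script into the script I use to run the server, as shown below (for testing purposes I do not load the images from the request source here, and the script is not fully finished yet). It creates the GPU device at line 161, on entering the 'for pred_dict, image_path in zipped:' loop: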
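
The timing in the log above looks consistent with `Estimator.predict` returning a lazy generator: the call itself returns immediately, and the expensive setup only happens when the first item is requested, which is the first iteration of the `for` loop. The sketch below imitates that behavior in plain Python without TensorFlow. `fake_predict` and `setup_log` are hypothetical names used only for illustration, and the `setup_log.append` line stands in for the real device and session setup.

```python
setup_log = []  # records each time the (simulated) expensive setup runs


def fake_predict(image_files):
    """Hypothetical stand-in for a lazy predict call.

    Nothing in this body runs until the first item is requested.
    """
    setup_log.append("device created")  # simulated setup cost
    for path in image_files:
        yield {"image": path}


predictions = fake_predict(["front.png", "side.png"])
print(len(setup_log))   # setup has not run yet

zipped = zip(predictions, ["front.png", "side.png"])
print(len(setup_log))   # zip() does not consume the generator either

results = [pred["image"] for pred, _ in zipped]
print(len(setup_log))   # setup ran once, on the first loop iteration

# A second call starts a new generator, so the setup runs again.
list(fake_predict(["front.png"]))
print(len(setup_log))
```

Under this reading, the first mask in a request pays the setup cost (about 11 seconds in the log) while the second one does not (about 0.5 seconds), and each new request appears to pay it again because `model.predict` is called again.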

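from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import time
import argparse
import os
import glob
from io import BytesIO

import tensorflow as tf
import cv2

import DeepLab.tensorflow_deeplab_v3_plus.deeplab_model as deeplab_model
from DeepLab.tensorflow_deeplab_v3_plus.utils import preprocessing
from DeepLab.tensorflow_deeplab_v3_plus.utils import dataset_util

from PIL import Image
# import matplotlib.pyplot as plt

from tensorflow.python import debug as tf_debug
from bottle import run, post, request, route
import requests
import cropper
import measure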
...
# Using the Winograd non-fused algorithms provides a small performance boost.
os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'

pred_hooks = None
if FLAGS.debug:
    debug_hook = tf_debug.LocalCLIDebugHook()
    pred_hooks = [debug_hook]

print("Searching for GPUs...")
start = time.time()
gpus = tf.config.experimental.list_physical_devices('GPU')
end = time.time()
print("Found all GPUs. (" + str(end - start) + ")")

print("Generating the model...")
start = time.time()
model = tf.estimator.Estimator(
    model_fn=deeplab_model.deeplabv3_plus_model_fn,
    model_dir=FLAGS.model_dir,
    params={
        'output_stride': FLAGS.output_stride,
        'batch_size': 1,  # Batch size must be 1 because the images' sizes may differ
        'base_architecture': FLAGS.base_architecture,
        'pre_trained_model': None,
        'batch_norm_decay': None,
        'num_classes': _NUM_CLASSES,
    })
end = time.time()
print("Model ready. (" + str(end - start) + ")")

# print("Generating the tensorflow session...")
# start = time.time()
# config = tf.ConfigProto()
# sess = tf.Session(config=config)
# end = time.time()
# print("Session created. (" + str(end - start) + ")")

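def evaluateModel(imageListDir, inferencePath, dataPath, modelPath, modelOutputPath):
    print("Preparing the list...")
    start = time.time()
    # This part looks at the data folder and writes the names of all files in it
    # to sample_images_list.txt
    imageList = open(imageListDir, "w")
    for file in os.listdir(dataPath):
        imageList.write(str(file) + "\n")
    imageList.close()
    end = time.time()
    print("List generated. (" + str(end - start) + ")")

    print("Loading images...")
    start = time.time()
    # This part runs the model for the current data
    examples = dataset_util.read_examples_list(FLAGS.infer_data_list)
    image_files = [os.path.join(FLAGS.data_dir, filename) for filename in examples]
    end = time.time()
    print("Images loaded. (" + str(end - start) + ")")

    with tf.device("/job:localhost/replica:0/task:0/device:GPU:0"):
        print("Inside device")
        print("Predicting...")
        start = time.time()
        predictions = model.predict(
            input_fn=lambda: preprocessing.eval_input_fn(image_files),
            hooks=pred_hooks)
        end = time.time()
        print("Prediction complete. (" + str(end - start) + ")")

        output_dir = FLAGS.output_dir
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)

        print("Calling the zip function...")
        start = time.time()
        zipped = zip(predictions, image_files)
        end = time.time()
        print("Zip() complete. (" + str(end - start) + ")")
        print("zipped: " + str(zipped))

        print("Writing output masks...")
        predictionTimeStart = time.time()
        for pred_dict, image_path in zipped:
            # print("pred_dict is: " + str(pred_dict))
            print("Preparing paths...")
            start = time.time()
            image_basename = os.path.splitext(os.path.basename(image_path))[0]
            output_filename = image_basename + '_mask.png'
            path_to_output = os.path.join(output_dir, output_filename)
            end = time.time()
            print("Paths ready. (" + str(end - start) + ")")

            print("generating:", path_to_output)
            start = time.time()
            mask = pred_dict['decoded_labels']
            end = time.time()
            print("Generated. (" + str(end - start) + ")")

            # Uncomment this part to also save the mask
            # tmp = Image.fromarray(mask)
            # plt.axis('off')
            # plt.imshow(tmp)
            # plt.savefig(path_to_output, bbox_inches='tight')
            predictionTimeEnd = time.time()
            print("Prediction took: " + str(predictionTimeEnd - predictionTimeStart))

            print("Cropping " + path_to_output)
            start = time.time()
            cropper.evaluate(path_to_output, cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY))
            end = time.time()
            print("Cropped and wrote to file. (" + str(end - start) + ")")
            predictionTimeStart = time.time()

        print("Collecting trashes...")
        start = time.time()
        for file in g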