Python TensorFlow: execution and memory on different GPUs


Sometimes when I run TensorFlow on a single GPU in a multi-GPU setup, the code executes on one GPU but allocates its memory on a different one. This obviously causes a massive slowdown.

As an example, see the `nvidia-smi` output below. Here one of my colleagues is using GPUs 0 and 1 (processes 32918 and 33112), while I launch TensorFlow with the following command (before importing TensorFlow):

where `gpu_id` is 2, 3, and 4 for my three processes respectively. As you can see, memory is allocated correctly on GPUs 2, 3, and 4, but the code executes somewhere else: in this case on GPUs 0, 1, and 7!
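The launch command itself did not survive the page extraction. A minimal sketch of the usual pattern, assuming a hypothetical `gpu_id` variable holding the ID passed to each process, would be:

```python
import os

# Pin this process to a single GPU. This must happen *before* TensorFlow
# is imported, because TensorFlow enumerates CUDA devices at import time
# and setting the variable later has no effect.
gpu_id = 2  # hypothetical: 2, 3, or 4 for the three processes
os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)

# import tensorflow as tf  # the import must come only after the line above
```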

Wed May 17 17:04:01 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:04:00.0     Off |                    0 |
| N/A   41C    P0    75W / 149W |    278MiB / 11439MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000:05:00.0     Off |                    0 |
| N/A   36C    P0    89W / 149W |    278MiB / 11439MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 0000:08:00.0     Off |                    0 |
| N/A   61C    P0    58W / 149W |   6265MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 0000:09:00.0     Off |                    0 |
| N/A   42C    P0    70W / 149W |   8313MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 0000:84:00.0     Off |                    0 |
| N/A   51C    P0    55W / 149W |   8311MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 0000:85:00.0     Off |                    0 |
| N/A   29C    P0    68W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 0000:88:00.0     Off |                    0 |
| N/A   31C    P0    54W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 0000:89:00.0     Off |                    0 |
| N/A   27C    P0    68W / 149W |      0MiB / 11439MiB |     33%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     32918    C   python                                         274MiB |
|    1     33112    C   python                                         274MiB |
|    2     34891    C   ...sadl/anaconda3/envs/tensorflow/bin/python  6259MiB |
|    3     34989    C   ...sadl/anaconda3/envs/tensorflow/bin/python  8309MiB |
|    4     35075    C   ...sadl/anaconda3/envs/tensorflow/bin/python  8307MiB |
+-----------------------------------------------------------------------------+
For some reason, TensorFlow appears to be partially ignoring the `CUDA_VISIBLE_DEVICES` setting.

I am not using any device placement commands in my code.

This is with TensorFlow 1.1 running on Ubuntu 16.04, and I have seen it across a range of different scenarios.


Are there known scenarios in which this can happen? If so, is there anything I can do about it?

One possible cause lies with `nvidia-smi`:

the `nvidia-smi` ordering is not the same as the GPU IDs that CUDA assigns by default.

"Users who want consistency should use the UUID or PCI bus ID, since device enumeration order is not guaranteed to be consistent."

"FASTEST_FIRST causes CUDA to guess which device is fastest using a simple heuristic, and make that device 0, leaving the order of the rest of the devices unspecified. PCI_BUS_ID orders devices by PCI bus ID in ascending order."
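To make CUDA's device numbering agree with what `nvidia-smi` shows (which always uses PCI bus order), `CUDA_DEVICE_ORDER` can be set alongside `CUDA_VISIBLE_DEVICES`, again before the TensorFlow import. A sketch, using GPU 2 as in the question:

```python
import os

# nvidia-smi always numbers GPUs by PCI bus ID, but CUDA's default order
# (FASTEST_FIRST) may differ. Forcing PCI_BUS_ID makes the two agree, so
# "GPU 2" below means the same device in both views.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # now refers to nvidia-smi's GPU 2

# import tensorflow as tf  # import only after both variables are set
```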

See here:

It is also discussed here:

I figured it out.

The problem turned out to be with `nvidia-smi`, not with TensorFlow. If you enable persistence mode on the GPUs via
sudo nvidia-smi -pm 1
then the correct state is displayed, e.g.:

Fri May 19 15:28:06 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:04:00.0     Off |                    0 |
| N/A   60C    P0   143W / 149W |   6263MiB / 11439MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:05:00.0     Off |                    0 |
| N/A   46C    P0   136W / 149W |   8311MiB / 11439MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:08:00.0     Off |                    0 |
| N/A   64C    P0   110W / 149W |   8311MiB / 11439MiB |     67%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:09:00.0     Off |                    0 |
| N/A   48C    P0   142W / 149W |   8311MiB / 11439MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 0000:84:00.0     Off |                    0 |
| N/A   32C    P8    27W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 0000:85:00.0     Off |                    0 |
| N/A   26C    P8    28W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 0000:88:00.0     Off |                    0 |
| N/A   28C    P8    26W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 0000:89:00.0     Off |                    0 |
| N/A   25C    P8    28W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     42840    C   ...sadl/anaconda3/envs/tensorflow/bin/python  6259MiB |
|    1     42878    C   ...sadl/anaconda3/envs/tensorflow/bin/python  8307MiB |
|    2     43264    C   ...sadl/anaconda3/envs/tensorflow/bin/python  8307MiB |
|    3      4721    C   python                                        8307MiB |
+-----------------------------------------------------------------------------+

Thanks for the help in tracking this down.

Thanks for the information, but how does that explain the fact that the allocation appears correct while the execution does not?

Yes, I'm a bit puzzled by that too. The display tells you that no process has allocated memory on GPU 7, yet GPU 7 shows 33% utilization. Whose work is it doing?

This is a server operated from the command line, so there isn't even a display attached to the machine. It might be some driver bug, but I'm not sure how that would come about.