When using Dask, TensorFlow only detects 1 GPU on a machine with 2 GPUs installed
Our HPC nodes each have 2 K80 GPUs. When I run the following code on an HPC node with plain Python, it detects both GPUs and prints "gpu device types:['TeslaK80', 'TeslaK80']". But when I run the same code through Dask, it only detects 1 GPU and prints "gpu device types:['TeslaK80']". Here is the code that detects the GPUs:
import tensorflow as tf

def init_gpu():
    print("\n\n\n ... tensorflow version = ", tf.__version__)
    from tensorflow.python.client import device_lib
    # Enumerate every device TensorFlow can see in this process.
    local_device_protos = device_lib.list_local_devices()
    print("local device protos:{0}".format(local_device_protos))
    _gpu_raw_info = [(x.name, x.physical_device_desc) for x in local_device_protos if x.device_type == 'GPU']
    print("gpu raw info:{0}".format(_gpu_raw_info))
    _gpu_names = [x[0] for x in _gpu_raw_info]
    _gpu_devices = [x[1] for x in _gpu_raw_info]
    # Extract the device type (e.g. TeslaK80) from the description string.
    _gpu_device_types = [x.split(':')[2].split(',')[0].replace(' ', '') for x in _gpu_devices]
    print("gpu device types:{0}".format(_gpu_device_types))
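The string slicing in `init_gpu` is fragile but works on the descriptor format TensorFlow emits for a GPU. A minimal standalone sketch of that parsing, using an illustrative descriptor string (the device index, bus id, and compute capability values here are made up, not taken from the question):

```python
# Illustrative physical_device_desc string in the format TensorFlow
# reports for a GPU (index, bus id, and capability are made up).
desc = "device: 0, name: Tesla K80, pci bus id: 0000:05:00.0, compute capability: 3.7"

# Same slicing as in init_gpu(): the text after the second colon is
# " Tesla K80, pci bus id"; cut it at the first comma and drop spaces.
device_type = desc.split(':')[2].split(',')[0].replace(' ', '')
print(device_type)  # TeslaK80
```

Note the slicing survives the extra colons inside the pci bus id only because those appear after index 2 of the split.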
Here is the Dask LSFCluster code that launches the job on the cluster:
cluster = LSFCluster(queue=queue_name, project=hpc_project, walltime='80:00', cores=1, processes=1,
                     local_directory='dask-worker-space', memory='250GB', job_extra=['-gpu "num=2"'],
                     log_directory='scheduler_log', dashboard_address=':8787')
cluster.scale(1 * 1)
client = Client(cluster.scheduler_address, timeout=60)
wbsd_results = []
r = dask.delayed(init_gpu)()
wbsd_results.append(r)
client.compute(wbsd_results, sync=True)
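One thing worth checking (a hedged sketch of a common cause, not confirmed here) is what `CUDA_VISIBLE_DEVICES` looks like inside the worker process, since LSF or the worker launcher can mask GPUs per process. The `report_visible_gpus` helper below is a name introduced for illustration; on the cluster it would be dispatched to every worker with `client.run`:

```python
import os

def report_visible_gpus():
    # CUDA_VISIBLE_DEVICES controls which GPU indices CUDA (and hence
    # TensorFlow) may use in this process. If the LSF job or the worker
    # launcher sets it to a single index, TensorFlow reports only 1 GPU
    # even on a 2-GPU node.
    return os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>")

print("visible GPUs:", report_visible_gpus())
# On the Dask cluster, run this on every worker, e.g.:
#   client.run(report_visible_gpus)
```

If the workers report a single index (e.g. "0") while a plain login shell on the node reports both, the masking happens in the job launch path rather than in TensorFlow.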
Please help. Thanks.

Comment: Can you confirm that when you run an LSF job without Dask, the job can see both GPUs? There are some examples that might be useful.

Reply: If I run an LSF job without Dask, I also get only 1 GPU. It must be the LSF setup. Thanks.
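Since the reply narrows the problem down to the LSF setup, a quick way to isolate it is a plain bsub job script that requests both GPUs and prints what the process can see, with no Dask or TensorFlow involved. This is a hedged sketch: the queue name is a placeholder, and the `-gpu` resource string follows the same form as the `job_extra` option used above.

```shell
#!/bin/bash
# Standalone LSF job to check GPU visibility without Dask.
# "gpu_queue" is a placeholder; use your site's GPU queue name.
#BSUB -q gpu_queue
#BSUB -gpu "num=2"

# What the CUDA runtime is allowed to see in this job's processes.
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"

# List the GPUs the driver exposes to this job.
nvidia-smi -L
```

If this job also reports only one GPU, the restriction comes from the LSF GPU configuration (for example, per-job GPU allocation limits set by the cluster administrators) rather than from Dask.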