Dask: How do I use worker resources with delayed functions?


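I want to build a Dask Delayed flow that includes both CPU and GPU tasks. GPU tasks can only run on GPU workers, and each GPU worker has a single GPU and can handle only one GPU task at a time.

Unfortunately, I can't see a way to specify worker resources in the Delayed API.

Here is the common code: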

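from time import sleep

from dask import delayed
from dask.distributed import Client

# Local cluster whose workers each advertise one 'GPU' resource
client = Client(resources={'GPU': 1})

@delayed
def fcpu(x, y):
    sleep(1)
    return x + y

@delayed
def fgpu(x, y):
    sleep(1)
    return x + y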

Here is the flow written in pure Delayed form. This code will not behave correctly, because it knows nothing about the GPU resource:

# STEP ONE: two parallel CPU tasks
a = fcpu(1, 1)
b = fcpu(10, 10)

# STEP TWO: two GPU tasks
c = fgpu(a, b)  # Requires 1 GPU
d = fgpu(a, b)  # Requires 1 GPU

# STEP THREE: final CPU task
e = fcpu(c, d)

%time e.compute()  # 3 seconds

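This is the best solution I could come up with. It combines the Delayed syntax with Client.compute() futures. It seems to behave correctly, but it is very ugly: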

# STEP ONE: two parallel CPU tasks
a = fcpu(1, 1)
b = fcpu(10, 10)
a_future, b_future = client.compute([a, b])  # We DON'T want a resource limit here

# STEP TWO: two GPU tasks - only resources to run one at a time
c = fgpu(a_future, b_future)
d = fgpu(a_future, b_future)
c_future, d_future = client.compute([c, d], resources={'GPU': 1})

# STEP THREE: final CPU task
e = fcpu(c_future, d_future)
res = e.compute()

Is there a better way?

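An approach similar to the one described in this post might work, where each stage is restricted to a machine with a single GPU or to a machine with an SSD: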

def step_1_w_single_GPU(data):
    return "Step 1 done for: %s" % data


def step_2_w_local_IO(data):
    return "Step 2 done for: %s" % data


stage_1 = [delayed(step_1_w_single_GPU)(i) for i in range(10)]
stage_2 = [delayed(step_2_w_local_IO)(s2) for s2 in stage_1]

result_stage_2 = client.compute(stage_2,
                                resources={tuple(stage_1): {'GPU': 1},
                                           tuple(stage_2): {'ssdGB': 100}})
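
For comparison, here is a minimal sketch of the same three-step flow from the question using `dask.annotate`, which, as far as I know, is how more recent Dask releases attach resource requirements to parts of a Delayed graph. It reuses the `fcpu` and `fgpu` names from the question (without the `sleep` calls). The helper names `build_flow` and `run_on_cluster` are hypothetical, and passing `optimize_graph=False` is an assumption taken from the worker-resources documentation example, intended to keep the annotations from being dropped during graph optimization. Whether the per-collection dictionary form shown above is still accepted may depend on the installed version.

```python
import dask
from dask import delayed


@delayed
def fcpu(x, y):
    return x + y


@delayed
def fgpu(x, y):
    return x + y


def build_flow():
    """Build the CPU -> GPU -> CPU graph; only the GPU step is annotated."""
    # STEP ONE: two parallel CPU tasks, no resource restriction
    a = fcpu(1, 1)
    b = fcpu(10, 10)

    # STEP TWO: tasks created inside this block each request one 'GPU'
    with dask.annotate(resources={"GPU": 1}):
        c = fgpu(a, b)
        d = fgpu(a, b)

    # STEP THREE: final CPU task, again unrestricted
    return fcpu(c, d)


def run_on_cluster():
    """Hypothetical driver: run the flow on a local cluster with one GPU resource."""
    # Imported here so that building the graph needs only dask itself.
    from dask.distributed import Client

    # Same cluster setup as in the question: each worker advertises one 'GPU'.
    with Client(resources={"GPU": 1}) as client:
        e = build_flow()
        # Assumption: optimize_graph=False preserves the annotations.
        return client.compute(e, optimize_graph=False).result()
```

Calling `run_on_cluster()` should return the same value as the original flow. The local single-machine schedulers ignore the `resources` annotation, so `build_flow().compute(scheduler="synchronous")` can be used to check the graph itself without a cluster.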