Can Dask automatically create a tree to parallelize a computation and reduce copies between workers?

python parallel-processing dask

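I have split this into two parts, background and question. The question is at the very bottom.

Background:

Suppose I want to (using Dask distributed) do an embarrassingly parallel computation, such as summing 16 huge dataframes. I know this would be very fast with CUDA, but for this example let's stick with Dask.

The basic way to do this (using delayed) is: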
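A minimal sketch of such a naive version, assuming a single delayed task named calc_sum receives every matrix and reduces them with functools.reduce. The names mirror the tree version further down; the 100 x 100 size and the synchronous scheduler are illustrative choices so the sketch runs without a cluster:

```python
from functools import reduce

import numpy as np
from dask import delayed


@delayed
def gen_matrix():
    # Small matrix for illustration; the real ones are assumed to be huge.
    return np.random.rand(100, 100)


@delayed
def calc_sum(matrices):
    # A single task receives every matrix, so all inputs have to be
    # transferred to, and held in memory by, one worker.
    return reduce(lambda a, b: a + b, matrices)


if __name__ == "__main__":
    matrices = [gen_matrix() for _ in range(16)]
    total = calc_sum(matrices)
    # Local, single-threaded run; a distributed Client could be used instead.
    result = total.compute(scheduler="synchronous")
    print(result.shape)
```

In a graph like this, the 16 generation tasks all feed one summation task, which is where the transfer and memory pressure would concentrate.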

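This is the dask graph:

This will certainly work, but as the matrices (see gen_matrix above) get too large, the Dask distributed workers start to have three problems:

They time out while sending data to the main worker that performs the sum

The main worker runs out of memory while gathering all of the matrices

The overall sum does not run in parallel (only the matrix generation does)

Note that none of these problems is Dask's fault; it is working as advertised. I just set up the computation poorly.

One solution is to break this down into a tree computation, shown here along with the dask visualization of that graph: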

from functools import reduce
import math
from dask import delayed, compute, visualize
import dask.distributed as dd
import numpy as np

@delayed
def gen_matrix():
    return np.random.rand(1000, 1000)

@delayed
def calc_sum(a, b):
    return a + b

if __name__ == '__main__':

    num_matrices = 16

    # Plop them into a big list
    matrices = [gen_matrix() for _ in range(num_matrices)]

    # This tells us the depth of the calculation portion
    # of the tree we are constructing in the next step
    depth = int(math.log(num_matrices, 2))

    # This is the code I don't want to have to manually write
    for _ in range(depth):
        matrices = [
            calc_sum(matrices[i], matrices[i+1])
            for i in range(0, len(matrices), 2)
        ]

    # Go!
    with dd.Client('localhost:8786') as client:
        f = client.submit(compute, matrices)
        result = client.gather(f)

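The graph:

Question:

I would like to be able to have this tree generation done by a library or by Dask itself. How can I accomplish this?

For those wondering why I don't just use the code above: because there are edge cases I don't want to have to write code for, and because it is simply more code to write :)

I have also seen this:

Is there something in functools or itertools that knows how to do this (and can be used with dask.delayed)?

Dask bag has a reduction/aggregation method that generates a tree-like DAG:

The workflow would be to "bag" the delayed objects and then fold them.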
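A minimal sketch of that workflow, assuming the method meant is Bag.fold. dask.bag.from_delayed expects each delayed object to evaluate to a sequence that becomes one partition, so each matrix is wrapped in a one-element list here. The helper names gen_partition and tree_sum, the seeds, and the 100 x 100 size are illustrative and not from the original post:

```python
import operator

import dask.bag as db
import numpy as np
from dask import delayed


@delayed
def gen_partition(seed):
    # Each delayed object becomes one bag partition, so it must evaluate
    # to a sequence; the matrix is wrapped in a one-element list.
    rng = np.random.default_rng(seed)
    return [rng.random((100, 100))]


def tree_sum(num_matrices, split_every=2):
    """Build a lazy tree-shaped sum over num_matrices generated matrices."""
    partitions = [gen_partition(seed) for seed in range(num_matrices)]
    bag = db.from_delayed(partitions)
    # fold reduces inside each partition first, then combines the
    # per-partition results in groups of split_every.
    return bag.fold(operator.add, split_every=split_every)


if __name__ == "__main__":
    # Local, single-threaded run; a distributed Client could be used instead.
    total = tree_sum(16).compute(scheduler="synchronous")
    print(total.shape)
```

With split_every=2 the combine step pairs intermediate results, which should resemble the hand-built binary tree; larger values give a shallower tree with wider fan-in. A count that is not a power of two needs no special handling on the caller's side.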

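On the functools/itertools part of the question: functools.reduce folds strictly left to right, which produces a chain rather than a tree. A hand-rolled pairwise reduction that also covers the edge cases (an odd number of items, a single item) can be sketched as below. The name tree_reduce is hypothetical, not a library function:

```python
import operator


def tree_reduce(items, binop):
    """Combine items pairwise, level by level, until one remains."""
    level = list(items)
    if not level:
        raise ValueError("tree_reduce() requires at least one item")
    while len(level) > 1:
        # Pair up neighbours; stop before an unpaired last item.
        paired = [
            binop(level[i], level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        # An unpaired last item is carried up to the next level unchanged.
        if len(level) % 2:
            paired.append(level[-1])
        level = paired
    return level[0]


if __name__ == "__main__":
    print(tree_reduce(range(1, 8), operator.add))  # sum of 1..7
```

Because the function only calls binop on its inputs, passing a list of delayed matrices together with a delayed calc_sum should build the same kind of graph as the manual loop, without requiring the count to be a power of two.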