Accessing ghosted chunks with dask


Using dask, I want to break an image array up into overlapping chunks, perform a computation on them (on all chunks simultaneously), and then stitch the results back into the image.

The following works, but feels clumsy:

from dask import array as da
from dask.array import ghost

import numpy as np


test_data = np.random.random((50, 50))
x = da.from_array(test_data, chunks=(10, 10))

depth = {0: 1, 1: 1}
g = ghost.ghost(x, depth=depth, boundary='reflect')

# Calculate the shape of the array in terms of chunks
chunk_shape = [len(c) for c in g.chunks]
chunk_nr = np.prod(chunk_shape)

# Allocate a list for results (as many entries as there are chunks)
blocks = [None,] * chunk_nr

def pack_block(block, block_id):
    """Store `block` at the correct position in `blocks`,
    according to its `block_id`.

    E.g., with ``block_id == (0, 3)``, the block will be stored at
    ``blocks[3]``.
    """
    idx = np.ravel_multi_index(block_id, chunk_shape)
    blocks[idx] = block

    # We don't really need to return anything, but this will do
    return block

g.map_blocks(pack_block).compute()

# Do some operation on the blocks; this is an over-simplified example.
# Typically, I want to do an operation that considers *all*
# blocks simultaneously, hence the need to first unpack into a list.
blocks = [b**2 for b in blocks]

def retrieve_block(_, block_id):
    """Fetch the correct block from the results set, `blocks`.
    """
    idx = np.ravel_multi_index(block_id, chunk_shape)
    return blocks[idx]

result = g.map_blocks(retrieve_block)

# Slice off excess from each computed chunk
result = ghost.trim_internal(result, depth)
result = result.compute()

Is there a cleaner way to achieve the same end result?

The user-facing API for this approach is the map_overlap method.
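A minimal sketch of what that looks like for the example in the question, using my own stand-in computation (squaring each element) and the same test_data and chunking. In newer dask releases the ghost module has been renamed to overlap, but map_overlap works the same way:

    import numpy as np
    from dask import array as da

    test_data = np.random.random((50, 50))
    x = da.from_array(test_data, chunks=(10, 10))

    # map_overlap bundles ghost() + map_blocks() + trim_internal() into one call:
    # each chunk is padded by `depth`, the function is applied, and the padding
    # is trimmed off again before the chunks are stitched back together.
    result = x.map_overlap(lambda b: b**2, depth={0: 1, 1: 1},
                           boundary='reflect').compute()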

Two other notes for your use case:

  • Avoid hashing costs by passing name=False to from_array. Assuming you don't have any fancy hashing libraries around, this will save you roughly 400 MB/s:

    x = da.from_array(x, name=False)
    
  • Beware of in-place computation. Dask does not guarantee correct behavior if a user function mutates its input data. In this particular case that is probably fine, since we are copying for the ghosting anyway, but it is something to be aware of; see the sketch just below this list.
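As a hypothetical illustration of the in-place point (these two functions are mine, not from the original answer):

    def risky(block):
        block *= 2        # mutates the chunk dask handed in; correctness is not guaranteed
        return block

    def safer(block):
        return block * 2  # allocates a new array and leaves the input chunk untouched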

Second answer

Following on the comments by @Stefan van der Walt, we'll try another solution.

Consider using the .to_delayed() method to get the array of chunks as dask.delayed objects:

    depth = {0: 1, 1: 1}
    g = ghost.ghost(x, depth=depth, boundary='reflect')
    blocks = g.to_delayed()

This gives you a numpy array of dask.delayed objects, each pointing to one chunk. You can now perform arbitrary parallel computations on these chunks. If I wanted them all to go to the same function, I might call the following:

    result = dask.delayed(f)(blocks.tolist())

The function f will then receive a list of lists of numpy arrays, each of which corresponds to one chunk of the dask array g.

From the comments: could some context, or an example of how the list needs to be unpacked, be added? Perhaps it goes without saying, but the computation in the question can also be written as ghost.trim_internal(g.map_blocks(lambda b: b**2), depth). I should have mentioned "on all tiles simultaneously" more clearly: I really do need a list of all the arrays, so that I can operate on them together and then pack them back, though I can see that this may be a misuse of dask, since that is not really what it is designed for.
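To answer the comment about unpacking the list, here is a minimal end-to-end sketch (my own illustration, not part of the original answers). The function f below is hypothetical: it receives every ghosted tile at once, rescales them by the global maximum as a stand-in for a real whole-image computation, trims the depth-1 overlap by hand, and stitches the grid back together with np.block:

    import numpy as np
    import dask
    from dask import array as da
    from dask.array import ghost

    test_data = np.random.random((50, 50))
    x = da.from_array(test_data, chunks=(10, 10))

    depth = {0: 1, 1: 1}
    g = ghost.ghost(x, depth=depth, boundary='reflect')
    blocks = g.to_delayed()              # 5x5 numpy array of Delayed objects

    def f(rows):
        """`rows` is a list of lists of numpy arrays, one per ghosted chunk."""
        # A computation that genuinely needs every tile at once.
        global_max = max(tile.max() for row in rows for tile in row)
        # Trim the depth-1 overlap from each tile, then reassemble the grid.
        trimmed = [[tile[1:-1, 1:-1] / global_max for tile in row] for row in rows]
        return np.block(trimmed)

    result = dask.delayed(f)(blocks.tolist()).compute()
    assert result.shape == test_data.shape

Everything that has to see all tiles at once lives inside f; note that the single dask.delayed(f) call runs as one task, so that whole-image step is serial and only producing the ghosted chunks is parallelised.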