Python 3.x Dask map_blocks - IndexError: tuple index out of range


I want to do the following with Dask:

  • Load a matrix from an HDF5 file
  • Parallelize the computation for each entry

This is my code:

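    import dask.array as da
    import h5py
    import numpy as np

    def blocked_func(x):
        # Ignores the block and returns a single Python float
        return np.random.random()

    # file_path, chunks_row and chunks_col are assumed to be defined elsewhere
    with h5py.File(file_path, 'r') as f:
        d = f['/data']
        arr = da.from_array(d, chunks=(chunks_row, chunks_col))

        arr2 = arr.map_blocks(blocked_func, dtype='float32').compute()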
    
But the code raises the following error:

    File ".../remote_fr_thinkpad/test_big_data.py", line 43, in <module>
        arr2 = arr.map_blocks(blocked_func, dtype='float32').compute()
      File ".../anaconda3/lib/python3.7/site-packages/dask/base.py", line 156, in compute
        (result,) = compute(self, traverse=False, **kwargs)
      File ".../anaconda3/lib/python3.7/site-packages/dask/base.py", line 399, in compute
        return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
      File ".../anaconda3/lib/python3.7/site-packages/dask/base.py", line 399, in <listcomp>
        return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
      File ".../anaconda3/lib/python3.7/site-packages/dask/array/core.py", line 779, in finalize
        return concatenate3(results)
      File ".../anaconda3/lib/python3.7/site-packages/dask/array/core.py", line 3497, in concatenate3
        chunks = chunks_from_arrays(arrays)
      File ".../anaconda3/lib/python3.7/site-packages/dask/array/core.py", line 3327, in chunks_from_arrays
        result.append(tuple([shape(deepfirst(a))[dim] for a in arrays]))
      File ".../anaconda3/lib/python3.7/site-packages/dask/array/core.py", line 3327, in <listcomp>
        result.append(tuple([shape(deepfirst(a))[dim] for a in arrays]))
    IndexError: tuple index out of range
    
I searched Google and also tried Dask's gu_func, but got the same error.

Thanks for your help.

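map_blocks expects blocked_func to return an array with the same shape as its input block, (chunks_row, chunks_col). In fact it returns only a single float. When Dask reassembles the per-block results (the concatenate3 and chunks_from_arrays frames in the traceback), it looks up each result's size along every axis of the 2-D block layout. A scalar does not have those axes, so the lookup raises IndexError.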
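Below is a simplified, NumPy-only analogy of that reassembly step. It is not Dask's actual code: np.block and np.concatenate stand in for Dask's internal concatenate3. The point it shows is that 2-D blocks can be stitched back into the full matrix, while bare floats have no axis to join along. The block sizes are made up for illustration.

```python
import numpy as np

# Four (2, 3) blocks laid out as a 2x2 grid, like the chunks of a 4x6 matrix
blocks = [[np.ones((2, 3)), np.ones((2, 3))],
          [np.ones((2, 3)), np.ones((2, 3))]]
full = np.block(blocks)  # stitched back into one array
print(full.shape)  # (4, 6)

# What the question's blocked_func produces instead: one bare float per block
scalars = [np.random.random() for _ in range(2)]
try:
    np.concatenate(scalars)  # zero-dimensional values have no axis to join along
except ValueError as exc:
    print("ValueError:", exc)
```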

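Either of two approaches works:

1) Use a function that preserves the shape, for example:

    def blocked_func(x):
        # Element-wise, so the output block has the same shape as x
        return x*2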
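Here is a runnable sketch of approach 1. It assumes a small in-memory NumPy array in place of the HDF5 dataset; the data and chunk sizes are illustrative. random_per_entry is a hypothetical shape-preserving variant of the question's function that draws one random number per matrix entry instead of one per block.

```python
import numpy as np
import dask.array as da

def blocked_func(x):
    # Element-wise operation: the output block has the same shape as x
    return x * 2

def random_per_entry(x):
    # Hypothetical: one random float32 per entry, shaped like the block
    return np.random.random(x.shape).astype('float32')

# Stand-in for f['/data']: a 4x6 matrix split into (2, 3) chunks
data = np.arange(24, dtype='float32').reshape(4, 6)
arr = da.from_array(data, chunks=(2, 3))

doubled = arr.map_blocks(blocked_func, dtype='float32').compute()
randoms = arr.map_blocks(random_per_entry, dtype='float32').compute()
print(doubled.shape, randoms.shape)  # (4, 6) (4, 6)
```

Because both functions keep each block's shape, no chunks argument is needed. map_blocks can assume the output chunks equal the input chunks.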
    

2) Tell map_blocks that the output shape will be different:

    arr2 = arr.map_blocks(blocked_func, chunks=(1,1), dtype='float32').compute()
    
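but make blocked_func keep the number of dimensions of its input array (2-D here), for example:

    import numpy as np

    def blocked_func(x):
        # A (1, 1) array rather than a bare float, matching chunks=(1,1)
        return np.random.random((1, 1)).astype('float32')
        # or like this
        # return np.array([[1]], dtype='float32')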
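The same pattern works for any per-block result. Below is a minimal sketch, again with an in-memory array standing in for the HDF5 dataset. block_sum is a hypothetical reduction, chosen only because its output is easy to check. With chunks=(1, 1), each block contributes one entry, so the computed array's shape equals arr.numblocks.

```python
import numpy as np
import dask.array as da

# Stand-in for f['/data']: a 4x6 matrix split into a 2x2 grid of (2, 3) blocks
data = np.arange(24, dtype='float32').reshape(4, 6)
arr = da.from_array(data, chunks=(2, 3))

def block_sum(x):
    # Reduce the block to one value but keep it 2-D: shape (1, 1)
    return np.array([[x.sum()]], dtype='float32')

sums = arr.map_blocks(block_sum, chunks=(1, 1), dtype='float32').compute()
print(arr.numblocks)   # (2, 2)
print(sums.tolist())   # [[24.0, 42.0], [96.0, 114.0]]
```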