Dask: parallel prediction with `map_partitions`


I have a DataFrame of shape (25M, 79) and I'm trying to parallelize a scikit-learn pipeline prediction over it.

When I run it on a single partition only, it works as expected:

import dask.dataframe as dd

n_partitions = 1000
ddf = dd.from_pandas(df_x_selection, npartitions=n_partitions)
# Predict on the first partition only
grid_searcher.best_estimator_.predict_proba(ddf.get_partition(0))
But if I apply it to every partition, it fails:

import numpy as np
import dask.dataframe as dd

n_partitions = 1000
ddf = dd.from_pandas(df_x_selection, npartitions=n_partitions)

# Return the predicted probability of class `_predicted_class` for one partition
def _f(_df, _pipeline, _predicted_class) -> np.ndarray:
    return _pipeline.predict_proba(_df)[:, _predicted_class]

ddf.map_partitions(_f, grid_searcher.best_estimator_, 1, meta=(None, 'f8')).compute()
The error is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
    130             raise ValueError(
--> 131                 f"Wrong number of items passed {len(self.values)}, "
    132                 f"placement implies {len(self.mgr_locs)}"

ValueError: Wrong number of items passed 79, placement implies 100
What am I doing wrong?
Thanks

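The classifier step of the pipeline is a LightGBM model (I now know that it only supports a single process running the predict function), but how would that produce the error above?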