Iterating sequentially over a dask bag

I need to submit the elements of a very large dask.bag to a non-thread-safe store, i.e. I need something like

for x in dbag:
    store.add(x)

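I cannot use compute, because the bag is too large to fit in memory. I need something more like distributed.as_completed, but one that works on bags, which distributed.as_completed does not.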

I would probably continue to use a normal compute, but add a lock:

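def commit(x, lock):
    # Hold the lock so that only one task writes to the store at a time
    with lock:
        store.add(x)

# map() is lazy; compute() is what actually runs the commits
b.map(commit, lock=my_lock).compute()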

Here my_lock can be a threading.Lock or a multiprocessing.Lock, depending on the kind of processing you are doing.
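
As a minimal sketch of the lock approach, the example below runs the bag on the threaded scheduler, where every task lives in the same process and can therefore share one threading.Lock and one in-memory store. ListStore is a hypothetical stand-in for the real store, and the bag contents are made up for illustration. With a process-based or distributed scheduler the workers would not share this in-memory object, so the store and the lock would need to be reachable from every worker.

```python
import threading

import dask.bag as db


class ListStore:
    """Hypothetical stand-in for the non-thread-safe store in the question."""

    def __init__(self):
        self.items = []

    def add(self, x):
        self.items.append(x)


store = ListStore()
my_lock = threading.Lock()


def commit(x, lock):
    # Only one task may touch the store at a time.
    with lock:
        store.add(x)


# Illustrative data; a real bag would come from files or another source.
b = db.from_sequence(range(100), npartitions=8)

# The threaded scheduler keeps all tasks in this process, so they share
# `store` and `my_lock` directly and nothing has to be pickled.
b.map(commit, lock=my_lock).compute(scheduler="threads")
```

The order in which elements reach the store is not guaranteed here, since partitions are processed concurrently; only the writes themselves are serialized.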

If you want to use as_completed, you can convert your bag to futures and use as_completed on them:

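from distributed import as_completed, futures_of

# Requires an active distributed Client, so that persist() returns futures
b = b.persist()
futures = futures_of(b)

# Each future holds one partition; add its elements as soon as it finishes
for future in as_completed(futures):
    for x in future.result():
        store.add(x)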
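
A related variant, not part of the answer above, is to skip the cluster and walk the bag one partition at a time with Bag.to_delayed(), which returns one Delayed object per partition. The sketch below assumes that a single partition fits in memory; ListStore, double, and the sample data are hypothetical and only serve to make the example self-contained.

```python
import dask.bag as db


class ListStore:
    """Hypothetical stand-in for the non-thread-safe store in the question."""

    def __init__(self):
        self.items = []

    def add(self, x):
        self.items.append(x)


def double(x):
    # Placeholder for whatever per-element work the bag performs.
    return 2 * x


store = ListStore()
b = db.from_sequence(range(20), npartitions=4).map(double)

# One Delayed per partition; computing them in a loop keeps only a single
# partition in local memory at any time.
for part in b.to_delayed():
    for x in part.compute():
        # All writes happen here, in the calling thread, so no lock is needed.
        store.add(x)
```

Because the partitions are computed one after another, this gives up parallelism across partitions in exchange for strictly sequential writes.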

You can also convert the bag to a dataframe, which I believe iterates more sensibly:

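df = b.to_dataframe(...)
# itertuples() yields rows, computing one partition at a time
# (iteritems() would walk the columns instead)
for x in df.itertuples(...):
    ...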