TensorFlow: Dask compute() blocks and does nothing


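I have a TensorFlow model that only uses 2 cores during training. I have 8 cores, and I want to train the model on different random samples to compare the results. I figured I could save time by training 4 models in parallel: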

import dask.bag as db
seeds = db.from_sequence(range(10), npartitions=4)
accuracies = seeds.map(lambda seed: train_and_get_accuracy(seed))
print(accuracies.compute())
But the call to compute() just blocks and nothing happens. What did I mess up?

I tried printing accuracies.dask to inspect the task graph. It looks fine to me:

{('from_sequence-41b72669c9abaeca2236693465a55891', 0): [0, 1, 2],
 ('from_sequence-41b72669c9abaeca2236693465a55891', 1): [3, 4, 5],
 ('from_sequence-41b72669c9abaeca2236693465a55891', 2): [6, 7, 8],
 ('from_sequence-41b72669c9abaeca2236693465a55891', 3): [9],
 ('map-lambda-db55048968394cb7b842de6a78e7ee7d', 0): (<function reify at 0x7f9b8355d268>,
                                                      (<class 'map'>,
                                                       <function <lambda> at 0x7f9b78216400>,
                                                       ('from_sequence-41b72669c9abaeca2236693465a55891',
                                                        0))),
 ('map-lambda-db55048968394cb7b842de6a78e7ee7d', 1): (<function reify at 0x7f9b8355d268>,
                                                      (<class 'map'>,
                                                       <function <lambda> at 0x7f9b78216400>,
                                                       ('from_sequence-41b72669c9abaeca2236693465a55891',
                                                        1))),
 ('map-lambda-db55048968394cb7b842de6a78e7ee7d', 2): (<function reify at 0x7f9b8355d268>,
                                                      (<class 'map'>,
                                                       <function <lambda> at 0x7f9b78216400>,
                                                       ('from_sequence-41b72669c9abaeca2236693465a55891',
                                                        2))),
 ('map-lambda-db55048968394cb7b842de6a78e7ee7d', 3): (<function reify at 0x7f9b8355d268>,
                                                      (<class 'map'>,
                                                       <function <lambda> at 0x7f9b78216400>,
                                                       ('from_sequence-41b72669c9abaeca2236693465a55891',
                                                        3)))}
What else can I check to understand what is going on? This is an Ubuntu 16.04 system.

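Try using threads
By default, dask.bag parallelizes with separate processes on your machine. This is ideal for pure Python code (because of the GIL), but it may not be ideal for numeric code like TensorFlow, especially if the TensorFlow library does not cope well with forked processes (which may be the case here).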

You can switch to threads globally by setting the following option:

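import dask
import dask.threaded  # make sure the threaded scheduler module is loaded
# Note: newer Dask releases replace this call with dask.config.set(scheduler='threads')
dask.set_options(get=dask.threaded.get)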
Or by passing the get= keyword in the compute call:

accuracies.compute(get=dask.threaded.get)
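
Newer Dask releases replaced the get= keyword with scheduler=. As a minimal sketch of the same idea on a current Dask install, the snippet below runs the bag on the threaded scheduler, once per call and once inside a scoped configuration block. The train_and_get_accuracy function here is a hypothetical stand-in that returns a fake accuracy derived from the seed; it is not the real training code from the question.

```python
import dask
import dask.bag as db


def train_and_get_accuracy(seed):
    # Hypothetical stand-in for the real training routine: it only
    # derives a deterministic fake "accuracy" from the seed.
    return (seed * 37 % 100) / 100.0


seeds = db.from_sequence(range(10), npartitions=4)
accuracies = seeds.map(train_and_get_accuracy)

# Per-call override: run this one computation on the threaded scheduler
# instead of the bag's default multiprocessing scheduler.
per_call = accuracies.compute(scheduler="threads")

# Scoped override: anything computed inside the block uses threads.
with dask.config.set(scheduler="threads"):
    scoped = accuracies.compute()

print(per_call)
```

The scoped form is convenient when several compute() calls should share the same scheduler without changing the setting for the whole program.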
Consider using dask.delayed
dask.bag provides a very simple interface, somewhat like Spark RDDs. For more complex algorithms in the future, you could also try dask.delayed.
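
As a rough illustration of what the dask.delayed version might look like, the sketch below wraps each training run in a lazy task and adds a follow-up task that depends on all of them. Both train_and_get_accuracy and best_of are hypothetical placeholders introduced for this example, and the scheduler="threads" keyword assumes a reasonably recent Dask release.

```python
import dask
from dask import delayed


def train_and_get_accuracy(seed):
    # Hypothetical stand-in for the real training routine.
    return (seed * 37 % 100) / 100.0


def best_of(results):
    # Hypothetical follow-up step that depends on every training run.
    return max(results)


# Build the task graph lazily; nothing runs yet.
lazy_accuracies = [delayed(train_and_get_accuracy)(seed) for seed in range(10)]
lazy_best = delayed(best_of)(lazy_accuracies)

# Execute the whole graph once, on the threaded scheduler.
accuracies, best = dask.compute(lazy_accuracies, lazy_best, scheduler="threads")

print(accuracies, best)
```

Compared with the bag, this form makes it easy to express arbitrary dependencies between tasks, at the cost of building the graph by hand.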

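I guess TensorFlow does a lot of magic on its own. Maybe it cannot be run with Dask? Without TensorFlow it works fine...

It gave me a speedup of about 30% instead of the 4x I had hoped for. I guess TensorFlow uses a lot of locks. But at least it works! Thanks!

Maybe TensorFlow is already using all of your cores? You could also try the distributed scheduler. See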