Why do I get dask warnings when running pandas operations?

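I have a notebook with both pandas and dask operations.

When I have not started a client, everything works as expected. But once I start a dask.distributed client, I get warnings in the cells that run pandas operations, e.g.

pd.read_parquet('my_file')

I get exactly as many Nanny warnings as I have workers.

Example warnings: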

distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.26s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.38s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.38s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.38s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.37s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Scheduler for 1.37s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.36s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.

I would like to know why this happens and how to make the warnings stop.

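This warning means that the Dask worker processes were unresponsive for a surprisingly long time. This is bad because the workers cannot provide data to other workers, communicate with the scheduler, and so on while they are unresponsive. It is not normal even while computations are running, because those computations run in a separate thread.

There are two main causes of this problem: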

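1. Your tasks run functions that do not release the GIL. This is rare these days (most of pandas releases the GIL), but it can happen. I believe read_parquet releases the GIL throughout.

2. If this happens only once, and only at startup, then it is a bug that has been fixed in distributed. You may want to upgrade.
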
You can also silence the warnings by increasing the maximum allowed tick time in your ~/.dask/config.yaml file:
tick-maximum-delay: 10 s
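
As a rough illustration of what this kind of warning measures, the sketch below uses only the standard library and is not Dask's actual implementation. A small coroutine wakes up at a fixed interval and records every wake-up that arrives later than a threshold. A blocking call on the event loop thread then makes it late, much like a function that holds the GIL would. The names tick_monitor, interval, limit, and delays are invented for this example; limit plays the role of the maximum tick delay in the setting above.

```python
import asyncio
import time


async def tick_monitor(interval, limit, delays):
    """Wake up every `interval` seconds and record wake-ups that are late by more than `limit`."""
    last = time.monotonic()
    while True:
        await asyncio.sleep(interval)
        now = time.monotonic()
        delay = now - last - interval  # how much later than planned we woke up
        if delay > limit:
            delays.append(delay)
        last = now


async def main():
    delays = []
    monitor = asyncio.create_task(tick_monitor(0.02, 0.1, delays))
    await asyncio.sleep(0.05)  # the loop is responsive here, so nothing is recorded
    time.sleep(0.3)            # blocks the event loop thread; the monitor cannot run
    await asyncio.sleep(0.05)  # give the monitor a chance to wake up and notice
    monitor.cancel()
    return delays


delays = asyncio.run(main())
for d in delays:
    print(f"Event loop was unresponsive for {d:.2f}s")
```

Raising limit above the length of the blocking call makes the message disappear without making the loop any more responsive, which is also what increasing the tick delay setting does: it hides the symptom rather than removing the cause.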

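Regarding point 2: I do get "'dask.distributed' has no attribute '__version__'". Which version of dask has this bug?

import distributed; print(distributed.__version__)

I see, distributed is also available as a standalone package. I was confused because there is also dask.distributed.

Do you know a good way to find out which function is not releasing the GIL? If this were synchronous code, raising an exception instead of emitting a log message would do, but since it is asynchronous I cannot tell what is taking up the time.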