Can dask parallelization be encapsulated in a class?

Tags: dask, dask-distributed

Is it possible to encapsulate dask parallelization in a class? In its final form, my class will do a lot of initialization before run is called; I have stripped my problem down to a skeleton here. Note that this code works with a LocalCluster, and that a distributed calculation outside of a class also works on the same HPC cluster. Below is the abbreviated code along with the corresponding error messages:

import numpy as np
from dask_jobqueue import PBSCluster
from dask.distributed import Client
from dask.distributed import wait

class Simulate:
    def __init__(self):
        pass

    def run(self):
        cluster = PBSCluster(cores=12, memory='1GB', queue='low', project='classtest', name='classtest_dask',
                             walltime='02:00:00', local_directory='/scratch/mmatthews')
        cluster.scale(10)  # Ask for # workers
        client = Client(cluster)

        seeds = list(np.arange(100))
        a = client.map(self.run_trial, seeds)
        wait(a)

        trial_results = [a[i].result() for i in range(len(a))]

        cluster.scale(0)
        cluster.close()

    def run_trial(self, trial_seed):
        np.random.seed(trial_seed)
        rst = np.random.randint(100)  # the original was missing the call parentheses
        print('Simulation Finished rst=%s' % rst)
        return rst

simob = Simulate()
simob.run()
Errors sent to stderr:

distributed.client - ERROR - Failed to reconnect to scheduler after 10.00 seconds, closing client
distributed.utils - ERROR -
Traceback (most recent call last):
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/site-packages/distributed/utils.py", line 666, in log_errors
    yield
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/site-packages/distributed/client.py", line 1268, in _close
    await gen.with_timeout(timedelta(seconds=2), list(coroutines))
concurrent.futures._base.CancelledError
distributed.utils - ERROR -
Traceback (most recent call last):
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/site-packages/distributed/utils.py", line 666, in log_errors
    yield
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/site-packages/distributed/client.py", line 998, in _reconnect
    await self._close()
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/site-packages/distributed/client.py", line 1268, in _close
    await gen.with_timeout(timedelta(seconds=2), list(coroutines))
concurrent.futures._base.CancelledError
Errors in the PBS error file:

$ cat classtest_dask.e156272
distributed.nanny - INFO -         Start Nanny at: 'tcp://160.84.192.224:40753'
distributed.diskutils - INFO - Found stale lock file and directory '/scratch/mmatthews/worker-bnjpcqmq', purging
distributed.worker - INFO -       Start worker at: tcp://160.84.192.224:44564
distributed.worker - INFO -          Listening to: tcp://160.84.192.224:44564
distributed.worker - INFO -          dashboard at:       160.84.192.224:35232
distributed.worker - INFO - Waiting to connect to: tcp://160.84.192.193:39664
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                         12
distributed.worker - INFO -                Memory:                 1000.00 MB
distributed.worker - INFO -       Local Directory: /scratch/mmatthews/worker-kbw6dtj_
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -         Registered to: tcp://160.84.192.193:39664
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection
distributed.dask_worker - INFO - Exiting on signal 15
distributed.nanny - INFO - Closing Nanny at 'tcp://160.84.192.224:40753'
distributed.dask_worker - INFO - End worker
Traceback (most recent call last):
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/site-packages/distributed/cli/dask_worker.py", line 410, in <module>
    go()
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/site-packages/distributed/cli/dask_worker.py", line 406, in go
    main()
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/nfs/system/miniconda3_dev/envs/rosetta_dev/lib/python3.7/site-packages/distributed/cli/dask_worker.py", line 397, in main
    raise TimeoutError("Timed out starting worker.") from None
tornado.util.TimeoutError: Timed out starting worker.
Can dask parallelization be encapsulated in a class?

Yes. Dask calls are just normal Python calls; nothing stops them from interacting with the rest of the language.
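To illustrate, here is a minimal sketch of the same class pattern using a LocalCluster (which the question reports working), so it runs without a PBS system; the trial logic is a stand-in, not the questioner's real simulation:

```python
import numpy as np
from dask.distributed import Client, LocalCluster, wait

class Simulate:
    def run(self, n_trials=4):
        # Create the cluster and client inside the method rather than on
        # self, so the instance itself stays cheap to pickle for workers.
        with LocalCluster(n_workers=2, threads_per_worker=1,
                          processes=False, dashboard_address=None) as cluster, \
                Client(cluster) as client:
            seeds = [int(s) for s in np.arange(n_trials)]  # plain ints pickle cleanly
            futures = client.map(self.run_trial, seeds)
            wait(futures)
            return client.gather(futures)

    def run_trial(self, trial_seed):
        # Per-trial generator keeps trials independent and reproducible
        rng = np.random.default_rng(trial_seed)
        return int(rng.integers(0, 100))

results = Simulate().run()
print(results)
```

Note that `client.map(self.run_trial, ...)` serializes `self` along with the task, which is fine as long as the instance holds no open handles.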

Your actual error seems entirely unrelated, though. It looks like something killed your worker:

distributed.dask_worker - INFO - Exiting on signal 15


Unfortunately there is no information here about what that was (signal 15 is SIGTERM, an external termination request). I recommend talking to your system administrators.

Follow-up: what is the best way to close the workers in an object context? I cannot keep the cluster as a member variable for later cleanup, because doing so causes "TypeError: can't pickle coroutine objects".
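One possible workaround (an assumption on my part, not from this thread): keep the cluster and client on the instance for later cleanup, but strip them during pickling with `__getstate__`, so that tasks shipping `self` to workers never try to serialize the connection objects. A generator stands in below for the unpicklable handle:

```python
import pickle

class Simulate:
    def __init__(self):
        self.cluster = None  # would hold e.g. a PBSCluster for later cleanup
        self.client = None

    def __getstate__(self):
        # Drop the unpicklable handles before serialization; workers only
        # need the simulation state, not the scheduler connection.
        state = self.__dict__.copy()
        state['cluster'] = None
        state['client'] = None
        return state

sim = Simulate()
sim.cluster = (i for i in range(3))  # generators, like coroutines, cannot be pickled
restored = pickle.loads(pickle.dumps(sim))  # succeeds: handle was stripped
print(restored.cluster)  # -> None
```

Without the `__getstate__` override, the `pickle.dumps` call above would raise the same kind of TypeError you are seeing.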