Python 3.x: Dask client can't connect to dask-scheduler

I'm using dask 1.1.1 (the latest version at the time of writing), and I started a dask scheduler from the command line with this command:

$ dask-scheduler --port 9796 --bokeh-port 9797 --bokeh-prefix my_project
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:     tcp://10.1.0.107:9796
distributed.scheduler - INFO -       bokeh at:                     :9797
distributed.scheduler - INFO - Local Directory:    /tmp/scheduler-pdnwslep
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Register tcp://10.1.25.4:36310
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.1.25.4:36310
distributed.core - INFO - Starting established connection
Then I tried to start a client and connect it to the scheduler with the following code:

from dask.distributed import Client
c = Client('10.1.0.107:9796', set_as_default=False)
But when I do so, I get an error:

...
 File "/root/anaconda3/lib/python3.7/site-packages/tornado/concurrent.py", line 238, in result
  raise_exc_info(self._exc_info)
 File "<string>", line 4, in raise_exc_info
 tornado.gen.TimeoutError: Timeout
During handling of the above exception, another exception occurred:
...
 File "/root/anaconda3/lib/python3.7/site-packages/distributed/comm/core.py", line 195, in _raise
raise IOError(msg)
OSError: Timed out trying to connect to 'tcp://10.1.0.107:9796' after 10 s: connect() didn't finish in time
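This is hard-coded in a system that has been working for months. So I'm only writing this question to confirm that I'm not doing anything wrong programmatically, right? I think something must be wrong with the environment. Does everything look right to you? What kinds of things, outside of dask and Python, could be preventing this? Certificates? Differing versions of packages? Thoughts?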
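
One quick way to separate an environment problem from a dask problem is to test the raw TCP connection from the client machine. The following is a minimal sketch using only the Python standard library; `can_reach` is a hypothetical helper name, not part of dask:

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts
        return False
```

If `can_reach('10.1.0.107', 9796)` returns False when run from the client machine, the cause is likely outside dask (for example a firewall, routing, or the scheduler listening on a different interface) rather than in the client code.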

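(See the related comments.) The dask wrapper mainly bakes in our specific configuration and makes dask easy to use with the docker containers in our system: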

''' daskwrapper: easy access to distributed computing '''
import webbrowser
from dask.distributed import Client as DaskClient
from . import config

scheduler_config = { # from yaml
    "scheduler_hostname": "schedulermachine.corpdomain.com",
    "scheduler_ip": "10.0.0.1"}
worker_config = { # from yaml
    "environments": {
        "generic": {
            "scheduler_port": 9796,
            "dashboard_port": 9797,
            "worker_port": 67176}}}

class Client():

    def __init__(self, environment: str):
        (
            self.scheduler_hostname,
            self.scheduler_port,
            self.dashboard_port,
            self.scheduler_address) = self.get_scheduler_details(environment)
        self.client = DaskClient(self.scheduler_address, asynchronous=False)

    def get_scheduler_details(self, environment: str) -> tuple:
        ''' gets it from a map of available docker images... '''
        envs = worker_config['environments']
        return (
            scheduler_config['scheduler_hostname'],
            envs[environment]['scheduler_port'],
            envs[environment]['dashboard_port'],
            (
                f"{scheduler_config['scheduler_hostname']}:"
                f"{str(envs[environment]['scheduler_port'])}"))

    def open_status(self):
        webbrowser.open_new_tab(self.get_status())

    def get_status(self):
        return f'http://{self.scheduler_hostname}:{self.dashboard_port}/status'

    def get_async_client(self):
        ''' returns a client instance so the user can use it directly '''
        return DaskClient(self.scheduler_address, asynchronous=True)

    def get(self, workflow: dict, tasks: 'str|list'):
        return self.client.get(workflow, tasks)

    async def submit(self, function: callable, args: list):
        ''' saved as example dask api '''
        if not isinstance(args, (list, tuple)):
            args = [args]
        async with DaskClient(self.scheduler_address, asynchronous=True) as client:
            future = client.submit(function, *args)
            result = await future
        return result

    def close(self):
        return self.client.close()
That's the client, and it is used like this:

from daskwrapper import Client
dag = {'some_task': (some_task_function, )}
workers = Client(environment='some_environment')
workers.get(workflow=dag, tasks='some_task')
workers.close()
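
One thing worth checking in a configuration like the one above: TCP port numbers must fit in the range 1 to 65535, and the `worker_port` value shown (67176) does not. A small sketch of a sanity check; `invalid_ports` is a hypothetical helper, and it assumes every key ending in `_port` holds a port number:

```python
def invalid_ports(environments: dict) -> list:
    """Return (environment, key, value) for every *_port entry outside 1-65535."""
    problems = []
    for env_name, settings in environments.items():
        for key, value in settings.items():
            if key.endswith("_port") and not (1 <= int(value) <= 65535):
                problems.append((env_name, key, value))
    return problems


# Values copied from the worker_config shown above
environments = {
    "generic": {
        "scheduler_port": 9796,
        "dashboard_port": 9797,
        "worker_port": 67176}}

print(invalid_ports(environments))  # [('generic', 'worker_port', 67176)]
```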
The scheduler is started like this:

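import asyncio
from threading import Thread

from distributed import Scheduler

# `configs` is the project's own configuration helper (import not shown)

def start():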
    def start_scheduler(port, dashboard_port):
        async def f():
            s = Scheduler(
                port=port,
                dashboard_address=f"0.0.0.0:{dashboard_port}")
            s = await s
            await s.finished()

        asyncio.get_event_loop().run_until_complete(f())

    worker_config = configs.get(repo='spartan_worker')
    envs = worker_config['environments']
    for key, value in envs.items():
        port = value['scheduler_port']
        dashboard_port = str(value['dashboard_port'])
        thread = Thread(
            target=start_scheduler,
            args=(port, dashboard_port))
        thread.start()
And the workers:

import asyncio

from distributed import Worker

def start(
    scheduler_address: str,
    scheduler_port: int,
    worker_address: str,
    worker_port: int
):
    async def f(scheduler_address):
        w = await Worker(
            scheduler_address,
            port=worker_port,
            contact_address=f'{worker_address}:{worker_port}')
        await w.finished()

    asyncio.get_event_loop().run_until_complete(f(
        f'tcp://{scheduler_address}:{str(scheduler_port)}'))

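This may not directly help you with this problem, but I do believe that since we dockerized everything, we no longer have the issue. A lot is missing here, but these are the basics. There are probably better ways to make specialized environments easy to use on distributed computing, but this fits our needs.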

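Did you ever solve this? I'm facing the same issue, and nothing significant has changed that would explain the error.

@ZevAverbach Oh man, that was a long time ago. I'm sure I found a solution, but I can't remember what it was. It may have been around the time we rewrote the whole system: at one point we dockerized the workers and the scheduler. Maybe it was an SSL issue; we run into a lot of those when the environment isn't set up correctly inside the corporate network. You know what, I'll add an answer with the little wrapper we made for dask. It may not help much, but it might.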