Python: Transitioning from local Dask to a cluster…

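I have a simple, embarrassingly parallel program that I am running successfully on Dask locally. Yay! Now I want to move it to a cluster and scale up the problem size. In this case I am using GCP. I have tried two approaches, GCPCluster() and HelmCluster(), and each one fails in a different way. (I have successfully instantiated GCE compute instances before, so we can assume I have sorted out all security/login credentials. Networking may be another matter.) Here is the main routine: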

from dask import delayed
from dask.distributed import Client, wait, as_completed, LocalCluster
from dask_kubernetes import HelmCluster
from dask_cloudprovider.gcp import GCPCluster
from problem.loop import inner_loop
from problem.problemSpec import problemInit

# gRange = 99
gRange = 12


def phase_transition(client: Client):
    p = problemInit()
    m = p.m
    loop = delayed(inner_loop)

    loops = [loop(int(m[i])) for i in range(gRange)]
    # visualize(loops, filename='delayed_results', format='svg')
    futures = client.compute(loops)
    wait(futures)
    for future, result in as_completed(futures, with_results=True):
        print(result)


if __name__ == "__main__":
    # with LocalCluster(dashboard_address='localhost:8787') as cluster:
    with GCPCluster(projectid='random-words-654321', machine_type='n1-standard-4', n_workers=2) as cluster:
        with Client(cluster) as client:
            phase_transition(client)

With GCPCluster(), the program hangs waiting for the scheduler to respond. Here are the log messages:

Launching cluster with the following configuration: 
  Source Image: projects/ubuntu-os-cloud/global/images/ubuntu-minimal-1804-bionic-v20201014 
  Docker Image: daskdev/dask:latest 
  Machine Type: n1-standard-4 
  Filesytsem Size: 50 
  Disk Type: pd-standard 
  N-GPU Type:  
  Zone: us-east1-c 
Creating scheduler instance
dask-837e1ad1-scheduler
    Internal IP: 10.142.0.4
    External IP: 35.237.42.13
Waiting for scheduler to run at 35.237.42.13:8786
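
The scheduler instance is up, and I can SSH into it. This looks like a networking issue. (By the way, I am running this from PyCharm with a Conda environment similar to the one in the daskdev/dask:latest image.) Obviously, we never even get as far as installing my local code on the cloud.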
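
One way to test the networking suspicion is a plain TCP probe against the address the log is waiting on. This is only a minimal sketch using the standard library; the helper name port_is_open is hypothetical and not part of Dask or dask-cloudprovider.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open("35.237.42.13", 8786) from the client machine would show whether the scheduler port from the log is reachable at all. If it returns False while SSH works, a likely cause (an assumption, not verified here) is that no firewall rule in the GCP project allows ingress on port 8786.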

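This is a problem that some Dask and GCP experience would solve, and I don't have that experience yet. So let me try a different path through the documentation and start a k8s cluster managed by Helm. The only change to my code is: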

if __name__ == "__main__":
    cluster = HelmCluster(release_name='gke-dask')
    with Client(cluster) as client:
        phase_transition(client)
This gets much further than before. It now has trouble finding the code that lives in a subdirectory on my local machine, problem. Here is the log:

Forwarding from 127.0.0.1:65410 -> 8786
Forwarding from [::1]:65410 -> 8786
Handling connection for 65410
Handling connection for 65410
/Users/awd/opt/anaconda3/envs/dask-cvxpy/lib/python3.8/site-packages/distributed/client.py:1140: VersionMismatchWarning: Mismatched versions found
+---------+---------------+---------------+---------------+
| Package | client        | scheduler     | workers       |
+---------+---------------+---------------+---------------+
| blosc   | None          | 1.9.2         | 1.9.2         |
| lz4     | 3.1.3         | 3.1.1         | 3.1.1         |
| msgpack | 1.0.2         | 1.0.0         | 1.0.0         |
| numpy   | 1.20.2        | 1.18.1        | 1.18.1        |
| python  | 3.8.8.final.0 | 3.8.0.final.0 | 3.8.0.final.0 |
+---------+---------------+---------------+---------------+
Notes: 
-  msgpack: Variation is ok, as long as everything is above 0.6
  warnings.warn(version_module.VersionMismatchWarning(msg[0]["warning"]))
Handling connection for 65410
Handling connection for 65410
Handling connection for 65410
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/awd/Projects/Stats285/ExamplePhaseTransition/main_func.py", line 39, in <module>
    phase_transition(client)
  File "/Users/awd/Projects/Stats285/ExamplePhaseTransition/main_func.py", line 28, in phase_transition
    for future, result in as_completed(futures, with_results=True):
  File "/Users/awd/opt/anaconda3/envs/dask-cvxpy/lib/python3.8/site-packages/distributed/client.py", line 4336, in __next__
    return self._get_and_raise()
  File "/Users/awd/opt/anaconda3/envs/dask-cvxpy/lib/python3.8/site-packages/distributed/client.py", line 4327, in _get_and_raise
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 75, in loads
ModuleNotFoundError: No module named 'problem'
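
The traceback ends inside distributed/protocol/pickle.py under /opt/conda, which suggests the error is raised on the cluster side while unpickling a task that references the local problem package. One possible remedy is to ship that package to the workers with Client.upload_file, which accepts .py, .egg, and .zip files. The following is a minimal sketch under that assumption; zip_package and ship_package are hypothetical helper names, and the default path "problem" assumes the script runs from the project root.

```python
import zipfile
from pathlib import Path


def zip_package(package_dir: str, zip_path: str) -> str:
    """Zip the .py files of a local package, keeping its top-level folder name."""
    pkg = Path(package_dir).resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(pkg.rglob("*.py")):
            # Store entries as e.g. problem/loop.py so `import problem.loop` resolves
            zf.write(path, path.relative_to(pkg.parent).as_posix())
    return zip_path


def ship_package(client, package_dir: str = "problem", zip_path: str = "problem.zip") -> None:
    """Upload the zipped package to the workers connected to `client`."""
    client.upload_file(zip_package(package_dir, zip_path))
```

In the main block this would be called as ship_package(client) right before phase_transition(client). It does not address the version mismatches reported in the warning, and heavier dependencies would more likely call for a custom worker image.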

In practice, I am looking for help with either problem. I have a slight preference for the GCPCluster() solution.