Python: How do I parallelize with Jupyter and sklearn?
I am trying to parallelize scikit-learn's GridSearchCV. It runs in a Jupyter(Hub) notebook. After some research, I found the following code:
from sklearn.externals.joblib import Parallel, parallel_backend, register_parallel_backend
from ipyparallel import Client
from ipyparallel.joblib import IPythonParallelBackend
c = Client(profile='myprofile')
print(c.ids)
bview = c.load_balanced_view()
register_parallel_backend('ipyparallel', lambda : IPythonParallelBackend(view=bview))
grid = GridSearchCV(pipeline, cv=3, n_jobs=4, param_grid=param_grid)
with parallel_backend('ipyparallel'):
    grid.fit(X_train, Y_train)
Note that I have set the n_jobs parameter to 4, which is the number of CPU cores on the machine (this is what nproc returns).
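For reference, the core count that nproc reports can also be read from inside the notebook. The following is a minimal sketch using only the standard library; the helper name `usable_cpu_count` is made up for illustration. Note that neither call is guaranteed to reflect container CPU quotas, which may matter on a shared JupyterHub.

```python
import os


def usable_cpu_count():
    """Return the number of CPUs this process may use (always at least 1)."""
    if hasattr(os, "sched_getaffinity"):
        # Linux only: honours the CPU affinity mask of the current process
        return len(os.sched_getaffinity(0))
    # Portable fallback; os.cpu_count() may return None
    return os.cpu_count() or 1


print(usable_cpu_count())
```

The returned value could then be passed as n_jobs instead of hard-coding 4.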
But it does not seem to work: ImportError: cannot import name 'register_parallel_backend', even though I installed joblib with conda install joblib and also tried pip install -U joblib.
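One thing worth checking here: the import in the snippet targets sklearn.externals.joblib, the copy of joblib bundled with scikit-learn, so upgrading the standalone joblib package would not necessarily change what that import sees. A small diagnostic sketch (the helper `modules_exposing` is hypothetical, not part of any library) shows which of the candidate modules actually provides the name:

```python
import importlib


def modules_exposing(name, candidates):
    """Return the candidate module names that import cleanly and define `name`."""
    found = []
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module is not installed (or no longer exists); skip it
            continue
        if hasattr(module, name):
            found.append(module_name)
    return found


print(modules_exposing("register_parallel_backend",
                       ["joblib", "sklearn.externals.joblib"]))
```

If only "joblib" appears in the result, importing from the standalone package instead of the bundled copy is the likely fix.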
So, what is the best way to parallelize GridSearchCV in this environment?
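A possible variant of the snippet above, offered as a sketch rather than a confirmed fix: it imports from the standalone joblib package and assumes a scikit-learn release that delegates to that package instead of its bundled copy, plus a running ipyparallel cluster for the given profile. The wrapper name `fit_on_ipyparallel` is made up for illustration.

```python
def fit_on_ipyparallel(estimator, X, y, profile="myprofile"):
    """Fit `estimator` while joblib dispatches its tasks to ipyparallel engines."""
    # Imported locally so that defining this function does not require
    # joblib or ipyparallel to be installed
    from joblib import parallel_backend, register_parallel_backend
    from ipyparallel import Client
    from ipyparallel.joblib import IPythonParallelBackend

    # Assumes a cluster is already running for this profile
    client = Client(profile=profile)
    view = client.load_balanced_view()

    # Register the backend under a name, then activate it for the fit call
    register_parallel_backend(
        "ipyparallel", lambda: IPythonParallelBackend(view=view)
    )
    with parallel_backend("ipyparallel"):
        estimator.fit(X, y)
    return estimator
```

With the names from the question, it would be called as fit_on_ipyparallel(grid, X_train, Y_train). The engines also need scikit-learn and the pipeline's dependencies importable on their side.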
Update:
Without ipyparallel, just setting the n_jobs parameter:
grid = GridSearchCV(pipeline, cv=3, n_jobs=4, param_grid=param_grid)
grid.fit(X_train, Y_train)
results in the following warning message:
/opt/conda/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py:540: UserWarning:
Multiprocessing-backed parallel loops cannot be nested, setting n_jobs=1
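One plausible cause of this warning, stated here as an assumption about older joblib releases rather than a confirmed diagnosis: the multiprocessing backend falls back to n_jobs=1 when the calling process is itself a daemonic multiprocessing worker, because such processes cannot start children. A quick check from the notebook (the helper name is hypothetical):

```python
import multiprocessing


def in_daemonic_process():
    """Return True if the current process is a daemonic multiprocessing worker."""
    return multiprocessing.current_process().daemon


print(in_daemonic_process())
```

If this prints True in the cell that calls grid.fit, the fit is already running inside a worker process, which would explain the fallback.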
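It seems to end up executing sequentially rather than in parallel.
I think n_jobs=-1 will use all CPU cores in parallel.
@AlexanderYau: Just setting the parameter throws an error message; I have updated the post.
How many CPU cores does your machine have?
@AlexanderYau: It has exactly 4 CPU cores.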