Scikit learn Dask-ML with an sklearn random forest causes the connection to close

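I'm trying to train a model with Dask-ML. My end goal is to predict on a larger-than-memory dataset, so I'm using Dask's ParallelPostFit wrapper to train the model on a relatively small dataset (4 GB), expecting to predict on a larger dataframe later. I'm connecting to a YARN cluster with 50 workers, loading my data from Parquet into a Dask dataframe, creating a pipeline, and training. Training works, but I run into problems when I try to evaluate on a held-out test set. When I use sklearn's LogisticRegression as the classifier, both training and prediction run successfully. However, when I use an sklearn RandomForest with 100 estimators, the training step runs successfully, but on prediction I get the error below. I noticed that during the prediction compute step, before the disconnect error, memory usage on my local machine starts to spike. When I reduce the number of RF estimators to 10, the prediction step runs successfully. Can anyone help me understand what is going on?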
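A rough way to see why the number of trees matters: with scikit-learn's defaults (`max_depth=None`, `min_samples_leaf=1`), each tree in a `RandomForestClassifier` keeps splitting until its leaves are pure (or too small to split), so a single tree's node count can grow with the number of training rows, and the forest as a whole grows linearly with `n_estimators`. The sketch below computes only a loose worst-case upper bound. `BYTES_PER_NODE` and the 10-million-row figure are illustrative assumptions, not scikit-learn's actual memory layout or the real row count of the 4 GB dataset.

```python
# Worst-case size estimate for a random forest of fully grown binary trees.
# BYTES_PER_NODE is an ASSUMPTION for illustration only; scikit-learn's real
# per-node footprint depends on its version and on the number of classes.
BYTES_PER_NODE = 100


def max_nodes_per_tree(n_samples: int, min_samples_leaf: int = 1) -> int:
    """Upper bound on the number of nodes in one binary decision tree.

    Every leaf holds at least `min_samples_leaf` samples, so there are at most
    n_samples // min_samples_leaf leaves, and a binary tree in which every
    internal node has two children has 2 * L - 1 nodes for L leaves.
    """
    leaves = max(1, n_samples // min_samples_leaf)
    return 2 * leaves - 1


def forest_upper_bound_bytes(n_estimators: int, n_samples: int,
                             min_samples_leaf: int = 1,
                             bytes_per_node: int = BYTES_PER_NODE) -> int:
    """Upper bound on forest size in bytes: linear in the number of trees."""
    nodes = max_nodes_per_tree(n_samples, min_samples_leaf)
    return n_estimators * nodes * bytes_per_node


if __name__ == "__main__":
    n_rows = 10_000_000  # HYPOTHETICAL row count, for illustration only
    for n_trees in (10, 100):
        gb = forest_upper_bound_bytes(n_trees, n_rows) / 1e9
        print(f"{n_trees:>3} trees: at most ~{gb:.0f} GB (loose bound)")
```

Real trees are usually much smaller than this bound, since bootstrapping and duplicate feature values both limit growth, but the bound scales exactly 10x from 10 to 100 trees and real forests scale in roughly the same way.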

My code (condensed)

Output:

computing ypred
distributed.batched - INFO - Batched Comm Closed: in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer
---------------------------------------------------------------------------
CancelledError                            Traceback (most recent call last)
<ipython-input-108-23f303f7584c> in <module>
      5 
      6 print('computing ypred')
----> 7 y_preds = [pipe.predict(X_test).compute() for pipe in pipes]
      8 
      9 print('computing yprob')

<ipython-input-108-23f303f7584c> in <listcomp>(.0)
      5 
      6 print('computing ypred')
----> 7 y_preds = [pipe.predict(X_test).compute() for pipe in pipes]
      8 
      9 print('computing yprob')

~/.conda/envs/boa/lib/python3.7/site-packages/dask/base.py in compute(self, **kwargs)
    164         dask.base.compute
    165         """
--> 166         (result,) = compute(self, traverse=False, **kwargs)
    167         return result
    168 

~/.conda/envs/boa/lib/python3.7/site-packages/dask/base.py in compute(*args, **kwargs)
    435     keys = [x.__dask_keys__() for x in collections]
    436     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 437     results = schedule(dsk, keys, **kwargs)
    438     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    439 

~/.conda/envs/boa/lib/python3.7/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   2593                     should_rejoin = False
   2594             try:
-> 2595                 results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   2596             finally:
   2597                 for f in futures.values():

~/.conda/envs/boa/lib/python3.7/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
   1891                 direct=direct,
   1892                 local_worker=local_worker,
-> 1893                 asynchronous=asynchronous,
   1894             )
   1895 

~/.conda/envs/boa/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    778         else:
    779             return sync(
--> 780                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    781             )
    782 

~/.conda/envs/boa/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    346     if error[0]:
    347         typ, exc, tb = error[0]
--> 348         raise exc.with_traceback(tb)
    349     else:
    350         return result[0]

~/.conda/envs/boa/lib/python3.7/site-packages/distributed/utils.py in f()
    330             if callback_timeout is not None:
    331                 future = asyncio.wait_for(future, callback_timeout)
--> 332             result[0] = yield future
    333         except Exception as exc:
    334             error[0] = sys.exc_info()

~/.conda/envs/boa/lib/python3.7/site-packages/tornado/gen.py in run(self)
    733 
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

CancelledError: 
distributed.client - ERROR - Failed to reconnect to scheduler after 10.00 seconds, closing client
_GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=CancelledError()>
concurrent.futures._base.CancelledError

Thanks for your question. I suggest creating a
@MRocklin Thanks. I will try to reproduce this in a more limited environment. In the meantime, is there anything to learn from the fact that everything runs successfully when I use fewer RF estimators (10 vs. 100)?
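One way to check what the 10-vs-100 observation says about model size is to measure the pickled size of the fitted forest, since a serialized copy of the model is what has to travel with the prediction tasks. The sketch below uses synthetic data from `make_classification` as a hypothetical stand-in for the real training set; the absolute sizes will differ, but the size grows roughly in proportion to `n_estimators`, and capping tree growth with `min_samples_leaf` (or `max_depth`) shrinks every tree.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real (much larger) training set.
X, y = make_classification(n_samples=3_000, n_features=20, flip_y=0.1, random_state=0)


def pickled_size(n_estimators: int, **tree_params) -> int:
    """Fit a forest and return the size in bytes of its pickled form."""
    model = RandomForestClassifier(
        n_estimators=n_estimators, random_state=0, **tree_params
    ).fit(X, y)
    return len(pickle.dumps(model))


size_10 = pickled_size(10)
size_100 = pickled_size(100)
size_100_capped = pickled_size(100, min_samples_leaf=50)

print(f"10 trees:                        {size_10 / 1e6:.2f} MB")
print(f"100 trees:                       {size_100 / 1e6:.2f} MB")
print(f"100 trees, min_samples_leaf=50:  {size_100_capped / 1e6:.2f} MB")
```

With `min_samples_leaf=50` each tree can have at most 2 * (3000 // 50) - 1 = 119 nodes, so the capped forest stays small regardless of how noisy the labels are.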