Python "The replica master 0 ran out of disk" during sklearn fit using Google Cloud ML Engine

Tags: python, scikit-learn, google-cloud-ml

I am trying to run an sklearn LDA gridsearch on 500 MB of data (10,000 rows x 26,000 columns) using Google Cloud ML Engine, to find a suitable number of topics for my topic-modeling project.

The maximum number of iterations per CV fold is set to 100. After 47 iterations, the job fails with the error below. I have tried this with the BASIC and STANDARD tiers, as well as with a CUSTOM tier using masterType=complex_model_m, and the same error occurs every time.

I couldn't find much about this problem on Stack Overflow, although I did come across one question that seems somewhat related. Its original asker provided a solution:

Solved : This error was coming not because of Storage Space instead coming because of shared memory tmfs. The sklearn fit was consuming all the shared memory while training. Solution : setting JOBLIB_TEMP_FOLDER environment variable , to /tmp solved the problem.
I'm afraid I'm not entirely sure how to interpret or implement this solution.
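If I understand the quoted fix correctly, it means redirecting joblib's scratch directory away from shared memory before fitting. JOBLIB_TEMP_FOLDER is a real joblib environment variable; the rest of this is just my sketch of how it would slot into my script:

```python
import os

# Point joblib's temporary/memmapping folder at /tmp instead of the small
# shared-memory filesystem the workers use by default. This must run before
# GridSearchCV.fit() spawns any workers.
os.environ["JOBLIB_TEMP_FOLDER"] = "/tmp"

# ...then build and fit exactly as before:
# lda = LatentDirichletAllocation(learning_method='batch', max_iter=100,
#                                 n_jobs=-1, verbose=1)
# gscv = GridSearchCV(lda, tuned_parameters, cv=3, verbose=10, n_jobs=1)
# gscv.fit(data)
```

Whether simply setting this at the top of the trainer module is enough, I don't know.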

Here are the three lines of code related to the source of the problem:

lda = LatentDirichletAllocation(learning_method='batch', max_iter=100, n_jobs=-1, verbose=1)
gscv = GridSearchCV(lda, tuned_parameters, cv=3, verbose=10, n_jobs=1)
gscv.fit(data)
I submit the job like this:

gcloud ai-platform jobs submit training $JOB_NAME \
        --package-path $TRAINER_PACKAGE_PATH \
        --module-name $MAIN_TRAINER_MODULE \
        --job-dir $JOB_DIR \
        --region $REGION \
        --config config.yaml
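For reference, here is roughly what the config.yaml for the CUSTOM-tier attempt would contain (a minimal sketch assuming the standard AI Platform trainingInput schema; the actual file is not shown above):

```yaml
# config.yaml — minimal sketch for the CUSTOM-tier run
trainingInput:
  scaleTier: CUSTOM
  masterType: complex_model_m
```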
And here is the monstrous error message from the logs:

sklearn.externals.joblib.externals.loky.process_executor._RemoteTraceback: """ Traceback (most recent call last): File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/externals/loky/backend/queues.py", line 150, in _feed obj_ = dumps(obj, reducers=reducers) File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/externals/loky/backend/reduction.py", line 243, in dumps dump(obj, buf, reducers=reducers, protocol=protocol) File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/externals/loky/backend/reduction.py", line 236, in dump _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj) File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py", line 284, in dump return Pickler.dump(self, obj) File "/usr/lib/python3.5/pickle.py", line 408, in dump self.save(obj) File "/usr/lib/python3.5/pickle.py", line 520, in save self.save_reduce(obj=obj, *rv) File "/usr/lib/python3.5/pickle.py", line 623, in save_reduce save(state) File "/usr/lib/python3.5/pickle.py", line 475, in save f(self, obj) # Call unbound method with explicit self File "/usr/lib/python3.5/pickle.py", line 810, in save_dict self._batch_setitems(obj.items()) File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems save(v) File "/usr/lib/python3.5/pickle.py", line 520, in save self.save_reduce(obj=obj, *rv) File "/usr/lib/python3.5/pickle.py", line 623, in save_reduce save(state) File "/usr/lib/python3.5/pickle.py", line 475, in save f(self, obj) # Call unbound method with explicit self File "/usr/lib/python3.5/pickle.py", line 810, in save_dict self._batch_setitems(obj.items()) File "/usr/lib/python3.5/pickle.py", line 841, in _batch_setitems save(v) File "/usr/lib/python3.5/pickle.py", line 520, in save self.save_reduce(obj=obj, *rv) File "/usr/lib/python3.5/pickle.py", line 623, in save_reduce save(state) File "/usr/lib/python3.5/pickle.py", line 475, in save f(self, obj) # Call unbound method 
with explicit self File "/usr/lib/python3.5/pickle.py", line 810, in save_dict self._batch_setitems(obj.items()) File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems save(v) File "/usr/lib/python3.5/pickle.py", line 475, in save f(self, obj) # Call unbound method with explicit self File "/usr/lib/python3.5/pickle.py", line 770, in save_list self._batch_appends(obj) File "/usr/lib/python3.5/pickle.py", line 797, in _batch_appends save(tmp[0]) File "/usr/lib/python3.5/pickle.py", line 475, in save f(self, obj) # Call unbound method with explicit self File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple save(element) File "/usr/lib/python3.5/pickle.py", line 475, in save f(self, obj) # Call unbound method with explicit self File "/usr/lib/python3.5/pickle.py", line 740, in save_tuple save(element) File "/usr/lib/python3.5/pickle.py", line 481, in save rv = reduce(obj) File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/_memmapping_reducer.py", line 339, in __call__ for dumped_filename in dump(a, filename): File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/numpy_pickle.py", line 502, in dump NumpyPickler(f, protocol=protocol).dump(value) File "/usr/lib/python3.5/pickle.py", line 408, in dump self.save(obj) File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/numpy_pickle.py", line 289, in save wrapper.write_array(obj, self) File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/numpy_pickle.py", line 104, in write_array pickler.file_handle.write(chunk.tostring('C')) OSError: [Errno 28] No space left on device """ The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main "__main__", mod_spec) File "/usr/lib/python3.5/runpy.py", line 85, in _run_code exec(code, run_globals) File "/root/.local/lib/python3.5/site-packages/experiment_trainer/experiment.py", line 87, in 
<module> gscv.fit(data) File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_search.py", line 722, in fit self._run_search(evaluate_candidates) File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_search.py", line 1191, in _run_search evaluate_candidates(ParameterGrid(self.param_grid)) File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_search.py", line 711, in evaluate_candidates cv.split(X, y, groups))) File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 917, in __call__ if self.dispatch_one_batch(iterator): File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 759, in dispatch_one_batch self._dispatch(tasks) File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 716, in _dispatch job = self._backend.apply_async(batch, callback=cb) File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/_parallel_backends.py", line 182, in apply_async result = ImmediateResult(func) File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/_parallel_backends.py", line 549, in __init__ self.results = batch() File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 225, in __call__ for func, args, kwargs in self.items] File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 225, in <listcomp> for func, args, kwargs in self.items] File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_validation.py", line 526, in _fit_and_score estimator.fit(X_train, **fit_params) File "/usr/local/lib/python3.5/dist-packages/sklearn/decomposition/online_lda.py", line 570, in fit batch_update=True, parallel=parallel) File "/usr/local/lib/python3.5/dist-packages/sklearn/decomposition/online_lda.py", line 453, in _em_step parallel=parallel) File "/usr/local/lib/python3.5/dist-packages/sklearn/decomposition/online_lda.py", line 406, in _e_step 
for idx_slice in gen_even_slices(X.shape[0], n_jobs)) File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 930, in __call__ self.retrieve() File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 833, in retrieve self._output.extend(job.get(timeout=self.timeout)) File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/_parallel_backends.py", line 521, in wrap_future_result return future.result(timeout=timeout) File "/usr/lib/python3.5/concurrent/futures/_base.py", line 398, in result return self.__get_result() File "/usr/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result raise self._exception _pickle.PicklingError: Could not pickle the task to send it to the workers.