Python: Attempting to parallelize a parameter search in scikit-learn results in "SystemError: NULL result without error in PyObject_Call"

I am using the sklearn.grid_search.RandomizedSearchCV class from scikit-learn 0.14.1, and I get an error when running the following code:

# Imports assumed from the surrounding script (not shown in the question)
import scipy.stats
from sklearn import preprocessing, svm, grid_search
from sklearn.datasets import load_svmlight_file

X, y = load_svmlight_file(inputfile)

min_max_scaler = preprocessing.MinMaxScaler()
X_scaled = min_max_scaler.fit_transform(X.toarray())

# Discrete parameters must be given as a list; a bare string would be
# sampled character by character by RandomizedSearchCV.
parameters = {'kernel': ['rbf'],
              'C': scipy.stats.expon(scale=100),
              'gamma': scipy.stats.expon(scale=.1)}

svr = svm.SVC()

classifier = grid_search.RandomizedSearchCV(svr, parameters, n_jobs=8)
classifier.fit(X_scaled, y)
When I set the n_jobs parameter to anything greater than 1, I get the following error output:

Traceback (most recent call last):
  File "./svm_training.py", line 185, in <module>
    main(sys.argv[1:])
  File "./svm_training.py", line 63, in main
    gridsearch(inputfile, kerneltype, parameterfile)
  File "./svm_training.py", line 85, in gridsearch
    classifier.fit(X_scaled, y)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/grid_search.py", line 860, in fit
    return self._fit(X, y, sampled_params)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/grid_search.py", line 493, in _fit
    for parameters in parameter_iterable
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/externals/joblib/parallel.py", line 519, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/externals/joblib/parallel.py", line 419, in retrieve
    self._output.append(job.get())
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
SystemError: NULL result without error in PyObject_Call

This seems to be related to Python's multiprocessing functionality, but I'm not sure how to work around it short of implementing the parameter-search parallelization by hand. Has anyone run into a similar problem when trying to parallelize a randomized parameter search, and if so, how did you solve it?

It turns out the problem was the use of MinMaxScaler. Because MinMaxScaler only accepts dense arrays, I was converting the sparse representation of the feature vectors to a dense array before scaling. Since the feature vectors have thousands of elements, my assumption is that the dense arrays caused a memory error when the parameter search was parallelized across processes. Instead, I switched to StandardScaler, which accepts sparse arrays as input and should be a better fit for my problem space anyway.
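A minimal sketch of the workaround described above, using a small hypothetical sparse matrix in place of the load_svmlight_file() output. Note that StandardScaler must be given with_mean=False for sparse input, since centering would densify the matrix, which is exactly what we are trying to avoid:

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# Hypothetical small sparse feature matrix standing in for the
# sparse output of load_svmlight_file() in the question.
X = sparse.csr_matrix(np.array([[0.0, 1.0],
                                [2.0, 0.0],
                                [4.0, 3.0]]))

# with_mean=False skips centering, so the data is only scaled by the
# per-feature standard deviation and the matrix stays sparse.
scaler = StandardScaler(with_mean=False)
X_scaled = scaler.fit_transform(X)

print(sparse.issparse(X_scaled))  # the result is still sparse
```

Because X_scaled remains sparse, it can be passed straight to RandomizedSearchCV without the per-process memory blow-up that the dense MinMaxScaler conversion caused.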

Please post an example that can be used to reproduce the problem.