scikit-learn: MemoryError when predicting

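Below is a piece of code I wrote. It uses RFE with a LinearSVC estimator for feature selection, then fits a KNeighborsClassifier on the reduced data and predicts with it.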

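    from sklearn.svm import LinearSVC
    from sklearn.feature_selection import RFE
    from sklearn.neighbors import KNeighborsClassifier

    # X, trainLabels and test are loaded earlier in main.py (not shown)
    # Recursive feature elimination: keep 700 features, dropping 42 per step
    clf = LinearSVC(C=10, class_weight='auto')
    rfe = RFE(estimator=clf, n_features_to_select=700, step=42)
    rfe.fit(X, trainLabels)
    reduced_train_data = rfe.transform(X)
    print "reduced_train_data.shape ", reduced_train_data.shape
    reduced_test_data = rfe.transform(test)
    # Fit k-nearest neighbours on the reduced training data
    neigh = KNeighborsClassifier(n_neighbors=5, weights='distance', algorithm='ball_tree')
    print "knn initiated"
    neigh.fit(reduced_train_data, trainLabels)
    print "knn fitted"
    # The MemoryError is raised by this call
    test_predict = neigh.predict(reduced_test_data)
    print "knn predicted"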

The output is as follows:

    reduced_train_data.shape  (42000, 700)
    knn initiated
    knn fitted

Then I see the following error:

Traceback (most recent call last):
  File "E:\Coursera\KaggleDataProjects\DigitRecognition\main.py", line 74, in <module>
    test_predict = neigh.predict(reduced_test_data)
  File "C:\Python27\lib\site-packages\sklearn\neighbors\classification.py", line 146, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "C:\Python27\lib\site-packages\sklearn\neighbors\base.py", line 313, in kneighbors
    return_distance=return_distance)
  File "binary_tree.pxi", line 1295, in sklearn.neighbors.ball_tree.BinaryTree.query (sklearn\neighbors\ball_tree.c:9889)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 91, in array2d
    X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
  File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 320, in asarray
    return array(a, dtype, copy=False, order=order)
MemoryError
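
The error does not occur on every run: when I change the parameters slightly, it sometimes goes away. Can someone explain what needs to be done to resolve it?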

Initial dimensions of the training data (X) = (42000, 784)
Initial dimensions of the test data (test) = (28000, 784)
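
To put those shapes in perspective, a dense array with 8 bytes per value (an assumption; the question does not state the dtype) takes roughly 251 MiB for the training data and roughly 150 MiB for the reduced test data. The traceback ends inside `np.asarray`, which is called on the full test array in a single step. One possible workaround, sketched below, is to predict in row batches so that each call only handles a small slice. This is a sketch under the assumption that the peak allocation during `predict` is what fails, not a confirmed fix. The helper names `array_megabytes` and `predict_in_batches` and the batch size are hypothetical, and `dummy_predict` only stands in for `neigh.predict` so that the example runs without scikit-learn.

```python
def array_megabytes(n_rows, n_cols, bytes_per_value=8):
    """Size in MiB of a dense n_rows x n_cols array (8 bytes per value assumed)."""
    return n_rows * n_cols * bytes_per_value / float(1024 ** 2)


def predict_in_batches(predict, X, batch_size=1000):
    """Call predict on consecutive row slices of X and collect the results in a list."""
    predictions = []
    for start in range(0, len(X), batch_size):
        # Only one slice of at most batch_size rows is passed to predict at a time.
        predictions.extend(predict(X[start:start + batch_size]))
    return predictions


# Dense footprints for the shapes quoted in the question.
print(round(array_megabytes(42000, 784), 1))  # training data
print(round(array_megabytes(28000, 700), 1))  # reduced test data


# Stand-in for neigh.predict (hypothetical): one label per input row.
def dummy_predict(rows):
    return [sum(row) % 10 for row in rows]


demo_rows = [[i, i + 1] for i in range(25)]
batched = predict_in_batches(dummy_predict, demo_rows, batch_size=10)
print(len(batched))
```

With the objects from the question, the call would look like `test_predict = predict_in_batches(neigh.predict, reduced_test_data, batch_size=1000)`. Batching limits the size of each query but does not shrink the fitted ball tree itself.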

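Which version of scikit-learn are you using? The ball tree classes were rewritten in 0.14. If you are on the latest version, this could be a bug; please feel free to report it.

I am using version 0.14.1.

RFE is not a good idea in this case. It works well at small scale, but it uses a lot of memory.

Which algorithms are suitable for feature selection at that scale? I will try it on a sample of the dataset.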
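
The thread does not name an alternative to RFE. One commonly used lighter option is univariate selection, which scores each feature once instead of refitting an estimator repeatedly. The sketch below uses scikit-learn's `SelectKBest` with the `f_classif` score on small synthetic data. The helper name `select_top_features`, the array sizes, and the choice of `k` are assumptions made for illustration, and nothing in the thread confirms that this resolves the MemoryError.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif


def select_top_features(X_train, y_train, X_test, k):
    """Keep the k features with the highest ANOVA F-score, fitted on training data only."""
    selector = SelectKBest(score_func=f_classif, k=k)
    X_train_reduced = selector.fit_transform(X_train, y_train)
    # Apply the same column mask to the test data.
    X_test_reduced = selector.transform(X_test)
    return X_train_reduced, X_test_reduced, selector


# Small synthetic stand-in for the digit data (hypothetical sizes).
rng = np.random.RandomState(0)
X_demo = rng.rand(200, 50)
y_demo = rng.randint(0, 10, size=200)
X_test_demo = rng.rand(80, 50)

train_small, test_small, selector = select_top_features(
    X_demo, y_demo, X_test_demo, k=20
)
print(train_small.shape)
print(test_small.shape)
```

For the data in the question, the equivalent call would pass `X`, `trainLabels`, and `test` with `k=700`. Trying it on a sample of the rows first, as suggested above, keeps the experiment cheap.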