Python sklearn: negative dimensions are not allowed


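I am using the sklearn nearest neighbors package to classify a dataset. It worked fine until I tried to use 'distance' weights in the KNN prediction. When I switched from 'uniform' weights to 'distance' weights, I got an error saying that negative dimensions are not allowed. The 'uniform' weights work fine.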

The error message is as follows:

/home/linux/.local/lib/python2.7/site-packages/sklearn/neighbors/regression.py:160: RuntimeWarning: invalid value encountered in divide
  y_pred[:, j] = num / denom
Traceback (most recent call last):
  File "analysis.py", line 333, in <module>
    main()
  File "analysis.py", line 330, in main
    ind_test_labels, trainIDs, ind_test_IDs, train_data_original, ind_test_data_original)
  File "analysis.py", line 297, in target1
    outfile = generate_result(X, feature_names, train_label, outfile, trainIDs, train_labels, best_k, train_data_original, ind_test_data_original)
  File "analysis.py", line 130, in generate_result
    predicted_label = regressor.predict(test)
  File "/home/linux/.local/lib/python2.7/site-packages/sklearn/neighbors/regression.py", line 144, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "/home/linux/.local/lib/python2.7/site-packages/sklearn/neighbors/base.py", line 332, in kneighbors
    return_distance=return_distance)
  File "binary_tree.pxi", line 1313, in sklearn.neighbors.kd_tree.BinaryTree.query (sklearn/neighbors/kd_tree.c:10528)
  File "binary_tree.pxi", line 595, in sklearn.neighbors.kd_tree.NeighborsHeap.__init__ (sklearn/neighbors/kd_tree.c:4937)
ValueError: negative dimensions are not allowed

The code looks like this:

train_data = np.loadtxt(...)
train_data = preprocessing.scale(train_data)
X_T = train_data.T
X = X_T[[features]].T # features is a tuple that contains the columns to be selected for classification
# Then X is passed to generate_result below
#######################################
def generate_result(X, feature_names, train_label, outfile, IDs, labels, k, train_original, ind_test_original):
  """
  Purpose: this function does the analysis and outputs the result to file
  Inputs: training set, names of selected features, training set labels, file writer stream, IDs of training set,
          labels of training set, number of neighbors, original training set, independent test set 
  Returns: file writer stream
  """
  cv = cross_validation.KFold(len(X), 10) # 10-fold cross-validation
  feature_str = ','.join(feature_names)
  outfile.write('Best K = ' + str(k) + '\n')
  outfile.write('10-Fold Cross Validation begins \n')
  numCV = 1 #predicted_GFR_str = array_to_string(predicted_label)
  for traincv, testcv in cv:
    outfile.write('Iteration: ' + str(numCV) + '\n')
    outfile.write(complete_features + ',label' + str(numCV) + ',Catagory' + str(numCV) + '\n')
    train = X[traincv]
    test = X[testcv]
    ### run regression
    regressor = KNeighborsRegressor(n_neighbors = k, weights = 'distance', p = 1)       

    label_cv_train = train_label[traincv]
    regressor.fit(train, label_cv_train)
    test = X[testcv]
    label_cv_test = train_label[testcv]
    predicted_label = regressor.predict(test)# THIS LINE IS CAUSING THE PROBLEM


    # more code below not pasted

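Try googling "ValueError: negative dimensions are not allowed" and you will see that the error occurs in many different situations, including scikit-learn, scipy.sparse, pandas... To narrow it down, we really need to see your actual code. Can you try posting a minimal example that reproduces the error?

@ruyan, can you reproduce the error on randomly generated data (for example, using numpy.random.randn(n_samples, n_features) or one of the dataset generators in sklearn.datasets)? Which version of scikit-learn are you using? Can you reproduce it on the master branch? Finally, the line you marked "THIS LINE IS CAUSING THE PROBLEM" cannot be the source of the traceback, because it does not call the predict method reported in the original traceback (to call predict you first need to call fit, which this snippet does not do either). Can you post a simple example that uses synthetic data? The code you posted cannot be run without additional code.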