Python sklearn: negative dimensions are not allowed
Tags: python, numpy, machine-learning, scipy, scikit-learn

I am using sklearn's nearest neighbors package (sklearn.neighbors) to classify a data set. It worked fine until I tried to use 'distance' weights in the KNN prediction. When I switch from 'uniform' weights to 'distance' weights, I get an error saying that negative dimensions are not allowed. The 'uniform' weights work fine.

The error message is as follows:
/home/linux/.local/lib/python2.7/site-packages/sklearn/neighbors/regression.py:160: RuntimeWarning: invalid value encountered in divide
  y_pred[:, j] = num / denom
Traceback (most recent call last):
  File "analysis.py", line 333, in <module>
    main()
  File "analysis.py", line 330, in main
    ind_test_labels, trainIDs, ind_test_IDs, train_data_original, ind_test_data_original)
  File "analysis.py", line 297, in target1
    outfile = generate_result(X, feature_names, train_label, outfile, trainIDs, train_labels, best_k, train_data_original, ind_test_data_original)
  File "analysis.py", line 130, in generate_result
    predicted_label = regressor.predict(test)
  File "/home/linux/.local/lib/python2.7/site-packages/sklearn/neighbors/regression.py", line 144, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "/home/linux/.local/lib/python2.7/site-packages/sklearn/neighbors/base.py", line 332, in kneighbors
    return_distance=return_distance)
  File "binary_tree.pxi", line 1313, in sklearn.neighbors.kd_tree.BinaryTree.query (sklearn/neighbors/kd_tree.c:10528)
  File "binary_tree.pxi", line 595, in sklearn.neighbors.kd_tree.NeighborsHeap.__init__ (sklearn/neighbors/kd_tree.c:4937)
ValueError: negative dimensions are not allowed

Relevant code:

# Imports implied by the names used in this snippet
import numpy as np
from sklearn import preprocessing, cross_validation
from sklearn.neighbors import KNeighborsRegressor

train_data = np.loadtxt(...)
train_data = preprocessing.scale(train_data)
X_T = train_data.T
X = X_T[[features]].T  # features is a tuple that contains the columns to be selected for classification
# Then X is passed to generate_result below
#######################################
def generate_result(X, feature_names, train_label, outfile, IDs, labels, k, train_original, ind_test_original):
    """
    Purpose: this function does the analysis and outputs the result to file
    Inputs: training set, names of selected features, training set labels, file writer stream, IDs of training set,
            labels of training set, number of neighbors, original training set, independent test set
    Returns: file writer stream
    """
    cv = cross_validation.KFold(len(X), 10)  # 10-fold cross-validation
    feature_str = ','.join(feature_names)
    outfile.write('Best K = ' + str(k) + '\n')
    outfile.write('10-Fold Cross Validation begins \n')
    numCV = 1  # fold counter
    #predicted_GFR_str = array_to_string(predicted_label)
    for traincv, testcv in cv:
        outfile.write('Iteration: ' + str(numCV) + '\n')
        # complete_features is not defined in this snippet
        outfile.write(complete_features + ',label' + str(numCV) + ',Catagory' + str(numCV) + '\n')
        train = X[traincv]
        test = X[testcv]
        ### run regression
        regressor = KNeighborsRegressor(n_neighbors=k, weights='distance', p=1)
        label_cv_train = train_label[traincv]
        regressor.fit(train, label_cv_train)
        label_cv_test = train_label[testcv]
        predicted_label = regressor.predict(test)  # THIS LINE IS CAUSING THE PROBLEM
        # more code below not pasted

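Comments:

Try googling "ValueError: negative dimensions are not allowed" and you will see that the error occurs in many different situations, including scikit-learn, scipy.sparse, pandas... To narrow it down, we really need to see your actual code. Can you try to post a minimal example that reproduces the error?

@ruyan, can you reproduce the error on randomly generated data (for example using numpy.random.randn(n_samples, n_features) or one of the dataset generators in sklearn.datasets)? Which version of scikit-learn are you using? Can you reproduce it on the master branch?

Finally, the line you marked as "THIS LINE IS CAUSING THE PROBLEM" cannot cause the traceback, because it does not call the predict method reported in the original traceback (to call predict you need to call fit first, which is also not the case in this snippet). Can you post a simple example using synthetic data? Without the extra code, the code you posted cannot be run.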
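
For reference, a synthetic-data check along the lines requested in the comments might look like the sketch below. It is not the asker's script: the helper name run_knn_cv, the sample sizes, and the random seed are made up for illustration, and it assumes a recent scikit-learn where KFold lives in sklearn.model_selection rather than the older sklearn.cross_validation module used in the question. It fits KNeighborsRegressor on random data with both 'uniform' and 'distance' weights, so the two settings can be compared on input that is known to be clean.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor


def run_knn_cv(X, y, k, weights, n_splits=10):
    """Fit and predict on each fold; return one prediction array per fold.

    Hypothetical helper, not part of the original analysis.py.
    """
    fold_predictions = []
    for train_idx, test_idx in KFold(n_splits=n_splits).split(X):
        # Same estimator settings as in the question (Manhattan distance, p=1)
        regressor = KNeighborsRegressor(n_neighbors=k, weights=weights, p=1)
        regressor.fit(X[train_idx], y[train_idx])
        fold_predictions.append(regressor.predict(X[test_idx]))
    return fold_predictions


# Synthetic data; the sizes and seed are arbitrary choices for illustration
rng = np.random.RandomState(0)
n_samples, n_features = 200, 5
X = rng.randn(n_samples, n_features)
y = rng.randn(n_samples)

for weights in ("uniform", "distance"):
    preds = run_knn_cv(X, y, k=5, weights=weights)
    all_finite = all(np.isfinite(p).all() for p in preds)
    print(weights, "folds:", len(preds), "all predictions finite:", all_finite)
```

If this runs cleanly but the real data still fails, the difference most likely lies in the data or in the value of k passed to generate_result, which would be the next things to inspect.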