Python scikit-learn: error in cross_val_score


python scikit-learn


Please see the notebook at the following address.

This part of the code

scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
print scores
print scores.mean()

produces the following error on a Windows 7 64-bit machine:

---------------------------------------------------------------------------
 IndexError                                Traceback (most recent call last)
 <ipython-input-37-4a10affe67c7> in <module>()
 1 # evaluate the model using 10-fold cross-validation
 ----> 2 scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
  3 print scores
  4 print scores.mean()

 C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in    cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
  1140                         allow_nans=True, allow_nd=True)
  1141 
  -> 1142     cv = _check_cv(cv, X, y, classifier=is_classifier(estimator))
  1143     scorer = check_scoring(estimator, score_func=score_func, scoring=scoring)
  1144     # We clone the estimator to make sure that all the folds are

  C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in _check_cv(cv, X, y, classifier, warn_mask)
  1366         if classifier:
  1367             if type_of_target(y) in ['binary', 'multiclass']:
  -> 1368                 cv = StratifiedKFold(y, cv, indices=needs_indices)
  1369             else:
  1370                 cv = KFold(_num_samples(y), cv, indices=needs_indices)

  C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
  428         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
  429             for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 430                 label_test_folds = test_folds[y == label]
 431                 # the test split can be too big because we used
 432                 # KFold(max(c, self.n_folds), self.n_folds) instead of

IndexError: too many indices for array

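================ Update 2 =============

It seems that, because of some package updates, I can no longer reproduce this error on my machine. If you run into the same problem on a Windows 7 64-bit machine, please let me know.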

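I hit the same error as you and was looking for an answer when I found this question.

I used the same sklearn.cross_validation.cross_val_score (only with a different algorithm) on the same kind of machine: Windows 7, 64-bit.

I tried your solution from above and it "worked", but it gave me the following warning:

C:\Users\E245713\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\cross_validation.py:1531: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  estimator.fit(X_train, y_train, **fit_params)

After reading the warning, I realized the problem was related to the shape of y (my label column). The keyword to try from the warning is ravel(). So I tried the following: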

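# label is the DataFrame holding the label column; on pandas >= 1.0, where
# as_matrix() was removed, use label.to_numpy() instead.
y_arr = pd.DataFrame.as_matrix(label)
print(y_arr)
print(y_arr.shape)  # shape is an attribute, not a method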

This gave me:

  [[1]
   [0]
   [1]
   .., 
   [0]
   [0]
   [1]]

  (87939, 1)

When I added ravel():

it gave me:

[1 0 1 ..., 0 0 1]

(87939,)

The shape of y_arr must be (87939,), not (87939, 1). After that, my original cross_val_score call ran without the added KFold code.
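To illustrate the point about shapes, here is a minimal sketch using only NumPy. The arrays are small hypothetical stand-ins (not the data from this answer), and the indexing line only mimics the test_folds[y == label] step from the traceback in the question; it is not the library's actual code path.

```python
import numpy as np

# Hypothetical stand-in for the label column: a (6, 1) column vector.
y_col = np.array([[1], [0], [1], [0], [0], [1]])

# A 1-D array with one entry per sample, similar in spirit to test_folds.
test_folds = np.zeros(len(y_col), dtype=int)

# Indexing a 1-D array with a 2-D boolean mask fails with IndexError.
raised = False
try:
    test_folds[y_col == 1]
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# ravel() flattens y to shape (n_samples,), and the same indexing succeeds.
y_flat = y_col.ravel()
selected = test_folds[y_flat == 1]
print(y_col.shape, "->", y_flat.shape)
```

If the labels come from a pandas DataFrame, calling ravel() on the extracted NumPy array has the same effect.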

Hope this helps.

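I know this answer is late, but it may help others get past the same error. I had the same problem with Python 3.6; after switching from 3.6 to 3.5, I was able to use the function.
Here is the example I ran: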

accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1)

First, create a conda env with version 3.5:

conda create -n py35 python=3.5  
source activate py35

Hope this helps you move forward.

Import this module and it should work:

from sklearn.model_selection import cross_val_score
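As a rough sketch of how that import fits together with the question's call, the example below runs 10-fold cross-validation on synthetic data. The data, the random seed, and the variable names are assumptions made for illustration only; the labels are deliberately built as a column vector and flattened with ravel() before being passed in.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, hypothetical data: 200 samples, 3 features, binary labels.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int).reshape(-1, 1)

# In sklearn.model_selection, StratifiedKFold takes n_splits; the labels
# are supplied later through cross_val_score rather than the constructor.
cv = StratifiedKFold(n_splits=10)

# y is a column vector here, so flatten it to shape (n_samples,) first.
scores = cross_val_score(LogisticRegression(), X, y.ravel(),
                         scoring="accuracy", cv=cv)
print(scores)
print(scores.mean())
```

Passing cv=10 instead of the explicit StratifiedKFold object should give the same kind of stratified split for a classifier.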

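What is the shape of […]? Is the only difference between the working and non-working versions cv? Is X.shape[0]==6366 as well?

@Eikenberg cv=10 will attempt stratified 10-fold CV; KFold will not. All else being equal, explicitly passing cv=StratifiedKFold(y, 10) would be my next diagnostic step.

Is this the only change you made? If that works, then cv=number should work too (see @larsmans's comment).

The error message suggests this is not a bug: the method itself works, but it cannot handle the array passed to it.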
