Python: Leave-one-out cross-validation with sklearn


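I am trying to use cross-validation to test my classifier with sklearn.

I have 3 classes and 50 samples in total:

  • Class 1: 5 samples
  • Class 2: 15 samples
  • Class 3: 30 samples
The following runs as expected and presumably performs 5-fold cross-validation: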

result = cross_validation.cross_val_score(classifier, X, y, cv=5)
I then tried cv=50 to get leave-one-out cross-validation, so I did the following:

result = cross_validation.cross_val_score(classifier, X, y, cv=50)
Surprisingly, however, it gave the following error:

/Library/Python/2.7/site-packages/sklearn/cross_validation.py:413: Warning: The least populated class in y has only 5 members, which is too few. The minimum number of labels for any class cannot be less than n_folds=50.
  % (min_labels, self.n_folds)), Warning)
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/_methods.py:55: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/_methods.py:67: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "b.py", line 96, in <module>
    scores1 = cross_validation.cross_val_score(classifier, X, y, cv=50)
  File "/Library/Python/2.7/site-packages/sklearn/cross_validation.py", line 1151, in cross_val_score
    for train, test in cv)
  File "/Library/Python/2.7/site-packages/sklearn/externals/joblib/parallel.py", line 653, in __call__
    self.dispatch(function, args, kwargs)
  File "/Library/Python/2.7/site-packages/sklearn/externals/joblib/parallel.py", line 400, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/Library/Python/2.7/site-packages/sklearn/externals/joblib/parallel.py", line 138, in __init__
    self.results = func(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/sklearn/cross_validation.py", line 1240, in _fit_and_score
    test_score = _score(estimator, X_test, y_test, scorer)
  File "/Library/Python/2.7/site-packages/sklearn/cross_validation.py", line 1296, in _score
    score = scorer(estimator, X_test, y_test)
  File "/Library/Python/2.7/site-packages/sklearn/metrics/scorer.py", line 176, in _passthrough_scorer
    return estimator.score(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/sklearn/base.py", line 291, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/Library/Python/2.7/site-packages/sklearn/neighbors/classification.py", line 147, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "/Library/Python/2.7/site-packages/sklearn/neighbors/base.py", line 332, in kneighbors
    return_distance=return_distance)
  File "binary_tree.pxi", line 1307, in sklearn.neighbors.kd_tree.BinaryTree.query (sklearn/neighbors/kd_tree.c:10506)
  File "binary_tree.pxi", line 226, in sklearn.neighbors.kd_tree.get_memview_DTYPE_2D (sklearn/neighbors/kd_tree.c:2715)
  File "stringsource", line 247, in View.MemoryView.array_cwrapper (sklearn/neighbors/kd_tree.c:24789)
  File "stringsource", line 147, in View.MemoryView.array.__cinit__ (sklearn/neighbors/kd_tree.c:23664)
ValueError: Invalid shape in axis 0: 0.
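Another odd thing: with cv=5 I get no warning at all, but with cv=50 I get the warning above. That seems strange, because I thought that a larger cv, although more expensive to compute, should give a more accurate result. Is there a flaw in my reasoning? Why do I get the warning and the error?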


How can I do leave-one-out cross-validation properly in this case?

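By default, cv=5 for a classifier performs stratified 5-fold cross-validation. This means it tries to keep the fraction of samples from each class the same in every fold. That can cause problems when the number of folds equals the number of samples: here the smallest class has only 5 members, so it cannot be spread across 50 folds. Which version are you on? That error message is certainly not very helpful.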
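To make the stratification concrete, here is a minimal sketch. It assumes a recent scikit-learn in which the splitters live in `sklearn.model_selection` (the question uses the older `sklearn.cross_validation` module), and the label array is synthetic, built only to match the 5/15/30 class sizes from the question.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels matching the question: 5, 15 and 30 samples in three classes.
y = np.repeat([0, 1, 2], [5, 15, 30])
X = np.zeros((len(y), 1))  # placeholder features; only y drives the stratification

# Count how many samples of each class land in each of the 5 test folds.
fold_counts = [
    np.bincount(y[test_idx], minlength=3).tolist()
    for _, test_idx in StratifiedKFold(n_splits=5).split(X, y)
]
print(fold_counts)
```

Each test fold keeps the 1:3:6 class ratio, which is only possible because the smallest class has at least as many members as there are folds.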

By the way, for a dataset this small I would generally recommend using StratifiedShuffleSplit.
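A rough sketch of that suggestion, assuming a recent scikit-learn (`sklearn.model_selection`). The data, the k-nearest-neighbors settings, and the choice of 20 splits with a 20% test size are illustrative assumptions, not values from the question.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the question's data: 5/15/30 samples per class.
rng = np.random.RandomState(0)
y = np.repeat([0, 1, 2], [5, 15, 30])
X = rng.normal(size=(50, 2)) + y[:, None]  # class-dependent shift so the task is learnable

# 20 random train/test splits, each preserving the class proportions.
cv = StratifiedShuffleSplit(n_splits=20, test_size=0.2, random_state=0)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=cv)
print(scores.mean(), scores.std())
```

Because the splits are drawn at random, the number of repetitions is independent of the class sizes, which is convenient when the smallest class has only a handful of samples.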

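[Edit]: The current version gives a warning, which should probably be an error:

sklearn/cross_validation.py:399: Warning: The least populated class in y has only 13 members, which is too few. The minimum number of labels for any class cannot be less than n_folds=68.
  % (min_labels, self.n_folds)), Warning)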


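It says version 0.15.2. Actually, I didn't intend to use stratified cross-validation in the first place. I just want leave-one-out cross-validation.

Then you have to pass cv=KFold(5). The documentation says it defaults to stratified folds for classification.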
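For leave-one-out specifically, a non-stratified splitter can be passed explicitly. The sketch below assumes a recent scikit-learn, where `LeaveOneOut` and `KFold` are imported from `sklearn.model_selection` and take no sample count; in older releases such as the 0.15.2 mentioned above, the equivalents lived in `sklearn.cross_validation` and, as far as I recall, took the number of samples as their first argument. The data and classifier settings are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the question's data: 5/15/30 samples per class.
rng = np.random.RandomState(0)
y = np.repeat([0, 1, 2], [5, 15, 30])
X = rng.normal(size=(50, 2)) + y[:, None]

classifier = KNeighborsClassifier(n_neighbors=3)

# Leave-one-out: 50 fits, each tested on exactly one held-out sample.
loo_scores = cross_val_score(classifier, X, y, cv=LeaveOneOut())

# Plain (non-stratified) K-fold with as many folds as samples does the same thing.
kfold_scores = cross_val_score(classifier, X, y, cv=KFold(n_splits=len(y)))

print(len(loo_scores), loo_scores.mean())
```

Each individual score is either 0 or 1 because every test set holds a single sample, so only the mean over all 50 folds is meaningful.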