Python scikit-learn GridSearchCV gives "ValueError: multiclass format is not supported"
I am trying to use GridSearchCV for parameter estimation on LinearSVC(), as follows:
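from sklearn.svm import LinearSVC
# GridSearchCV lives in sklearn.model_selection in current releases
# (sklearn.grid_search in the old release shown in the traceback below)
from sklearn.model_selection import GridSearchCV

clf_SVM = LinearSVC()
params = {
    'C': [0.5, 1.0, 1.5],
    'tol': [1e-3, 1e-4, 1e-5],
    'multi_class': ['ovr', 'crammer_singer'],
}
gs = GridSearchCV(clf_SVM, params, cv=5, scoring='roc_auc')
gs.fit(corpus1, y)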
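corpus1 has shape (1726, 7001) and y has shape (1726,).
This is a multiclass classification problem: the values of y range from 0 to 3 inclusive, so there are four classes.
But this gives me the following error: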
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-220-0c627bda0543> in <module>()
5 }
6 gs = GridSearchCV(clf_SVM, params, cv=5, scoring='roc_auc')
----> 7 gs.fit(corpus1, y)
/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.pyc in fit(self, X, y)
594
595 """
--> 596 return self._fit(X, y, ParameterGrid(self.param_grid))
597
598
/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.pyc in _fit(self, X, y, parameter_iterable)
376 train, test, self.verbose, parameters,
377 self.fit_params, return_parameters=True)
--> 378 for parameters in parameter_iterable
379 for train, test in cv)
380
/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
651 self._iterating = True
652 for function, args, kwargs in iterable:
--> 653 self.dispatch(function, args, kwargs)
654
655 if pre_dispatch == "all" or n_jobs == 1:
/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.pyc in dispatch(self, func, args, kwargs)
398 """
399 if self._pool is None:
--> 400 job = ImmediateApply(func, args, kwargs)
401 index = len(self._jobs)
402 if not _verbosity_filter(index, self.verbose):
/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, func, args, kwargs)
136 # Don't delay the application, to avoid keeping the input
137 # arguments in memory
--> 138 self.results = func(*args, **kwargs)
139
140 def get(self):
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters)
1238 else:
1239 estimator.fit(X_train, y_train, **fit_params)
-> 1240 test_score = _score(estimator, X_test, y_test, scorer)
1241 if return_train_score:
1242 train_score = _score(estimator, X_train, y_train, scorer)
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.pyc in _score(estimator, X_test, y_test, scorer)
1294 score = scorer(estimator, X_test)
1295 else:
-> 1296 score = scorer(estimator, X_test, y_test)
1297 if not isinstance(score, numbers.Number):
1298 raise ValueError("scoring must return a number, got %s (%s) instead."
/usr/local/lib/python2.7/dist-packages/sklearn/metrics/scorer.pyc in __call__(self, clf, X, y)
136 y_type = type_of_target(y)
137 if y_type not in ("binary", "multilabel-indicator"):
--> 138 raise ValueError("{0} format is not supported".format(y_type))
139
140 try:
ValueError: multiclass format is not supported
---------------------------------------------------------------------------
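From the roc_auc_score documentation:
"Note: this implementation is restricted to the binary classification task or multilabel classification task in label indicator format."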
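To see why the scorer rejects the labels, it can help to check how scikit-learn classifies the target. The sketch below uses a small made-up label vector (an assumption standing in for the real y) and the helper type_of_target, which is the same check that appears in the traceback. Binarizing the labels changes the detected type to one that the note above allows.

```python
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for the real labels: four classes, 0..3.
y_demo = np.array([0, 1, 2, 3, 1])

# Integer labels with more than two classes are detected as "multiclass",
# which the roc_auc scorer in the traceback refuses.
raw_type = type_of_target(y_demo)
print(raw_type)

# One column per class; each row has a single 1 marking its class.
y_demo_bin = label_binarize(y_demo, classes=[0, 1, 2, 3])
bin_type = type_of_target(y_demo_bin)
print(y_demo_bin.shape, bin_type)
```

The binarized array has one row per sample and one column per class, so for the data in the question it would have shape (1726, 4).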
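Try applying label_binarize to y before training. This will perform a "one-hot" encoding of your y.
As already pointed out, you must first binarize y: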
from sklearn.preprocessing import label_binarize
y = label_binarize(y, classes=[0, 1, 2, 3])
and then use a multiclass learning strategy such as OneVsRestClassifier or OneVsOneClassifier. For example:
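from sklearn.multiclass import OneVsRestClassifier

# Parameters of the wrapped LinearSVC are addressed with the estimator__ prefix
clf_SVM = OneVsRestClassifier(LinearSVC())
params = {
    'estimator__C': [0.5, 1.0, 1.5],
    'estimator__tol': [1e-3, 1e-4, 1e-5],
}
gs = GridSearchCV(clf_SVM, params, cv=5, scoring='roc_auc')
gs.fit(corpus1, y)  # y is the binarized label matrix from above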
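For anyone who wants to try this approach without the original data, here is a minimal end-to-end sketch. The synthetic dataset from make_classification, the variable names X_demo and y_demo, and the reduced parameter grid are all assumptions for illustration; only the binarize-then-OneVsRest pattern comes from the answer above, and the import path assumes a scikit-learn release that provides sklearn.model_selection.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.svm import LinearSVC

# Synthetic stand-in for corpus1 / y (assumption: four classes labelled 0..3).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=20, n_informative=8,
    n_classes=4, random_state=0,
)

# Convert the integer labels to a label indicator matrix of shape (200, 4).
y_demo_bin = label_binarize(y_demo, classes=[0, 1, 2, 3])

# One binary LinearSVC per class; its parameters use the estimator__ prefix.
clf = OneVsRestClassifier(LinearSVC())
params = {'estimator__C': [0.5, 1.0, 1.5]}

gs = GridSearchCV(clf, params, cv=5, scoring='roc_auc')
gs.fit(X_demo, y_demo_bin)
print(gs.best_params_, gs.best_score_)
```

With a label indicator target, the reported roc_auc is an average over the four per-class binary problems rather than a single multiclass score.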
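Remove scoring='roc_auc' and it will work, since roc_auc does not support categorical (multiclass) data.
Depending on your problem, you can also one-hot encode the labels directly rather than going through preprocessing.label_binarize(). The problem actually comes from using scoring='roc_auc'; note that roc_auc does not support categorical data.
Can you print the shapes of the variables used in .fit?
corpus1 has shape (1726, 7001) and y has shape (1726,).
I had the same problem when using the 'roc_auc' scoring mechanism; I used 'accuracy' instead and it worked.
Thanks, but now I checked the shapes of y and corpus1, and they are (1726, 4) and (1726, 7001).
Is your shape now (1380, 4)? The transformed y should be (1726, 4).
Are all 4 classes present in your y variable?
Yes, see the first 30 rows here -
@user1269942 I don't see this "Note: this implementation is restricted to the binary classification task or multilabel classification task in label indicator format." on that page. Can you explain where I should look?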
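The alternative mentioned above, keeping the integer labels and switching the scorer, can be sketched as follows. The synthetic data and the names X_demo and y_demo are assumptions standing in for corpus1 and y; 'accuracy' is the scorer one commenter reported success with.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-in for corpus1 / y (assumption: four classes labelled 0..3).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=20, n_informative=8,
    n_classes=4, random_state=0,
)

params = {
    'C': [0.5, 1.0, 1.5],
    'tol': [1e-3, 1e-4],
}

# 'accuracy' accepts multiclass targets, so y does not need to be binarized.
gs_acc = GridSearchCV(LinearSVC(), params, cv=5, scoring='accuracy')
gs_acc.fit(X_demo, y_demo)
print(gs_acc.best_params_, gs_acc.best_score_)
```

Accuracy can be misleading when the classes are imbalanced, so whether this swap is acceptable depends on the problem.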