Machine learning ValueError: Only 2 class/es in training fold, but 1 in overall dataset. This is not supported for decision_function with imbalanced folds
I am learning machine learning and building my first model on the MNIST dataset. Can anyone help me? I have tried StratifiedKFold, KFold, and other approaches to solve this. Pandas version '0.25.1', Python version 3.7, using the Anaconda distribution.
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.linear_model import SGDClassifier

# mnist is assumed to be a DataFrame with a 'label' column plus the pixel columns
train_set, test_set = train_test_split(mnist, test_size=0.2, random_state=29)

# Split features and labels (the double brackets keep the labels as a DataFrame)
X_train, y_train = train_set.drop('label', axis=1), train_set[['label']]
X_test, y_test = test_set.drop('label', axis=1), test_set[['label']]

y_train_5 = (y_train == 5)  # True for all 5's and False otherwise
y_test_5 = (y_test == 5)

sgd_clf = SGDClassifier(random_state=29)
sgd_clf.fit(X_train, y_train_5)

print(X_train.shape)
print(y_train_5.shape)
cross_val_predict(sgd_clf, X_train, y_train_5, cv=3, method="decision_function")
The last line of the code block raises an error:
RuntimeWarning: Number of classes in training fold (2) does not match total number of classes (1). Results may not be appropriate for your use case. To fix this, use a cross-validation technique resulting in properly stratified folds
RuntimeWarning)
ValueError Traceback (most recent call last)
<ipython-input-39-da1ad024473a> in <module>
3 print(X_train.shape)
4 print(y_train_5.shape)
----> 5 cross_val_predict(sgd_clf, X_train, y_train_5, cv=3, method="decision_function")
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in cross_val_predict(estimator, X, y, groups, cv, n_jobs, verbose, fit_params, pre_dispatch, method)
787 prediction_blocks = parallel(delayed(_fit_and_predict)(
788 clone(estimator), X, y, train, test, verbose, fit_params, method)
--> 789 for train, test in cv.split(X, y, groups))
790
791 # Concatenate the predictions
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
919 # remaining jobs.
920 self._iterating = False
--> 921 if self.dispatch_one_batch(iterator):
922 self._iterating = self._original_iterator is not None
923
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
757 return False
758 else:
--> 759 self._dispatch(tasks)
760 return True
761
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in _dispatch(self, batch)
714 with self._lock:
715 job_idx = len(self._jobs)
--> 716 job = self._backend.apply_async(batch, callback=cb)
717 # A job can complete so quickly than its callback is
718 # called before we get here, causing self._jobs to
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\_parallel_backends.py in apply_async(self, func, callback)
180 def apply_async(self, func, callback=None):
181 """Schedule a func to be run"""
--> 182 result = ImmediateResult(func)
183 if callback:
184 callback(result)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\_parallel_backends.py in __init__(self, batch)
547 # Don't delay the application, to avoid keeping the input
548 # arguments in memory
--> 549 self.results = batch()
550
551 def get(self):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in <listcomp>(.0)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in _fit_and_predict(estimator, X, y, train, test, verbose, fit_params, method)
887 n_classes = len(set(y)) if y.ndim == 1 else y.shape[1]
888 predictions = _enforce_prediction_order(
--> 889 estimator.classes_, predictions, n_classes, method)
890 return predictions, test
891
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in _enforce_prediction_order(classes, predictions, n_classes, method)
933 'is not supported for decision_function '
934 'with imbalanced folds. {}'.format(
--> 935 len(classes), n_classes, recommendation))
936
937 float_min = np.finfo(predictions.dtype).min
ValueError: Only 2 class/es in training fold, but 1 in overall dataset. This is not supported for decision_function with imbalanced folds. To fix this, use a cross-validation technique resulting in properly stratified folds
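I ran into a similar problem. On further investigation I found a warning message logged alongside the error:
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
There are two ways to fix this; one is to flatten y with ravel() when calling cross_val_predict:
cross_val_predict(sgd_clf, X_train, y_train_5.values.ravel(), cv=3,
                  method="decision_function")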
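To see why a column-vector y is reported as "1 class", here is a minimal sketch that mirrors the expression shown at line 887 of the traceback. The tiny `train_set` frame and the `count_classes` helper are hypothetical stand-ins for illustration, not part of scikit-learn.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the MNIST labels: a tiny frame with a 'label' column.
train_set = pd.DataFrame({"label": [5, 3, 5, 0, 1, 5]})

y_2d = (train_set[["label"]] == 5)  # double brackets -> DataFrame, shape (n, 1)
y_1d = (train_set["label"] == 5)    # single brackets -> Series, shape (n,)


def count_classes(y):
    """Illustrative copy of the n_classes expression from the traceback."""
    y = np.asarray(y)
    return len(set(y)) if y.ndim == 1 else y.shape[1]


# A 2-D y is counted by its number of columns, not by its distinct values.
print(y_2d.shape, count_classes(y_2d))
print(y_1d.shape, count_classes(y_1d))

# ravel() flattens the (n, 1) values into the 1-D shape that is expected.
print(y_2d.values.ravel().shape)
```

With a 2-D y the count comes from the number of columns, which is 1, while each training fold still contains both True and False, hence the mismatch in the message.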
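- Refer to the hint in the DataConversionWarning: "A column-vector y was passed when a 1d array was expected." That made me realize my mistake, and I did the following.
- The error log points the same way: "Number of classes in training fold (2) does not match total number of classes (1)."
- I assume y_train_5 here is a DataFrame (perhaps you are working through Aurelian's publication). The expected type of y_train_5 is an array-like object of shape (n,), or a Series, but in your case the DataFrame is two-dimensional, (n, 1), rather than one-dimensional.
- Just take the column vector as a Series object: y_train_5.iloc[:,0] (I prefer this).
> y_train_5.iloc[:,0].shape (n,)
cross_val_predict(sgd_clf, X_train, y_train_5.iloc[:,0], cv=3, method="decision_function")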
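As a rough end-to-end check of the iloc approach, the sketch below runs the same call on synthetic data. The random features and the cyclic labels are assumptions standing in for MNIST; only the shapes matter here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.RandomState(29)

# Hypothetical synthetic data standing in for MNIST: 300 rows, 10 features.
X_train = rng.rand(300, 10)
# Labels cycle through 0..9 so that every digit, including 5, appears 30 times.
y_train = pd.DataFrame({"label": np.arange(300) % 10})

y_train_5 = (y_train == 5)           # still a DataFrame of shape (300, 1)
y_train_5_1d = y_train_5.iloc[:, 0]  # Series of shape (300,)

sgd_clf = SGDClassifier(random_state=29)

# With a 1-D target, both classes are counted and decision scores come back.
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5_1d, cv=3,
                             method="decision_function")
print(y_train_5.shape, y_train_5_1d.shape, y_scores.shape)
```

Selecting the column when the labels are first split, for example `train_set['label']` with single brackets, would avoid creating the (n, 1) DataFrame at all.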