Scikit-learn ValueError: Can't handle mix of multilabel-indicator and binary


scikit-learn keras


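I am using Keras with the scikit-learn wrapper. In particular, I want to use GridSearchCV for hyperparameter optimization.

This is a multiclass problem, i.e. the target variable can take exactly one label from a set of n classes. For example, the target variable can be 'Class1', 'Class2', ..., 'Classn'.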

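# Import paths assume Keras 2.x and scikit-learn >= 0.18
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

# self._arch creates my model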
nn = KerasClassifier(build_fn=self._arch, verbose=0)
clf = GridSearchCV(
  nn,
  param_grid={ ... },
  # I use f1 score macro averaged
  scoring='f1_macro',
  n_jobs=-1)

# self.fX is the data matrix
# self.fy_enc is the target variable encoded with one-hot format
clf.fit(self.fX.values, self.fy_enc.values)

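The problem is that, when the score is computed during cross-validation, the true labels of the validation samples are one-hot encoded, while for some reason the predictions collapse to binary labels (when the target variable has only two classes). For example, this is the last part of the stack trace: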

...........................................................................
/Users/fbrundu/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/metrics/classification.py in _check_targets(y_true=array([[ 0.,  1.],
       [ 0.,  1.],
       [ 0... 0.,  1.],
       [ 0.,  1.],
       [ 0.,  1.]]), y_pred=array([1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1,...0, 1, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 1, 1]))
     77     if y_type == set(["binary", "multiclass"]):
     78         y_type = set(["multiclass"])
     79
     80     if len(y_type) > 1:
     81         raise ValueError("Can't handle mix of {0} and {1}"
---> 82                          "".format(type_true, type_pred))
        type_true = 'multilabel-indicator'
        type_pred = 'binary'
     83
     84     # We can't have more than one value on y_type => The set is no more needed
     85     y_type = y_type.pop()
     86

ValueError: Can't handle mix of multilabel-indicator and binary
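
The mismatch in the trace can be reproduced without Keras. The following is a minimal sketch, assuming only NumPy and scikit-learn are installed; the small arrays are made-up illustrative data, not the original `fX`/`fy_enc`. It shows that scikit-learn classifies a two-column one-hot array as `multilabel-indicator` and a 1-D array of 0/1 predictions as `binary`, and that converting the one-hot rows back to class indices with `argmax` lets the metric run.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.utils.multiclass import type_of_target

# Illustrative data (assumption): one-hot true labels, 1-D predicted labels
y_true_one_hot = np.array([[0, 1], [0, 1], [1, 0], [1, 0]])
y_pred = np.array([1, 1, 0, 1])

# scikit-learn infers different target types for the two arrays
true_type = type_of_target(y_true_one_hot)  # 'multilabel-indicator'
pred_type = type_of_target(y_pred)          # 'binary'

# Scoring the mixed formats raises the ValueError seen in the trace
try:
    f1_score(y_true_one_hot, y_pred, average='macro')
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Collapsing the one-hot rows to class indices makes both arrays 1-D
y_true_int = y_true_one_hot.argmax(axis=1)
score = f1_score(y_true_int, y_pred, average='macro')
print(true_type, pred_type, error_message, score)
```

The exact wording of the error message varies between scikit-learn versions, but it names the same two target types.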

How can I instruct Keras/sklearn to return the predictions in one-hot encoding?

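Following Vivek's comment, I used the original (not one-hot encoded) target array and configured the loss sparse_categorical_crossentropy in my Keras model (see the code):

arch.compile(
  optimizer='sgd',
  loss='sparse_categorical_crossentropy',
  metrics=['accuracy'])

Comments:

What happens when you use fy directly, without encoding the values? This should not be a problem for multiclass. One-hot encoding the targets is only needed in multilabel problems.

If you have solved the problem, please accept your answer and close the question.

@VivekKumar If you know the rules of SO, you can't accept your own answer within 2 days of asking the question.

Yes, sorry. My bad.

@fbrundu, did this solve your problem? I still have the issue. When I use loss='sparse_categorical_crossentropy' with metrics=['f1_score'], my F1 score goes above 1, which is obviously wrong.

@fbrundu Thanks for your reply. I am working on it. I think 'categorical_crossentropy' works even if y_true is not encoded, which differs from the documentation.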
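
For the sparse_categorical_crossentropy approach described in the answer, the targets passed to `fit` need to be a 1-D array of integer class indices rather than one-hot rows or raw strings. A minimal sketch of that conversion, assuming scikit-learn's `LabelEncoder` and NumPy; the sample labels and variable names are illustrative assumptions and do not come from the original code:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Illustrative string labels (assumption), one label per sample
labels = np.array(['Class1', 'Class3', 'Class2', 'Class1'])

# Map each class name to an integer index (classes are sorted alphabetically)
encoder = LabelEncoder()
y_int = encoder.fit_transform(labels)

# If only a one-hot matrix is available, argmax recovers the same indices
one_hot = np.eye(len(encoder.classes_))[y_int]
y_from_one_hot = one_hot.argmax(axis=1)

# Integer predictions can be mapped back to the original class names
decoded = encoder.inverse_transform(y_int)
print(y_int, y_from_one_hot, decoded)
```

Either `y_int` or `y_from_one_hot` has the 1-D integer shape that scikit-learn's scorers treat as binary or multiclass, so the true labels and the wrapper's predictions end up in the same format.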