Machine learning: binary cross-entropy that penalizes all components of a one-hot vector


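I know that in the two-class case, binary cross-entropy is the same as categorical cross-entropy.

I am also clear on what softmax is.
So I can see that categorical cross-entropy only penalizes the one component (probability) that should be 1.

But why can't I, or shouldn't I, use binary cross-entropy on a one-hot vector?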

Normal case for single-label, multi-class classification with mutually exclusive classes:
################
pred            = [0.1 0.3 0.2 0.4]
label (one hot) = [0   1   0   0]
costfunction: categorical crossentropy (log is base 10 here)
                            = sum(label * -log(pred)) // only the component labeled 1 contributes
                            = 0.523
Why not this?
################
pred            = [0.1 0.3 0.2 0.4]
label (one hot) = [0   1   0   0]
costfunction: binary crossentropy (log is base 10 here)
                            = sum(- label * log(pred) - (1 - label) * log(1 - pred))
                            = 1*-log(0.3)-log(1-0.1)-log(1-0.2)-log(1-0.4)
                            = 0.887
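
The two figures above can be reproduced with a short script. This is a minimal sketch in plain Python, not code from the original post: the function names `categorical_ce` and `binary_ce` and the `log` parameter are illustrative assumptions. The `log` parameter is there only to show that 0.523 and 0.887 come from base-10 logarithms, whereas most libraries use the natural logarithm.

```python
import math

def categorical_ce(pred, label, log=math.log):
    # Sum of label_i * -log(pred_i); only the component labeled 1 contributes.
    return sum(-y * log(p) for p, y in zip(pred, label))

def binary_ce(pred, label, log=math.log):
    # Every component contributes: -log(p) where the label is 1,
    # -log(1 - p) where the label is 0.
    return sum(-y * log(p) - (1 - y) * log(1 - p) for p, y in zip(pred, label))

pred = [0.1, 0.3, 0.2, 0.4]
label = [0, 1, 0, 0]

# Base-10 logarithm reproduces the numbers quoted in the question.
print(round(categorical_ce(pred, label, math.log10), 3))  # 0.523
print(round(binary_ce(pred, label, math.log10), 3))       # 0.887

# Natural logarithm, the usual convention in ML libraries.
print(round(categorical_ce(pred, label), 3))  # 1.204
print(round(binary_ce(pred, label), 3))       # 2.043
```

The difference between the two losses is exactly the three `-log(1 - p)` terms for the components whose label is 0.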

I have noticed that in binary cross-entropy, zero is itself a target class, which corresponds to the following one-hot encoding:

target class zero 0 -> [1 0]
target class one  1 -> [0 1]
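
The correspondence above can be checked numerically. The sketch below assumes a single predicted probability `p` for class one, so the two-class prediction vector is `[1 - p, p]`. The helper names are hypothetical and not taken from any library.

```python
import math

def binary_ce_scalar(p, y):
    # Binary cross-entropy for one scalar target y in {0, 1},
    # where p is the predicted probability of class one.
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

def categorical_ce(pred, label):
    # Categorical cross-entropy against a one-hot label.
    return sum(-t * math.log(q) for q, t in zip(pred, label))

p = 0.3  # assumed example probability for class one
for y in (0, 1):
    one_hot = [1 - y, y]         # 0 -> [1, 0], 1 -> [0, 1]
    two_class_pred = [1 - p, p]  # the two probabilities sum to one
    assert math.isclose(binary_ce_scalar(p, y),
                        categorical_ce(two_class_pred, one_hot))
```

The equality holds because the second probability is fully determined by the first. With four classes there is no such single scalar, which is why the two losses stop coinciding.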

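In summary: why do we compute and sum only the negative log-likelihood of the true class? Why don't we also penalize the other classes, whose probabilities should be zero?

If one uses binary cross-entropy on a one-hot vector, the probabilities of the components expected to be zero are penalized as well.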

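See the explanation given for a similar question. In short, the binary cross-entropy formula does not make sense for a one-hot vector. Depending on the task, you can either apply softmax cross-entropy to two or more classes, or use a vector of (independent) probabilities as the label.

Regarding the question "But why can't I, or shouldn't I, use binary cross-entropy on a one-hot vector?":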
What you are computing is the binary cross-entropy of 4 independent features:
pred   = [0.1 0.3 0.2 0.4]
label  = [0   1   0   0]

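The model's inference predicts that the first feature is on with 10% probability, the second feature is on with 30% probability, and so on. The target label is interpreted as: all features are off except the second one. Note that [1, 1, 1, 1] is also a perfectly valid label, i.e. the label does not have to be a one-hot vector, and pred = [0.5, 0.8, 0.7, 0.1] is a valid prediction, i.e. the sum does not have to equal one.
In other words, your computation is valid, but for a completely different problem: multi-label, non-exclusive binary classification.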
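
To make the contrast concrete, here is a hedged sketch of the two setups. It assumes the multi-label model applies an independent sigmoid to each output while the mutually exclusive model applies softmax. The `logits` values are hypothetical, chosen only so that the sigmoid outputs land near [0.5, 0.8, 0.7, 0.1].

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Subtract the maximum for numerical stability.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def binary_ce(pred, label):
    # One independent binary cross-entropy term per feature, summed.
    return sum(-y * math.log(p) - (1 - y) * math.log(1 - p)
               for p, y in zip(pred, label))

logits = [0.0, 1.4, 0.8, -2.2]  # hypothetical raw model outputs

multi_label_pred = [sigmoid(z) for z in logits]  # each feature judged on its own
exclusive_pred = softmax(logits)                 # classes compete for probability mass

print(sum(multi_label_pred))  # not constrained to 1
print(sum(exclusive_pred))    # 1 up to floating-point rounding

# A label with several features on is meaningful only in the multi-label setup.
loss = binary_ce(multi_label_pred, [1, 1, 1, 1])
print(loss)
```

With softmax, raising the probability of the true class necessarily lowers the others, so the zero-labeled components are already penalized implicitly. With independent sigmoids there is no such coupling, so each component needs its own term.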