One-hot encoding in a Python loss function

python tensorflow machine-learning deep-learning

I am trying to one-hot encode the predictions inside my loss function:

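import tensorflow as tf
from tensorflow.keras import backend as K

# n_classes, softargmax and one_hot are defined elsewhere in the asker's code
def loss(y_true, y_pred, smooth=1e-7):
    # flatten the labels and one-hot encode them
    y_true = K.flatten(y_true)
    y_true = one_hot(y_true, n_classes)
    # reduce the logits to class indices, then one-hot encode them
    y_pred = softargmax(y_pred)
    y_pred = K.flatten(y_pred)
    y_pred = one_hot(y_pred, n_classes)

    # Dice-style overlap score between the two one-hot tensors
    intersect = K.sum(y_true * y_pred, axis=-1)
    denom = K.sum(y_true + y_pred, axis=-1)
    return K.mean(2. * intersect / (denom + smooth))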
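The question does not show softargmax. A common definition, assumed here only for illustration (the name, signature and beta parameter are hypothetical, not the asker's code), is a softmax sharpened by a large inverse temperature followed by the expected class index. Its output is a float close to the argmax index, which is why it still has to be turned into an integer before it can select a row of a one-hot table.

```python
import tensorflow as tf


def softargmax(logits, beta=100.0):
    # Hypothetical helper: the question's own softargmax is not shown.
    # A large beta pushes the softmax towards a one-hot distribution.
    probs = tf.nn.softmax(beta * logits, axis=-1)
    # Class indices 0..n-1 as floats, matching the logits' dtype.
    n = tf.shape(logits)[-1]
    indices = tf.cast(tf.range(n), logits.dtype)
    # Expected index under probs: differentiable, but only approximately an integer.
    return tf.reduce_sum(probs * indices, axis=-1)


idx = softargmax(tf.constant([[0.1, 2.0, 0.3], [1.5, 0.2, 0.4]]))
print(idx.numpy())
```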

But casting y_pred to int32 in order to use the built-in K.one_hot results in the error

 ValueError: No gradients provided for any variable:

So I wrote my own one_hot encoding method that avoids converting y_pred to int32:

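import tensorflow as tf

def one_hot(xs, n_classes):
    # identity matrix: row i is the one-hot vector for class i
    table = tf.eye(n_classes, dtype=tf.dtypes.float32)
    # look up one row per element; each float index is cast to int32 for the lookup
    return tf.map_fn(lambda x: table[tf.raw_ops.Cast(x=x, DstT=tf.int32)], xs)

one_hot(tf.constant([0.0, 1.0, 2.0]), 3)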
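For comparison, the same row lookup can be done in a single vectorized call with tf.gather, which the question reports fails with the same gradient error. A minimal sketch; one_hot_gather is a hypothetical name, and like the map_fn version it still casts the float indices to int32, so no gradient can flow back through them:

```python
import tensorflow as tf


def one_hot_gather(xs, n_classes):
    # Hypothetical vectorized variant of the question's one_hot.
    table = tf.eye(n_classes, dtype=tf.float32)
    # The integer cast is still needed to index the table.
    return tf.gather(table, tf.cast(xs, tf.int32))


xs = tf.constant([0.0, 1.0, 2.0])
out = one_hot_gather(xs, 3)
print(out.numpy())
# same values as the built-in tf.one_hot applied to the integer indices
print(tf.one_hot(tf.cast(xs, tf.int32), 3).numpy())
```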

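My problem is this: using tf.gather / tf.gather_nd causes the same gradient error. The only function I could find that does not cause the gradient error is tf.map_fn, which is very slow, and switching to tf.vectorized_map brings the gradient error back. Is there another way to implement a one-hot encoding that has gradients?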
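The root cause can be reproduced in isolation. In this minimal sketch, with made-up values, a loss that depends on the input only through a cast to int32 has no gradient with respect to that input: tape.gradient returns None, which is the condition behind Keras's "No gradients provided for any variable" error.

```python
import tensorflow as tf

x = tf.Variable([0.2, 1.7, 2.9])  # made-up "predictions"

with tf.GradientTape() as tape:
    idx = tf.cast(x, tf.int32)        # float -> int: the cast is not differentiable
    y = tf.one_hot(idx, depth=3)      # built-in one-hot on the integer indices
    loss = tf.reduce_sum(y * tf.constant([1.0, 2.0, 3.0]))

grad = tape.gradient(loss, x)
print(grad)  # None
```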

You can create a numerically stable one_hot by mapping the largest logit to 1.0 (via exp(vec - max)) and masking out the others:

import tensorflow as tf


def stable_one_hot(vec):
    """
    Args:
        vec: tf.Tensor, a batch of logits to be encoded
    
    Returns:
        tf.Tensor, a batch of numerically stable one-hot encoded logits
    """
    m = tf.math.reduce_max(vec, axis=1, keepdims=True)
    e = tf.math.exp(vec - m)
    mask = tf.cast(tf.math.not_equal(e, 1.0), tf.float32)
    vec -= 1e9 * mask
    return tf.nn.softmax(vec, axis=1)

# dummy data w/batch of size 32
X = tf.random.normal([32, 100])

# dummy labels w/10 possibilities
y = tf.random.uniform(shape=[32], minval=0, maxval=10, dtype=tf.int32)
# one-hot them
y_true = tf.one_hot(y, 10)

# simple network
nn = tf.keras.layers.Dense(10)

# forward pass
with tf.GradientTape() as tape:
    y_pred = nn(X)
    y_pred = stable_one_hot(y_pred)
    intersect = tf.math.reduce_sum(y_true * y_pred, -1)
    denom = tf.math.reduce_sum(y_true + y_pred, -1)
    loss = 2.0 * intersect / (denom + 1e-7)
    loss = tf.math.reduce_mean(loss)

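grads = tape.gradient(loss, nn.trainable_variables)
# every trainable variable receives a gradient (none of them is None)
assert all(g is not None for g in grads)

print(f"loss: {loss.numpy():.4f}")
# e.g. loss: 0.1250 (the data is random, so the value varies between runs)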
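One thing the assert in the answer does not check is the size of the gradients. In this small sketch (made-up logits and targets; stable_one_hot is copied from the answer), the output rows are exactly one-hot, and because the softmax is fully saturated in float32, the gradient that reaches the logits is a tensor of zeros rather than None. For logits like these, a model trained through this function would therefore get no learning signal from this term.

```python
import tensorflow as tf


def stable_one_hot(vec):
    # Copied from the answer: keep the max logit, push the others down by 1e9.
    m = tf.math.reduce_max(vec, axis=1, keepdims=True)
    e = tf.math.exp(vec - m)
    mask = tf.cast(tf.math.not_equal(e, 1.0), tf.float32)
    vec -= 1e9 * mask
    return tf.nn.softmax(vec, axis=1)


logits = tf.constant([[1.0, 3.0, 2.0], [0.5, -1.0, 4.0]])  # made-up logits
target = tf.constant([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])   # made-up labels

with tf.GradientTape() as tape:
    tape.watch(logits)  # constants are not watched automatically
    out = stable_one_hot(logits)
    score = tf.reduce_sum(out * target)

grad = tape.gradient(score, logits)
print(out.numpy())   # rows are exactly one-hot at each row's argmax
print(grad.numpy())  # a tensor (not None), but every entry is zero
```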

Why not just use it directly?

@gobrewers14 As I wrote in the question, one_hot only works if y_pred is cast to int32, and that cast has no gradient, so it does not work.
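Since the cast to int32 has no gradient, a commonly used alternative is the straight-through estimator. This is a different technique from the answer above and not something either poster proposed: the forward pass uses the exact one-hot of the argmax, while the backward pass uses the gradient of the softmax, which is a biased surrogate rather than a true derivative. A minimal sketch; straight_through_one_hot is a hypothetical name:

```python
import tensorflow as tf


def straight_through_one_hot(logits):
    # Hypothetical helper implementing the straight-through estimator.
    soft = tf.nn.softmax(logits, axis=-1)
    # Exact one-hot of the argmax; this path carries no gradient.
    hard = tf.one_hot(tf.argmax(logits, axis=-1),
                      depth=tf.shape(logits)[-1], dtype=logits.dtype)
    # Forward value equals `hard` (up to rounding); gradients flow through `soft` only.
    return soft + tf.stop_gradient(hard - soft)


logits = tf.constant([[1.0, 3.0, 2.0], [0.5, -1.0, 4.0]])  # made-up logits
target = tf.constant([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])   # made-up labels

with tf.GradientTape() as tape:
    tape.watch(logits)
    out = straight_through_one_hot(logits)
    score = tf.reduce_sum(out * target)

grad = tape.gradient(score, logits)
print(out.numpy())   # approximately one-hot
print(grad.numpy())  # non-zero: the softmax gradient
```

The trade-off is that the loss sees true one-hot vectors while the optimizer follows the softmax's gradient, so training signal exists but does not exactly match the forward computation.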