Python: tf.one_hot has no gradient

python tensorflow keras

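I am trying to create a custom loss function whose output is an integer (it is converted to a one-hot encoding inside the loss function).

The problem is that one-hot has no differentiable gradient. Is there a workaround?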

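# Assumed imports; num_words (the vocabulary size) is defined elsewhere.
import tensorflow as tf
from tensorflow.keras import backend as K

def new_loss(hidden, output, random_size=20):
    # Cast the labels to integers, one-hot encode them as integers,
    # then cast the encoding back to float.
    output1 = tf.cast(output, dtype=tf.int32)
    one_hot = tf.one_hot(output1, num_words, dtype=tf.int32)
    one_hot = tf.cast(one_hot, dtype=tf.float32)

    # Score of the target word.
    score = K.dot(hidden, one_hot)

    # Scores of random_size randomly drawn words.
    random_words = tf.random.uniform((random_size,), maxval=num_words, dtype=tf.dtypes.int32)
    random_words_1_hot = tf.one_hot(random_words, num_words, dtype=tf.float32)
    scores = K.dot(random_words_1_hot, hidden)
    average = K.sum(K.log(1 - K.sigmoid(scores)) / random_size)

    return -1 * K.log(K.sigmoid(score)) - average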
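
Written out, the function above appears to compute a negative-sampling style objective. This is a reading of the code, not something the question states: s stands for score, s_i for the entries of scores, k for random_size, and \sigma for the sigmoid.

```latex
\mathcal{L} \;=\; -\log \sigma(s) \;-\; \frac{1}{k} \sum_{i} \log\bigl(1 - \sigma(s_i)\bigr)
```

The sum runs over every entry of scores, as K.sum does in the code.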

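The problem is not the one-hot encoding itself but the chain of cast operations. More specifically, TensorFlow does not propagate gradients through integer tensors. Assuming hidden and output are both of float type, if you change this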

output1 = tf.cast(output, dtype=tf.int32)
one_hot = tf.one_hot(output1, num_words, dtype=tf.int32)
one_hot = tf.cast(one_hot, dtype=tf.float32)

to this

one_hot = tf.one_hot(tf.cast(output, tf.int32), num_words, dtype=tf.float32)

you will get your gradients.
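
As a rough, framework-free illustration of why nothing can flow back through an integer cast, the sketch below uses plain Python and finite differences instead of TensorFlow's automatic differentiation. The helper names (loss_through_int_cast, finite_difference) and the sample values are hypothetical and chosen only for this example. The loss is a step function of the float label, so its numerical derivative is zero, while the derivative with respect to the selected entry of hidden is well defined.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def loss_through_int_cast(output, hidden):
    """Compute -log(sigmoid(hidden[int(output)])); the label passes through an integer cast."""
    index = int(output)  # plays the role of tf.cast(output, tf.int32)
    return -math.log(sigmoid(hidden[index]))


def finite_difference(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Hypothetical sample values.
hidden = [0.5, -1.0, 2.0]
output = 1.3  # float label that gets truncated to index 1

# Derivative w.r.t. the label: a small change in output does not change int(output).
grad_wrt_output = finite_difference(
    lambda o: loss_through_int_cast(o, hidden), output
)


def loss_for_selected_entry(h):
    """Same loss, viewed as a function of the selected entry of hidden."""
    perturbed = list(hidden)
    perturbed[int(output)] = h
    return loss_through_int_cast(output, perturbed)


# Derivative w.r.t. the selected hidden entry: matches sigmoid(h) - 1.
grad_wrt_hidden = finite_difference(loss_for_selected_entry, hidden[int(output)])

print("d loss / d output:", grad_wrt_output)
print("d loss / d hidden[1]:", grad_wrt_hidden)
```

TensorFlow reports the missing gradient as None rather than as a numerical zero, but the underlying reason is the same: an integer-valued step carries no usable slope.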

A more detailed example:

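# TensorFlow 1.x graph-mode API (tf.Session, tf.log, tf.gradients).
import numpy as np
import tensorflow as tf

# Assumed values, not given in the original; num_words must be 2 here
# so that the matmul shapes line up with the 2x2 hidden matrix.
num_words = 2
random_size = 20

one_hot1 = tf.one_hot(tf.cast(np.random.rand(2), tf.int32), num_words, dtype=tf.float32)
hidden = tf.constant([1., 2., 3., 4.], shape=(2, 2))
one_hot = tf.cast(one_hot1, dtype=tf.float32)
hidden1 = tf.cast(hidden, tf.float32)
score = tf.matmul(hidden, one_hot)
random_words = tf.random.uniform((random_size,), maxval=num_words, dtype=tf.float32)
random_words_1_hot = tf.one_hot(tf.cast(random_words, tf.int32), num_words, dtype=tf.float32)
scores = tf.matmul(random_words_1_hot, hidden)
average = tf.reduce_sum(tf.log(1 - tf.sigmoid(scores)) / random_size)
res = -1 * tf.log(tf.sigmoid(score)) - average
grads = tf.gradients(res, [hidden1, one_hot1])
sess = tf.Session()
print(sess.run(res))
print(sess.run(grads))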

I used core TF operations only for consistency. You can see that if one_hot1 is initially created as tf.int and then recast to float, there will be no gradient.

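It seems to me that your one-hot is based directly on the integer that precedes it, which is differentiable. If I am mistaken, post a rough layout of the model and I will let you know. My basic point is that if the result is only being one-hot encoded, it does not need a gradient; gradients are for the activation steps.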
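
The point that the one-hot target itself needs no gradient can be sketched in plain Python. The values and helper names below are hypothetical and only illustrate the idea: the dot product with a one-hot vector simply selects one entry of hidden, and the gradient of -log(sigmoid(score)) with respect to hidden is (sigmoid(score) - 1) times the one-hot vector, so the label acts as a constant mask.

```python
import math


def one_hot(index, depth):
    """Return a list of length depth with 1.0 at position index and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical sample values.
hidden = [0.5, -1.0, 2.0]
label = 2
target = one_hot(label, len(hidden))

# The dot product with a one-hot vector is just an index lookup.
score = dot(hidden, target)
loss = -math.log(sigmoid(score))

# Analytic gradient of the loss w.r.t. hidden: (sigmoid(score) - 1) * target.
# Only the selected entry receives a nonzero gradient.
grad_hidden = [(sigmoid(score) - 1.0) * t for t in target]

print("score:", score, "hidden[label]:", hidden[label])
print("gradient w.r.t. hidden:", grad_hidden)
```

In this view the optimizer only ever updates hidden (and whatever produces it); the integer label is data, not a trainable quantity.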
