TensorFlow: Keras custom gradient function for a discontinuous function

tensorflow machine-learning keras

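I am trying to write a custom layer that takes a prediction from the model, tests it, and passes the result on to the next layer together with the original input. My main idea is to find out whether the ability to test predictions cheaply helps the model reach better results. However, I ran into a problem writing my custom gradient function: the layer increases the dimensionality of its output, and Keras's automatic differentiation tries to differentiate the discontinuous function and produces wrong gradients. What I want is for this custom layer to take the incoming gradient, discard everything except the gradient of the prediction, and return that to the previous layer. Below is a toy example that tries to guess a divisor. Each custom layer checks whether the previous prediction divides the input. If it does, the layer sets a flag, and it returns the prediction and the original input to the next layer, so the next layer can try again to guess a divisor with the additional information of whether the previous prediction was a divisor. For now I only want the gradient of the prediction to propagate to the previous layer.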

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


def try_division(guess, number):
    # Append a flag to the guess: 1.0 if `guess` divides `number`, else 0.0.
    # The Python `if` on a tensor relies on eager execution (see dynamic=True below).
    if (number % guess) == 0:
        return tf.concat([guess, tf.constant(1.0, shape=(1, 1))], 1)
    else:
        return tf.concat([guess, tf.constant(0.0, shape=(1, 1))], 1)


class Custom_Layer(layers.Layer):
    def __init__(self, out_dim):
        super().__init__(trainable=False, dynamic=True)
        self.out_dim = out_dim

    # @tf.custom_gradient
    def call(self, prediction, previous_results):
        # def grad(dy):
        #     print(dy)
        #     return dy
        original_inp = previous_results[0][0]
        results = tf.concat((previous_results, try_division(prediction, original_inp)), 1)
        return results

    def compute_output_shape(self, input_shape):
        d = (input_shape[0], input_shape[1] + self.out_dim)
        return d

inp = keras.Input(shape=(1,))
first = layers.Dense(16, activation="swish")(inp)
out1 = layers.Dense(1)(first)
# Strangely, if I swap the arguments, model.summary() ignores all previously defined layers
custom1 = Custom_Layer(2)(out1, inp)
second = layers.Dense(16, activation="swish")(custom1)
out2 = layers.Dense(1)(second)
custom2 = Custom_Layer(4)(out2, custom1)
third = layers.Dense(16, activation="swish")(custom2)
out = layers.Dense(1)(third)

model = keras.Model(inputs=inp, outputs=out)

model.compile(
    loss = 'mse',
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
) 


xtrain = [70,62,31,18,9]
ytrain = [5,31,31,3,3]

history=model.fit(xtrain, ytrain, epochs=10, batch_size=1)
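
The gradient behaviour described in the question (keep only the prediction's gradient, drop the rest) could be sketched with `tf.custom_gradient` roughly as follows. This is a minimal illustration and not the asker's code or a confirmed fix: the function name `append_divisibility_flag` is hypothetical, the divisibility flag is assumed to contribute no gradient, and the original input is assumed to receive a zero gradient. The check is written elementwise so it does not depend on a batch size of 1.

```python
import tensorflow as tf


@tf.custom_gradient
def append_divisibility_flag(prediction, original):
    # Hypothetical helper: returns [prediction, flag] per row, where
    # flag is 1.0 if `prediction` divides `original` and 0.0 otherwise.
    remainder = tf.math.floormod(original, prediction)
    flag = tf.cast(tf.equal(remainder, 0.0), prediction.dtype)
    output = tf.concat([prediction, flag], axis=1)

    def grad(upstream):
        # `upstream` has shape (batch, 2): column 0 belongs to the prediction,
        # column 1 to the discontinuous flag. Assumption: pass the prediction's
        # column through unchanged and ignore the flag's column entirely.
        grad_prediction = upstream[:, :1]
        grad_original = tf.zeros_like(original)
        return grad_prediction, grad_original

    return output, grad


# Usage: check the forward values and the gradient reaching the prediction.
pred = tf.constant([[5.0], [4.0]])
orig = tf.constant([[70.0], [62.0]])
with tf.GradientTape() as tape:
    tape.watch(pred)
    out = append_divisibility_flag(pred, orig)
    loss = tf.reduce_sum(out * tf.constant([[2.0, 3.0]]))
grad_pred = tape.gradient(loss, pred)
print(out.numpy())        # rows: [5., 1.] and [4., 0.]
print(grad_pred.numpy())  # rows: [2.] and [2.]
```

In this sketch the weight 3.0 on the flag column never reaches `pred`, which is the "discard everything except the prediction's gradient" behaviour. Whether that helps training in the toy divisor problem is a separate question that the sketch does not answer.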