Python TensorFlow: How to replace or modify a gradient?

python tensorflow neural-network


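I would like to replace or modify the gradient of an op or a portion of the graph in TensorFlow. Ideally, I could use the existing gradient in the calculation.

In some ways this is the opposite of what tf.stop_gradient() does: instead of adding a calculation that is ignored when computing gradients, I want to add a calculation that is used only when computing gradients.

A simple example would be something that scales gradients by multiplying them by a constant (without multiplying the forward computation by that constant). Another example would be something that clips gradients to a given range.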

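Use optimizer.compute_gradients or tf.gradients to get the original gradients,
then do whatever you want with them,
and finally call optimizer.apply_gradients.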
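The three steps above can be sketched as follows. This is only an illustration under assumptions that are not in the original answer: it targets TensorFlow 2.x in eager mode, so tf.GradientTape stands in for optimizer.compute_gradients, and the variable w, the quadratic loss, and the clipping range are made up for the example.

```python
import tensorflow as tf

# Hypothetical variable and optimizer, chosen only for illustration.
w = tf.Variable([3.0, -4.0])
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

# Step 1: get the original gradients (d/dw of sum(w^2) is 2*w).
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(w ** 2)
grads = tape.gradient(loss, [w])

# Step 2: do whatever you want with them; here, clip element-wise.
clipped = [tf.clip_by_value(g, -1.0, 1.0) for g in grads]

# Step 3: apply the modified gradients.
opt.apply_gradients(zip(clipped, [w]))
```

Note that this changes the end-to-end gradient of each variable just before the update; it does not change how the gradient flows through an individual op inside the graph.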

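I found an example on GitHub.

The most common approach is to register a custom gradient with tf.RegisterGradient. Below, I implemented backpropagated gradient clipping, which can be used with matmul, as shown, or with any other op: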

import tensorflow as tf
import numpy as np

# from https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):

    # Generate a (random) unique name so repeated registrations do not collide:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, int(1e8)))

    tf.RegisterGradient(rnd_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

def clip_grad(x, clip_value, name=None):
    """
    Identity in the forward pass; scales the backpropagated gradient
    so that its L2 norm is no more than `clip_value`.
    """
    with tf.name_scope(name, "ClipGrad", [x]) as name:
        return py_func(lambda x : x,
                        [x],
                        [tf.float32],
                        name=name,
                        grad=lambda op, g : tf.clip_by_norm(g, clip_value))[0]

Usage example:

with tf.Session() as sess:
    x = tf.constant([[1., 2.], [3., 4.]])
    y = tf.constant([[1., 2.], [3., 4.]])

    print('without clipping')
    z = tf.matmul(x, y)
    print(tf.gradients(tf.reduce_sum(z), x)[0].eval())

    print('with clipping')
    z = tf.matmul(clip_grad(x, 1.0), clip_grad(y, 0.5))
    print(tf.gradients(tf.reduce_sum(z), x)[0].eval())

    print('with clipping between matmuls')
    z = tf.matmul(clip_grad(tf.matmul(x, y), 1.0), y)
    print(tf.gradients(tf.reduce_sum(z), x)[0].eval())

Output:

without clipping
[[ 3.  7.]
 [ 3.  7.]]
with clipping
[[ 0.278543   0.6499337]
 [ 0.278543   0.6499337]]
with clipping between matmuls
[[ 1.57841039  3.43536377]
 [ 1.57841039  3.43536377]]

For TensorFlow 1.7 and TensorFlow 2.0, see the edit below.

First define your custom gradient:

@tf.RegisterGradient("CustomGrad")
def _const_mul_grad(unused_op, grad):
  return 5.0 * grad


Since you want nothing to happen in the forward pass, override the gradient of an identity operation with your new gradient:

g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomGrad"}):
  output = tf.identity(input, name="Identity")

Here is a working example of a layer that uses the same method to clip gradients in the backward pass and does nothing in the forward pass:

import tensorflow as tf

@tf.RegisterGradient("CustomClipGrad")
def _clip_grad(unused_op, grad):
  return tf.clip_by_value(grad, -0.1, 0.1)

input = tf.Variable([3.0], dtype=tf.float32)

g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomClipGrad"}):
  output_clip = tf.identity(input, name="Identity")
grad_clip = tf.gradients(output_clip, input)

# output without gradient clipping in the backwards pass for comparison:
output = tf.identity(input)
grad = tf.gradients(output, input)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  print("with clipping:", sess.run(grad_clip)[0])
  print("without clipping:", sess.run(grad)[0])

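Edit for TensorFlow 1.7 and TensorFlow 2.0

Since 1.7 there is a new way to redefine the gradient with a shorter syntax, which also works with TensorFlow 2.0. It also allows you to redefine the gradient of multiple operations at the same time. Here are the examples from above, rewritten for TensorFlow 1.7 and TensorFlow 2.0:

Layer that scales gradients in the backward pass: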

@tf.custom_gradient
def scale_grad_layer(x):
  def grad(dy):
    return 5.0 * dy
  return tf.identity(x), grad

Example with a layer that clips gradients in the backward pass:

@tf.custom_gradient
def clip_grad_layer(x):
  def grad(dy):
    return tf.clip_by_value(dy, -0.1, 0.1)
  return tf.identity(x), grad
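
To see the effect of these two layers, the following sketch compares their gradients with that of a plain identity. It assumes TensorFlow 2.x in eager mode; the input value and the use of a persistent tf.GradientTape are choices made for this illustration, not part of the original answer.

```python
import tensorflow as tf

@tf.custom_gradient
def scale_grad_layer(x):
    def grad(dy):
        return 5.0 * dy  # scale the upstream gradient
    return tf.identity(x), grad

@tf.custom_gradient
def clip_grad_layer(x):
    def grad(dy):
        return tf.clip_by_value(dy, -0.1, 0.1)  # clip the upstream gradient
    return tf.identity(x), grad

x = tf.constant([3.0])  # illustrative input
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)  # constants must be watched explicitly
    plain = tf.identity(x)
    scaled = scale_grad_layer(x)
    clipped = clip_grad_layer(x)

g_plain = tape.gradient(plain, x)      # unmodified gradient of identity
g_scaled = tape.gradient(scaled, x)    # upstream gradient times 5
g_clipped = tape.gradient(clipped, x)  # upstream gradient clipped to 0.1
print(g_plain.numpy(), g_scaled.numpy(), g_clipped.numpy())
```

All three forward values are equal to x; only the backward pass differs.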

Suppose the forward computation is

y = f(x)

and you want it to backpropagate like

y = b(x)

A simple hack is:

y = b(x) + tf.stop_gradient(f(x) - b(x))
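
As a concrete instance of this trick, the sketch below uses rounding as the forward function and the identity as the backward function (often called a straight-through estimator). The choice of f and b, and the use of TensorFlow 2.x eager mode with tf.GradientTape, are assumptions made for illustration only.

```python
import tensorflow as tf

def f(x):
    # Forward computation: rounding, whose true gradient is zero almost everywhere.
    return tf.round(x)

def b(x):
    # Function whose gradient we want to use in the backward pass: identity.
    return x

x = tf.constant([0.3, 1.7])
with tf.GradientTape() as tape:
    tape.watch(x)
    # The value equals f(x), because b(x) cancels out numerically.
    # The gradient equals that of b(x), because the stop_gradient term is
    # treated as a constant during backpropagation.
    y = b(x) + tf.stop_gradient(f(x) - b(x))

grad = tape.gradient(y, x)
print(y.numpy(), grad.numpy())
```

The forward value can differ from f(x) by floating-point rounding, since it is computed as a sum of two terms.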

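For the current TensorFlow r1.13, use tf.custom_gradient.

The decorated function (whose input arguments are a list x) should return the result of the forward pass, and a function that returns a list of gradients, one for each element in x.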

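Here is an example with one variable:

@tf.custom_gradient
def non_differentiable(x):
    f = tf.cast(x > 0, tf.float32)
    def grad(dy):
        # Multiply the surrogate gradient by the upstream gradient dy (chain rule).
        return dy * tf.math.maximum(0., 1 - tf.abs(x))
    return f, grad
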
And one with two:

@tf.custom_gradient
def non_differentiable2(x0, x1):
    f = x0 * tf.cast(x1 > 0, tf.float32)
    def grad(dy):
        df_dx0 = tf.cast(x1 > 0, tf.float32)
        # One gradient per input: dy * df/dx0 for x0, and zero for x1.
        return dy * df_dx0, tf.zeros_like(dy)
    return f, grad
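
A short usage sketch for the two-input case, showing that the returned gradients line up with the inputs in order. It assumes TensorFlow 2.x eager mode (the answer itself targets r1.13), and the input values are arbitrary.

```python
import tensorflow as tf

@tf.custom_gradient
def non_differentiable2(x0, x1):
    f = x0 * tf.cast(x1 > 0, tf.float32)
    def grad(dy):
        df_dx0 = tf.cast(x1 > 0, tf.float32)
        return dy * df_dx0, tf.zeros_like(dy)
    return f, grad

x0 = tf.constant(2.0)  # illustrative inputs
x1 = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch([x0, x1])
    y = non_differentiable2(x0, x1)

# g0 comes from the first value returned by grad, g1 from the second.
g0, g1 = tape.gradient(y, [x0, x1])
print(y.numpy(), g0.numpy(), g1.numpy())
```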

For TensorFlow 2, you should use the tf.custom_gradient decorator, as follows:

@tf.custom_gradient
def func(x):
    f = ...  # calculate forward pass
    def grad(dy):
        gradient = ...  # calculate custom gradient of func
        return dy * gradient
    return f, grad

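Note that you must multiply the gradient by the upstream gradient. Be careful, however!
If you call this as a function when creating a Keras functional model and use tf.GradientTape, automatic differentiation will still take place and your custom gradient will be ignored.
Instead, you must put your function into a layer: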

class func_layer(tf.keras.layers.Layer):
    def __init__(self):
        super(func_layer, self).__init__()

    def call(self, x):
        return func(x)

Now, when you add a func_layer to your functional model, the backward pass will be computed appropriately.
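
The following sketch puts the layer into a small functional model and checks the gradient with tf.GradientTape. The body of func here (identity forward, gradient scaled by 5) is a hypothetical stand-in for the placeholders in the answer, and the input shape and value are made up; it assumes TensorFlow 2.x with its bundled Keras.

```python
import tensorflow as tf

@tf.custom_gradient
def func(x):
    f = tf.identity(x)  # hypothetical forward pass: identity
    def grad(dy):
        gradient = 5.0  # hypothetical custom gradient of func
        return dy * gradient
    return f, grad

class func_layer(tf.keras.layers.Layer):
    def call(self, x):
        return func(x)

# Functional model consisting only of the custom layer.
inputs = tf.keras.Input(shape=(1,))
outputs = func_layer()(inputs)
model = tf.keras.Model(inputs, outputs)

x = tf.constant([[2.0]])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = model(x)

g = tape.gradient(y, x)
print(y.numpy(), g.numpy())
```

If the custom gradient is picked up, g is 5 times the plain identity gradient while y is unchanged.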
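
Thanks, that's interesting. I think it replaces the full (end-to-end) gradient, though, and only for an optimizer. I'd like to replace the gradient of an individual op while letting the gradients of other ops propagate in the normal way; I don't necessarily know how to handle the end-to-end gradient. One example would be a tf.matmult() where the forward computation is done normally, but the gradient is clip(grad, min, max), where grad is the original gradient, and to use that in a larger graph.
Take a look: it returns a list of (gradient, variable) pairs, so I think you can modify only the gradient you want, for example by finding the var you want.
Xb: Thanks! That looks useful. I can't figure out how to pass ... in Python. Is it just a function with a decorator? Could you give a full example with matmult?
@AlexI It's not easy, but doable. If you just want to clip gradients, I'd suggest you define an "identity op" that does nothing but clip gradients. Also see
@AlexI I implemented actual backpropagated gradient clipping. See the edit.
Does this also modify subsequent gradients in the chain? For example, with clipping, would they be modified?
@KevinP: The gradient will only be clipped once, in the backward pass of the identity op. But all previous layers in the chain will be affected by this, because each layer uses the gradient of its next layer for its backward pass. There will be no further clipping in the previous layers, though.
Thanks. The whole backprop versus forward prop thing made the question more confusing than intended. I meant later in the backprop gradient chain.
Can grad accept other arguments, for example some intermediate variables, to reduce computation?
@HuangYuheng grad cannot accept additional arguments, but it can use TensorFlow variables, which in turn can be changed in the forward pass.
Hi, thanks for your answer. Do you know how to change the gradient of the relu function?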