Python 如何将tf.layer_norm转换为julia?
我正在尝试将tensorflow代码转换为julia,但我不明白Python 如何将tf.layer_norm转换为julia?,python,tensorflow,julia,Python,Tensorflow,Julia,我正在尝试将tensorflow代码转换为julia,但我不明白层规范函数是如何工作的 我正在尝试转换该部分: output = tf.contrib.layers.layer_norm(attn_out + h, begin_norm_axis=-1, scope='LayerNorm') 张量attn\u out+h的形状是(qlen、bsz、d\u模型),当我下载经过训练的模型时,变量gamma和
层规范
函数是如何工作的
我正在尝试转换该部分:
output = tf.contrib.layers.layer_norm(attn_out + h, begin_norm_axis=-1,
scope='LayerNorm')
张量
attn\u out+h
的形状是(qlen、bsz、d\u模型)
,当我下载经过训练的模型时,变量gamma和beta具有形状(d\u模型,)
。另外,nn.moments
的输出应具有形状[qlen,bsz,1]
<代码>层规范函数调用具有以下参数的批规范函数:
add_arg_scope
def layer_norm(inputs,
center=True,
scale=True,
activation_fn=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
begin_norm_axis=1,
begin_params_axis=-1,
scope=None):
"""Adds a Layer Normalization layer.
Based on the paper:
"Layer Normalization"
Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
https://arxiv.org/abs/1607.06450.
Can be used as a normalizer function for conv2d and fully_connected.
Given a tensor `inputs` of rank `R`, moments are calculated and normalization
is performed over axes `begin_norm_axis ... R - 1`. Scaling and centering,
if requested, is performed over axes `begin_params_axis .. R - 1`.
By default, `begin_norm_axis = 1` and `begin_params_axis = -1`,
meaning that normalization is performed over all but the first axis
(the `HWC` if `inputs` is `NHWC`), while the `beta` and `gamma` trainable
parameters are calculated for the rightmost axis (the `C` if `inputs` is
`NHWC`). Scaling and recentering is performed via broadcast of the
`beta` and `gamma` parameters with the normalized tensor.
The shapes of `beta` and `gamma` are `inputs.shape[begin_params_axis:]`,
and this part of the inputs' shape must be fully defined.
Args:
inputs: A tensor having rank `R`. The normalization is performed over axes
`begin_norm_axis ... R - 1` and centering and scaling parameters are
calculated over `begin_params_axis ... R - 1`.
center: If True, add offset of `beta` to normalized tensor. If False, `beta`
is ignored.
scale: If True, multiply by `gamma`. If False, `gamma` is not used. When the
next layer is linear (also e.g. `nn.relu`), this can be disabled since the
scaling can be done by the next layer.
activation_fn: Activation function, default set to None to skip it and
maintain a linear activation.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional collections for the variables.
outputs_collections: Collections to add the outputs.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
begin_norm_axis: The first normalization dimension: normalization will be
performed along dimensions `begin_norm_axis : rank(inputs)`
begin_params_axis: The first parameter (beta, gamma) dimension: scale and
centering parameters will have dimensions
`begin_params_axis : rank(inputs)` and will be broadcast with the
normalized inputs accordingly.
scope: Optional scope for `variable_scope`.
Returns:
A `Tensor` representing the output of the operation, having the same
shape and dtype as `inputs`.
Raises:
ValueError: If the rank of `inputs` is not known at graph build time,
or if `inputs.shape[begin_params_axis:]` is not fully defined at
graph build time.
"""
with variable_scope.variable_scope(
scope, 'LayerNorm', [inputs], reuse=reuse) as sc:
inputs = ops.convert_to_tensor(inputs)
inputs_shape = inputs.shape
inputs_rank = inputs_shape.ndims
if inputs_rank is None:
raise ValueError('Inputs %s has undefined rank.' % inputs.name)
dtype = inputs.dtype.base_dtype
if begin_norm_axis < 0:
begin_norm_axis = inputs_rank + begin_norm_axis
if begin_params_axis >= inputs_rank or begin_norm_axis >= inputs_rank:
raise ValueError('begin_params_axis (%d) and begin_norm_axis (%d) '
'must be < rank(inputs) (%d)' %
(begin_params_axis, begin_norm_axis, inputs_rank))
params_shape = inputs_shape[begin_params_axis:]
if not params_shape.is_fully_defined():
raise ValueError(
'Inputs %s: shape(inputs)[%s:] is not fully defined: %s' %
(inputs.name, begin_params_axis, inputs_shape))
# Allocate parameters for the beta and gamma of the normalization.
beta, gamma = None, None
if center:
beta_collections = utils.get_variable_collections(variables_collections,
'beta')
beta = variables.model_variable(
'beta',
shape=params_shape,
dtype=dtype,
initializer=init_ops.zeros_initializer(),
collections=beta_collections,
trainable=trainable)
if scale:
gamma_collections = utils.get_variable_collections(
variables_collections, 'gamma')
gamma = variables.model_variable(
'gamma',
shape=params_shape,
dtype=dtype,
initializer=init_ops.ones_initializer(),
collections=gamma_collections,
trainable=trainable)
# By default, compute the moments across all the dimensions except the one with index 0.
norm_axes = list(range(begin_norm_axis, inputs_rank))
mean, variance = nn.moments(inputs, norm_axes, keep_dims=True)
# Compute layer normalization using the batch_normalization function.
# Note that epsilon must be increased for float16 due to the limited
# representable range.
variance_epsilon = 1e-12 if dtype != dtypes.float16 else 1e-3
outputs = nn.batch_normalization(
inputs,
mean,
variance,
offset=beta,
scale=gamma,
variance_epsilon=variance_epsilon)
outputs.set_shape(inputs_shape)
if activation_fn is not None:
outputs = activation_fn(outputs)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
我不明白它是如何工作的,也无法转换为julia代码。我对张量形状使用相同的顺序。如果x在tf中有shape
[a,b,c]
,那么它在我的julia代码中也有shape[a,b,c]
形状。Flux
在julia中提供批量规范化和层规范化(以及其他一些)(称为LayerNorm
和BatchNorm
。在您的问题中,您没有指定在Julia中使用的库,但是如果您使用的不是Flux
,那么代码可能对您自己的实现有所帮助。请参阅:
如果您使用的是Knet
(我从来没有这样做过),那么至少已经实现了批量标准化,我不确定是否还有其他标准化层。它们也可能存在
还请在Julia中发布您已经尝试过的内容。什么包是
layer norm
的一部分?layer\u norm
函数属于tf.contrib.layers.layer\u norm
@tf_export("nn.batch_normalization")
def batch_normalization(x,
mean,
variance,
offset,
scale,
variance_epsilon,
name=None):
r"""Batch normalization.
Normalizes a tensor by `mean` and `variance`, and applies (optionally) a
`scale` \\(\gamma\\) to it, as well as an `offset` \\(\beta\\):
\\(\frac{\gamma(x-\mu)}{\sigma}+\beta\\)
`mean`, `variance`, `offset` and `scale` are all expected to be of one of two
shapes:
* In all generality, they can have the same number of dimensions as the
input `x`, with identical sizes as `x` for the dimensions that are not
normalized over (the 'depth' dimension(s)), and dimension 1 for the
others which are being normalized over.
`mean` and `variance` in this case would typically be the outputs of
`tf.nn.moments(..., keep_dims=True)` during training, or running averages
thereof during inference.
* In the common case where the 'depth' dimension is the last dimension in
the input tensor `x`, they may be one dimensional tensors of the same
size as the 'depth' dimension.
This is the case for example for the common `[batch, depth]` layout of
fully-connected layers, and `[batch, height, width, depth]` for
convolutions.
`mean` and `variance` in this case would typically be the outputs of
`tf.nn.moments(..., keep_dims=False)` during training, or running averages
thereof during inference.
See Source: [Batch Normalization: Accelerating Deep Network Training by
Reducing Internal Covariate Shift; S. Ioffe, C. Szegedy]
(http://arxiv.org/abs/1502.03167).
Args:
x: Input `Tensor` of arbitrary dimensionality.
mean: A mean `Tensor`.
variance: A variance `Tensor`.
offset: An offset `Tensor`, often denoted \\(\beta\\) in equations, or
None. If present, will be added to the normalized tensor.
scale: A scale `Tensor`, often denoted \\(\gamma\\) in equations, or
`None`. If present, the scale is applied to the normalized tensor.
variance_epsilon: A small float number to avoid dividing by 0.
name: A name for this operation (optional).
Returns:
the normalized, scaled, offset tensor.
"""
with ops.name_scope(name, "batchnorm", [x, mean, variance, scale, offset]):
inv = math_ops.rsqrt(variance + variance_epsilon)
if scale is not None:
inv *= scale
# Note: tensorflow/contrib/quantize/python/fold_batch_norms.py depends on
# the precise order of ops that are generated by the expression below.
return x * math_ops.cast(inv, x.dtype) + math_ops.cast(
offset - mean * inv if offset is not None else -mean * inv, x.dtype)