Explanation of the GRU cell in TensorFlow?
The following code from TensorFlow's GRUCell shows the typical operations performed to obtain the updated hidden state, given the previous hidden state in the sequence and the current input:
def __call__(self, inputs, state, scope=None):
  """Gated recurrent unit (GRU) with nunits cells."""
  with vs.variable_scope(scope or type(self).__name__):  # "GRUCell"
    with vs.variable_scope("Gates"):  # Reset gate and update gate.
      # We start with bias of 1.0 to not reset and not update.
      r, u = array_ops.split(1, 2, _linear([inputs, state],
                                           2 * self._num_units, True, 1.0))
      r, u = sigmoid(r), sigmoid(u)
    with vs.variable_scope("Candidate"):
      c = self._activation(_linear([inputs, r * state],
                                   self._num_units, True))
    new_h = u * state + (1 - u) * c
  return new_h, new_h
But I don't see any weights or biases anywhere. E.g., my understanding is that obtaining r and u would require the current input and/or the hidden state to be multiplied by weights (plus biases) to produce the updated hidden state.

I have written a GRU unit as follows:
def gru_unit(previous_hidden_state, x):
    # Reset and update gates. (A standard GRU, like TensorFlow's GRUCell,
    # additionally feeds previous_hidden_state into both gates.)
    r = tf.sigmoid(tf.matmul(x, Wr) + br)
    z = tf.sigmoid(tf.matmul(x, Wz) + bz)
    # Candidate hidden state; the reset gate r scales the recurrent term.
    h_ = tf.tanh(tf.matmul(x, Wx) + tf.matmul(previous_hidden_state, Wh) * r + bh)
    # The update gate z interpolates between the previous state and the
    # candidate. (tf.mul is the pre-1.0 spelling of tf.multiply.)
    current_hidden_state = tf.mul((1 - z), h_) + tf.mul(previous_hidden_state, z)
    return current_hidden_state
Here I explicitly make use of the weights Wx, Wr, Wz, Wh and the biases br, bh, bz, etc., to obtain the updated hidden state. These weights and biases are what get learned/tuned during training.
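For completeness, the variables referenced above could be created along these lines (a minimal sketch of my own; the sizes, and the choice of initializing the gate biases to 1.0 to mirror the comment in GRUCell's code, are assumptions):

import tensorflow as tf

input_size, hidden_size = 32, 64  # hypothetical sizes
Wr = tf.get_variable("Wr", [input_size, hidden_size])
Wz = tf.get_variable("Wz", [input_size, hidden_size])
Wx = tf.get_variable("Wx", [input_size, hidden_size])
Wh = tf.get_variable("Wh", [hidden_size, hidden_size])
# Gate biases start at 1.0, like GRUCell's "not reset and not update" default.
br = tf.get_variable("br", [hidden_size], initializer=tf.constant_initializer(1.0))
bz = tf.get_variable("bz", [hidden_size], initializer=tf.constant_initializer(1.0))
bh = tf.get_variable("bh", [hidden_size], initializer=tf.constant_initializer(0.0))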
How can I make use of TensorFlow's built-in GRUCell to achieve the same result as above?

They are there, you just don't see them in that code, because the _linear function adds the weights and biases:
r, u = array_ops.split(1, 2, _linear([inputs, state],
                                     2 * self._num_units, True, 1.0))
This concatenates the computation of the r and u gates so that both are done with a single matrix multiplication, which saves computation.
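To make the concatenation trick concrete, here is a small numpy illustration of my own (names and sizes are made up): the row blocks of the single big matrix play the role of the separate per-input and per-state matrices (like Wx and Wh above), and splitting the result column-wise recovers the two gates.

import numpy as np

batch, n_in, n_units = 4, 3, 5
x = np.random.randn(batch, n_in)                   # current input
h = np.random.randn(batch, n_units)                # previous hidden state
W = np.random.randn(n_in + n_units, 2 * n_units)   # one matrix for both gates

# What _linear([inputs, state], 2 * num_units) computes, then the split:
both = np.concatenate([x, h], axis=1) @ W
r, u = np.split(both, 2, axis=1)

# Equivalent formulation with explicit sub-matrices:
Wx_part, Wh_part = W[:n_in], W[n_in:]              # row blocks: input vs. state
assert np.allclose(both, x @ Wx_part + h @ Wh_part)
Wr_part, Wu_part = np.split(W, 2, axis=1)          # column blocks: r vs. u gate
assert np.allclose(r, np.concatenate([x, h], axis=1) @ Wr_part)
assert np.allclose(u, np.concatenate([x, h], axis=1) @ Wu_part)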
So the weights and biases are apparently created on demand via get_variable and shared across time steps, since get_variable returns the same variable when called within the same variable scope. What is not clear to me is how the weight matrix is initialized; I assume it is initialized with the default initializer of the current variable scope (see the sketch after the _linear listing below). I think this also answers my other question about TensorFlow RNNs. For reference, the _linear implementation:
def _linear(args, output_size, bias, bias_start=0.0, scope=None):
  """Linear map: sum_i(args[i] * W[i]), where W[i] is a variable.

  Args:
    args: a 2D Tensor or a list of 2D, batch x n, Tensors.
    output_size: int, second dimension of W[i].
    bias: boolean, whether to add a bias term or not.
    bias_start: starting value to initialize the bias; 0 by default.
    scope: VariableScope for the created subgraph; defaults to "Linear".

  Returns:
    A 2D Tensor with shape [batch x output_size] equal to
    sum_i(args[i] * W[i]), where W[i]s are newly created matrices.

  Raises:
    ValueError: if some of the arguments has unspecified or wrong shape.
  """
  if args is None or (nest.is_sequence(args) and not args):
    raise ValueError("`args` must be specified")
  if not nest.is_sequence(args):
    args = [args]

  # Calculate the total size of arguments on dimension 1.
  total_arg_size = 0
  shapes = [a.get_shape().as_list() for a in args]
  for shape in shapes:
    if len(shape) != 2:
      raise ValueError("Linear is expecting 2D arguments: %s" % str(shapes))
    if not shape[1]:
      raise ValueError("Linear expects shape[1] of arguments: %s" % str(shapes))
    else:
      total_arg_size += shape[1]

  # Now the computation.
  with vs.variable_scope(scope or "Linear"):
    matrix = vs.get_variable("Matrix", [total_arg_size, output_size])
    if len(args) == 1:
      res = math_ops.matmul(args[0], matrix)
    else:
      res = math_ops.matmul(array_ops.concat(1, args), matrix)
    if not bias:
      return res
    bias_term = vs.get_variable(
        "Bias", [output_size],
        initializer=init_ops.constant_initializer(bias_start))
  return res + bias_term
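Putting it together, a minimal usage sketch of my own (TF 1.x-era API; module paths varied across versions, and the shapes and scope name are assumptions): the enclosing variable scope's initializer is what _linear's get_variable("Matrix", ...) picks up, and you can list the variables that were created on demand.

import tensorflow as tf

cell = tf.nn.rnn_cell.GRUCell(num_units=64)
inputs = tf.placeholder(tf.float32, [None, 10, 32])  # batch x time x features

# Variables created inside this scope default to its initializer.
with tf.variable_scope("rnn", initializer=tf.orthogonal_initializer()):
    outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# The gate and candidate weights/biases now exist; the same variable scope
# is reused at every time step, so they are shared across time.
for v in tf.trainable_variables():
    print(v.name, v.shape)

If the scope specifies no initializer, get_variable falls back to TensorFlow's default (glorot_uniform_initializer in TF 1.x), which matches the guess above.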