How to use an attention mechanism with MultiRNNCell and dynamic_decode in TensorFlow?

I want to create a multi-layer, dynamic-RNN-based decoder that uses an attention mechanism. To do so, I first create an attention mechanism:

attention_mechanism = BahdanauAttention(num_units=ATTENTION_UNITS,
                                        memory=encoder_outputs,
                                        normalize=True)

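For reference, here is a minimal pure-Python sketch of the additive (Bahdanau) score that BahdanauAttention computes for one query: project the query and each memory entry, add them, apply tanh, and take the dot product with a vector v. A softmax turns the scores into alignments, and the alignment-weighted sum of the memory is the context vector. It assumes a single unbatched query and no memory_sequence_length masking, and it omits the weight normalization that normalize=True enables. The weight matrices are hypothetical fixed inputs rather than learned variables.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bahdanau_attention(query, memory, w_query, w_memory, v):
    """Additive attention for one query over a list of memory vectors.

    query:    decoder state (length d_q)
    memory:   encoder outputs, each of length d_m
    w_query:  num_units x d_q projection (hypothetical weights)
    w_memory: num_units x d_m projection (hypothetical weights)
    v:        scoring vector of length num_units
    Returns (alignments, context).
    """
    processed_query = matvec(w_query, query)
    scores = []
    for key in memory:
        processed_key = matvec(w_memory, key)
        hidden = [math.tanh(q + k) for q, k in zip(processed_query, processed_key)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    alignments = softmax(scores)
    # Context: alignment-weighted sum of the memory vectors.
    context = [sum(a * key[j] for a, key in zip(alignments, memory))
               for j in range(len(memory[0]))]
    return alignments, context


encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
alignments, context = bahdanau_attention([1.0, 0.0], encoder_outputs,
                                         identity, identity, [1.0, 1.0])
print(alignments, context)
```

The alignments always sum to 1, so the context is a convex combination of the encoder outputs.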

Then I wrap my LSTM cell in an AttentionWrapper that uses this attention mechanism:

attention_wrapper = AttentionWrapper(cell=self._create_lstm_cell(DECODER_SIZE),
                                     attention_mechanism=attention_mechanism,
                                     output_attention=False,
                                     alignment_history=True,
                                     attention_layer_size=ATTENTION_LAYER_SIZE)

where self._create_lstm_cell is defined as follows:

@staticmethod
def _create_lstm_cell(cell_size):
    return BasicLSTMCell(cell_size)
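As I read the TF 1.x implementation, each step of an AttentionWrapper configured like the one above roughly does the following. It concatenates the input with the previous attention vector (the default cell_input_fn), runs the wrapped cell, and uses the new cell output as the attention query. Because attention_layer_size is set, it then projects [cell_output; context] with a bias-free linear layer to get the new attention vector. With output_attention=False the step emits the raw cell output instead. The pure-Python sketch below swaps in dot-product scoring for brevity; toy_cell and the weights are hypothetical stand-ins.

```python
import math


def softmax(scores):
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query, memory):
    """Dot-product scoring, a stand-in for BahdanauAttention to keep this short."""
    alignments = softmax([sum(q * k for q, k in zip(query, key)) for key in memory])
    context = [sum(a * key[j] for a, key in zip(alignments, memory))
               for j in range(len(memory[0]))]
    return alignments, context


def attention_wrapper_step(cell, inputs, prev_attention, cell_state, memory,
                           attention_layer, output_attention):
    # 1. Default cell_input_fn: concatenate the input with the previous attention.
    cell_inputs = inputs + prev_attention
    cell_output, next_cell_state = cell(cell_inputs, cell_state)
    # 2. The new cell output is the query against the memory.
    alignments, context = dot_attention(cell_output, memory)
    # 3. attention_layer_size is set: project [cell_output; context] with no bias.
    attention = [sum(w * x for w, x in zip(row, cell_output + context))
                 for row in attention_layer]
    # 4. output_attention picks what the step emits; the next state keeps the attention.
    output = attention if output_attention else cell_output
    return output, next_cell_state, attention, alignments


def toy_cell(inputs, state):
    """Hypothetical stand-in for an LSTM: output and new state are the same vector."""
    new_state = [math.tanh(sum(inputs) + s) for s in state]
    return new_state, new_state


memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention_layer = [[0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 0.0, 1.0]]  # attention_layer_size = 2
output, state, attention, alignments = attention_wrapper_step(
    toy_cell, [0.5], [0.0, 0.0], [0.0, 0.0], memory, attention_layer,
    output_attention=False)
```

The previous attention vector is part of the recurrent state, which is why the wrapper needs its own state type rather than the bare LSTM state.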

Then I do some bookkeeping (e.g., creating my MultiRNNCell, creating the initial state, creating a TrainingHelper, etc.).

However, I get the following error:

AttributeError: 'LSTMStateTuple' object has no attribute 'attention'

What is the correct way to add an attention mechanism to a dynamic decoder built on a MultiRNNCell?
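As far as I can tell, a common cause of this error is passing a plain LSTM state (for example the encoder's final state) as the decoder's initial state. AttentionWrapper expects its own AttentionWrapperState and reads state.attention from it at every step; in real TF code that state is usually built with zero_state(...).clone(cell_state=...). The plain-Python stand-ins below only mirror the field names of the TF 1.x classes (later releases add an attention_state field), and attention_wrapper_call is a hypothetical function that shows just that first attribute access.

```python
from collections import namedtuple

# Plain-Python stand-ins that mirror the field names of the TF 1.x state classes.
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])
AttentionWrapperState = namedtuple(
    "AttentionWrapperState",
    ["cell_state", "attention", "time", "alignments", "alignment_history"])


def attention_wrapper_call(inputs, state):
    """Hypothetical: the first thing the wrapper does with its state."""
    cell_inputs = inputs + state.attention  # requires an AttentionWrapperState
    return cell_inputs, state.cell_state


encoder_final_state = LSTMStateTuple(c=[0.1, 0.2], h=[0.3, 0.4])

# Passing the encoder state directly reproduces the error.
try:
    attention_wrapper_call([1.0], encoder_final_state)
    error_message = None
except AttributeError as err:
    error_message = str(err)
print(error_message)

# Wrapping the encoder state in the wrapper's own state type avoids it.
initial_state = AttentionWrapperState(cell_state=encoder_final_state,
                                      attention=[0.0, 0.0], time=0,
                                      alignments=[0.0, 0.0, 0.0],
                                      alignment_history=())
cell_inputs, cell_state = attention_wrapper_call([1.0], initial_state)
```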

Have you tried using the attention wrapper provided by tf.contrib?

Here is an example that uses an attention wrapper together with dropout:

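import tensorflow as tf  # TensorFlow 1.x; tf.contrib does not exist in 2.x

# n_layers, n_hidden and batch_size are assumed to be defined elsewhere.
cells = []
for i in range(n_layers):
    cell = tf.contrib.rnn.LSTMCell(n_hidden, state_is_tuple=True)

    # AttentionCellWrapper attends over a window of this cell's own last
    # attn_length outputs, not over encoder outputs.
    cell = tf.contrib.rnn.AttentionCellWrapper(
        cell, attn_length=40, state_is_tuple=True)

    # Apply dropout to each layer's output.
    cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=0.5)
    cells.append(cell)

cell = tf.contrib.rnn.MultiRNNCell(cells, state_is_tuple=True)
init_state = cell.zero_state(batch_size, tf.float32)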
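AttentionCellWrapper is a different kind of attention from BahdanauAttention: it looks back over the cell's own recent outputs instead of the encoder outputs, so it does not by itself give an encoder-decoder attention model. A simplified pure-Python sketch of that windowing idea follows. It uses dot-product scoring in place of the wrapper's learned scoring, and WindowedSelfAttention is a hypothetical name, not a TensorFlow class.

```python
import math
from collections import deque


def softmax(scores):
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class WindowedSelfAttention:
    """Attends over the last attn_length outputs of a cell (simplified)."""

    def __init__(self, attn_length):
        # deque(maxlen=...) drops the oldest output once the window is full.
        self.window = deque(maxlen=attn_length)

    def step(self, cell_output):
        if self.window:
            weights = softmax([sum(a * b for a, b in zip(cell_output, past))
                               for past in self.window])
            context = [sum(w * past[j] for w, past in zip(weights, self.window))
                       for j in range(len(cell_output))]
        else:
            # Nothing to attend to on the first step.
            weights, context = [], [0.0] * len(cell_output)
        self.window.append(cell_output)
        return context, weights


attn = WindowedSelfAttention(attn_length=2)
results = [attn.step(output) for output in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
```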

What you need to do is build the multi-layer cell first and then wrap the whole MultiRNNCell in an AttentionWrapper. Here is an example:

import tensorflow as tf  # TensorFlow 1.x; tf.contrib does not exist in 2.x
from tensorflow.python.layers.core import Dense

def decoding_layer(dec_input, encoder_state,
                   target_sequence_length, max_target_sequence_length,
                   rnn_size,
                   num_layers, target_vocab_to_int, target_vocab_size,
                   batch_size, keep_prob, decoding_embedding_size, encoder_outputs):
    """
    Create decoding layer
    :param dec_input: Decoder input
    :param encoder_state: Encoder state
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_target_sequence_length: Maximum length of target sequences
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param target_vocab_size: Size of target vocabulary
    :param batch_size: The size of the batch
    :param keep_prob: Dropout keep probability
    :param decoding_embedding_size: Decoding embedding size
    :param encoder_outputs: Encoder outputs, used as the attention memory
    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    """
    # 1. Decoder Embedding
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)

    # 2. Construct the decoder cell
    def create_cell(rnn_size):
        lstm_cell = tf.contrib.rnn.LSTMCell(rnn_size,
                                            initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
        drop = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
        return drop

    # Stack the layers first...
    dec_cell = tf.contrib.rnn.MultiRNNCell([create_cell(rnn_size) for _ in range(num_layers)])
    #dec_cell = tf.contrib.rnn.MultiRNNCell(cells_a)

    # ...then wrap the whole stack with attention over the encoder outputs.
    attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units=rnn_size, memory=encoder_outputs)

    # attention_layer_size must be an integer, hence // rather than /.
    attn_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell, attention_mechanism,
                                                    attention_layer_size=rnn_size // 2)

    # The initial state must be an AttentionWrapperState: start from the zero
    # state and replace only its cell_state with the encoder's final state.
    # encoder_state must have the same nested structure as dec_cell's state
    # (num_layers LSTMStateTuples of size rnn_size).
    attn_zero = attn_cell.zero_state(batch_size, tf.float32)
    attn_zero = attn_zero.clone(cell_state=encoder_state)

    #new_state = tf.contrib.seq2seq.AttentionWrapperState(cell_state = encoder_state, attention = attn_zero  , time = 0 ,alignments=None , alignment_history=())

    """out_cell = tf.contrib.rnn.OutputProjectionWrapper(
                attn_cell, target_vocab_size, reuse=True
            )"""
    #end of attention
    #tensor_util.make_tensor_proto(attn_cell)
    output_layer = Dense(target_vocab_size,
                         kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))

    # decoding_layer_train and decoding_layer_infer are helper functions
    # defined elsewhere (not shown in this answer).
    with tf.variable_scope("decode"):
        train_decoder_out = decoding_layer_train(attn_zero, attn_cell, dec_embed_input,
                                                 target_sequence_length, max_target_sequence_length,
                                                 output_layer, keep_prob)

    with tf.variable_scope("decode", reuse=True):
        infer_decoder_out = decoding_layer_infer(attn_zero, attn_cell, dec_embeddings,
                                                 target_vocab_to_int['<GO>'], target_vocab_to_int['<EOS>'],
                                                 max_target_sequence_length, target_vocab_size,
                                                 output_layer, batch_size, keep_prob)

    return (train_decoder_out, infer_decoder_out)
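The clone works here because the whole MultiRNNCell is wrapped: the wrapper keeps a single attention vector, and its cell_state field has the same nested layout as the MultiRNNCell state, one (c, h) pair per layer. A multi-layer encoder's final state has that layout too, so it can be placed into cell_state as long as the encoder and decoder use the same num_layers and rnn_size. The sketch below models the layout with plain tuples; zero_state and same_structure are hypothetical helpers standing in for attn_cell.zero_state and TensorFlow's nested-structure checks, and _replace plays the role of clone.

```python
from collections import namedtuple

LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])
AttentionWrapperState = namedtuple(
    "AttentionWrapperState",
    ["cell_state", "attention", "time", "alignments", "alignment_history"])


def zero_state(num_layers, rnn_size, attention_layer_size, memory_length):
    """Zero state of AttentionWrapper(MultiRNNCell([LSTM] * num_layers))."""
    layers = tuple(LSTMStateTuple(c=[0.0] * rnn_size, h=[0.0] * rnn_size)
                   for _ in range(num_layers))
    return AttentionWrapperState(cell_state=layers,
                                 attention=[0.0] * attention_layer_size,
                                 time=0,
                                 alignments=[0.0] * memory_length,
                                 alignment_history=())


def same_structure(a, b):
    """True if a and b have the same tuple nesting and leaf-list lengths."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        return len(a) == len(b) and all(same_structure(x, y) for x, y in zip(a, b))
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b)
    return not isinstance(a, (tuple, list)) and not isinstance(b, (tuple, list))


decoder_zero = zero_state(num_layers=2, rnn_size=4, attention_layer_size=2, memory_length=5)
encoder_state = tuple(LSTMStateTuple(c=[0.1] * 4, h=[0.2] * 4) for _ in range(2))

print(same_structure(decoder_zero.cell_state, encoder_state))
# Analogue of attn_zero.clone(cell_state=encoder_state).
initial_state = decoder_zero._replace(cell_state=encoder_state)
```

A one-layer encoder state would fail the structure check against this two-layer decoder, which is the kind of mismatch to look for when the clone raises an error.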

Why is the line #new_state = tf.contrib.seq2seq.AttentionWrapperState(cell_state = encoder_state, attention = attn_zero, time = 0, alignments=None, alignment_history=()) commented out?
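The manual construction in that line is unnecessary: clone(cell_state=encoder_state) already returns a copy of the zero state with only cell_state replaced, keeping every other field as zero_state built it. It also looks wrong as written, because it passes attn_zero, an entire AttentionWrapperState, as the attention field, where the wrapper expects the previous attention vector (a [batch_size, attention_layer_size] tensor in real code). The sketch below illustrates the difference with plain namedtuple stand-ins that only mirror the TF 1.x field names.

```python
from collections import namedtuple

LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])
AttentionWrapperState = namedtuple(
    "AttentionWrapperState",
    ["cell_state", "attention", "time", "alignments", "alignment_history"])

encoder_state = (LSTMStateTuple(c=[0.1, 0.1], h=[0.2, 0.2]),)
attn_zero = AttentionWrapperState(
    cell_state=(LSTMStateTuple(c=[0.0, 0.0], h=[0.0, 0.0]),),
    attention=[0.0], time=0, alignments=[0.0, 0.0, 0.0], alignment_history=())

# What clone(cell_state=encoder_state) does: copy, replacing only cell_state.
cloned = attn_zero._replace(cell_state=encoder_state)

# The commented-out line builds the state by hand instead.
manual = AttentionWrapperState(cell_state=encoder_state, attention=attn_zero,
                               time=0, alignments=None, alignment_history=())

print(type(cloned.attention).__name__)  # list: the zero attention vector
print(type(manual.attention).__name__)  # AttentionWrapperState: a whole state
```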