TensorFlow: How do I use an attention mechanism with MultiRNNCell and dynamic_decode?


I want to create a multi-layer dynamic RNN-based decoder that uses an attention mechanism. To do this, I first create an attention mechanism:

attention_mechanism = BahdanauAttention(num_units=ATTENTION_UNITS,
                                        memory=encoder_outputs,
                                        normalize=True)
Then I use AttentionWrapper to wrap an LSTM cell with the attention mechanism:

attention_wrapper = AttentionWrapper(cell=self._create_lstm_cell(DECODER_SIZE),
                                             attention_mechanism=attention_mechanism,
                                             output_attention=False,
                                             alignment_history=True,
                                             attention_layer_size=ATTENTION_LAYER_SIZE)
where self._create_lstm_cell is defined as follows:

@staticmethod
def _create_lstm_cell(cell_size):
    return BasicLSTMCell(cell_size)
Then I do some bookkeeping (e.g., creating my MultiRNNCell, creating the initial state, creating a TrainingHelper, etc.).

But I get the following error:
AttributeError: 'LSTMStateTuple' object has no attribute 'attention'

What is the correct way to add an attention mechanism to a MultiRNNCell dynamic decoder?

Have you tried using the attention wrapper provided by tf.contrib?

Here is an example that uses both an attention wrapper and dropout:

cells = []
for i in range(n_layers):                   
    cell = tf.contrib.rnn.LSTMCell(n_hidden, state_is_tuple=True)

    cell = tf.contrib.rnn.AttentionCellWrapper(
        cell, attn_length=40, state_is_tuple=True)

    cell = tf.contrib.rnn.DropoutWrapper(cell,output_keep_prob=0.5)
    cells.append(cell)

cell = tf.contrib.rnn.MultiRNNCell(cells, state_is_tuple=True)
init_state = cell.zero_state(batch_size, tf.float32)

What you need to do is create the multi-layer cell first and then wrap the whole thing with AttentionWrapper. Here is an example:

# Assumes TensorFlow 1.x with: import tensorflow as tf
# and: from tensorflow.python.layers.core import Dense
# decoding_layer_train and decoding_layer_infer are helpers defined elsewhere.
def decoding_layer(dec_input, encoder_state,
                   target_sequence_length, max_target_sequence_length,
                   rnn_size,
                   num_layers, target_vocab_to_int, target_vocab_size,
                   batch_size, keep_prob, decoding_embedding_size, encoder_outputs):
    """
    Create decoding layer
    :param dec_input: Decoder input
    :param encoder_state: Encoder state
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_target_sequence_length: Maximum length of target sequences
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param target_vocab_size: Size of target vocabulary
    :param batch_size: The size of the batch
    :param keep_prob: Dropout keep probability
    :param decoding_embedding_size: Decoding embedding size
    :param encoder_outputs: Encoder outputs, used as the attention memory
    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    """
    # 1. Decoder Embedding
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)

    # 2. Construct the decoder cell
    def create_cell(rnn_size):
        lstm_cell = tf.contrib.rnn.LSTMCell(rnn_size,
                                            initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
        drop = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
        return drop

    # Build the multi-layer cell first ...
    dec_cell = tf.contrib.rnn.MultiRNNCell([create_cell(rnn_size) for _ in range(num_layers)])
    #dec_cell = tf.contrib.rnn.MultiRNNCell(cells_a)

    # attention details
    attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units=rnn_size, memory=encoder_outputs)

    # ... then wrap the whole stack. Integer division keeps the layer size an int on Python 3.
    attn_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell, attention_mechanism,
                                                    attention_layer_size=rnn_size // 2)

    # Start from the wrapper's own zero state and swap in the encoder state.
    attn_zero = attn_cell.zero_state(batch_size, tf.float32)
    attn_zero = attn_zero.clone(cell_state=encoder_state)

    #new_state = tf.contrib.seq2seq.AttentionWrapperState(cell_state = encoder_state, attention = attn_zero  , time = 0 ,alignments=None , alignment_history=())

    """out_cell = tf.contrib.rnn.OutputProjectionWrapper(
                attn_cell, target_vocab_size, reuse=True
            )"""
    # end of attention
    #tensor_util.make_tensor_proto(attn_cell)
    output_layer = Dense(target_vocab_size,
                         kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))

    with tf.variable_scope("decode"):
        train_decoder_out = decoding_layer_train(attn_zero, attn_cell, dec_embed_input,
                                                 target_sequence_length, max_target_sequence_length,
                                                 output_layer, keep_prob)

    with tf.variable_scope("decode", reuse=True):
        infer_decoder_out = decoding_layer_infer(attn_zero, attn_cell, dec_embeddings,
                                                 target_vocab_to_int['<GO>'], target_vocab_to_int['<EOS>'],
                                                 max_target_sequence_length,
                                                 target_vocab_size, output_layer, batch_size, keep_prob)

    return (train_decoder_out, infer_decoder_out)
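
The zero_state / clone step in the example above is probably what matters for the AttributeError in the question: an AttentionWrapper expects its own state type, so handing it a plain LSTMStateTuple (for instance the raw encoder state) would plausibly fail when it looks up the attention field. The following is only a pure-Python analogy, not TensorFlow code. The two namedtuples are stand-ins whose field names are taken from the commented-out AttentionWrapperState line above, and the clone helper and the string placeholders are hypothetical.

```python
from collections import namedtuple

# Stand-ins for the TensorFlow 1.x state containers (assumption: simplified fields).
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])
AttentionWrapperState = namedtuple(
    "AttentionWrapperState",
    ["cell_state", "attention", "time", "alignments", "alignment_history"],
)


def clone(state, **overrides):
    """Hypothetical helper: copy a state while overriding selected fields."""
    return state._replace(**overrides)


# Placeholder encoder final state for a 2-layer decoder: one tuple per layer.
encoder_state = (LSTMStateTuple(c="c0", h="h0"), LSTMStateTuple(c="c1", h="h1"))

# A bare LSTMStateTuple has no 'attention' field, which mirrors the reported error.
try:
    encoder_state[0].attention
    raised = False
except AttributeError:
    raised = True

# Schematic equivalent of attn_cell.zero_state(...) followed by .clone(...):
# keep the attention-related fields, replace only the wrapped cell's state.
attn_zero = AttentionWrapperState(
    cell_state=None, attention="zeros", time=0,
    alignments="zeros", alignment_history=(),
)
initial_state = clone(attn_zero, cell_state=encoder_state)
```

The point of the sketch is the shape of the state: the per-layer LSTM states sit inside cell_state, while attention, time and alignments belong to the wrapper, so the decoder's initial state has to be built from the wrapper's zero state instead of being the encoder state itself.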
Why did you comment out the line #new_state = tf.contrib.seq2seq.AttentionWrapperState(cell_state = encoder_state, attention = attn_zero, time = 0, alignments=None, alignment_history=())?