Python: How do I put multiple LSTMs in the same layer?


I am new to Keras. I am trying to implement this model for document classification, and I want to use LSTMs to obtain sentence representations. I have already trained word vector representations separately on my dataset with a skip-gram model. Now, after splitting each document into sentences, each sentence into words, and mapping each word to its integer index in the dictionary, each document in my samples looks like this:
[[54,32,13],[21,43,2]...[28,1,9]]
I want to feed each sentence into an LSTM to obtain a sentence vector, then feed the sentence vectors into a different LSTM at a higher layer to obtain a document representation, and finally apply classification to that. My problem is with the first layer: how should I feed every sentence into its own LSTM at the same time (so that at each time step, each LSTM is applied to the word vector of its own sentence)?
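
For reference, a minimal sketch of how nested integer lists like the one above could be padded into a fixed-size (documents, sentences, words) array before being fed to such a model. The maximum lengths of 30 sentences and 200 words are taken from the description below, and pad_document is a hypothetical helper, not something from the original post:

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_SENTENCES = 30   # padded sentence count per document (as described below)
MAX_WORDS = 200      # padded word count per sentence

def pad_document(doc):
    # Pad one document (a list of variable-length word-index lists)
    # to a fixed (MAX_SENTENCES, MAX_WORDS) integer matrix.
    sents = pad_sequences(doc, maxlen=MAX_WORDS, padding='post')
    padded = np.zeros((MAX_SENTENCES, MAX_WORDS), dtype='int32')
    padded[:min(len(sents), MAX_SENTENCES)] = sents[:MAX_SENTENCES]
    return padded

docs = [[[54, 32, 13], [21, 43, 2], [28, 1, 9]]]   # one toy document
x = np.stack([pad_document(d) for d in docs])
print(x.shape)                                     # (1, 30, 200)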

EDIT: I just used TimeDistributed and it seems to work, but I am not sure it does what I want. I used the TimeDistributed wrapper around the embedding layer and around the first LSTM layer, and the model I implemented is very simple.
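
A minimal sketch of what such a TimeDistributed Embedding + LSTM stack could look like; the vocabulary size, LSTM widths, and classification head below are assumptions for illustration, not values from the original post:

from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 20000      # hypothetical vocabulary size
EMBEDDING_DIM = 300     # matches the skip-gram vectors described above
MAX_SENTENCES = 30
MAX_WORDS = 200

doc_input = keras.Input(shape=(MAX_SENTENCES, MAX_WORDS), dtype='int32')
# Look up a 300-dimensional vector for every word of every sentence:
# (batch, 30, 200) -> (batch, 30, 200, 300)
x = layers.TimeDistributed(layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM))(doc_input)
# One shared LSTM encodes each sentence into a single vector:
# (batch, 30, 200, 300) -> (batch, 30, 100)
x = layers.TimeDistributed(layers.LSTM(100))(x)
# A second LSTM reads the 30 sentence vectors and builds a document
# representation: (batch, 30, 100) -> (batch, 100)
x = layers.LSTM(100)(x)
# Placeholder classification head (number of classes is assumed).
doc_output = layers.Dense(5, activation='softmax')(x)

model = keras.Model(doc_input, doc_output)
model.summary()

Note that because TimeDistributed reuses a single set of weights, the same sentence encoder is applied at all 30 sentence positions rather than 30 independent LSTMs being trained.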

Is my understanding of the network correct? My interpretation is:
My input to the embedding layer is (documents, sentences, words). I pad each document to 30 sentences and each sentence to 200 words. I have 20,000 documents, so my input shape is (20000, 30, 200). After feeding this into the network, it first passes through the embedding layer, where each word vector has length 300. So after applying the embedding layer to the first document of shape (1, 30, 200), I get (1, 30, 200, 300), which becomes the input to the TimeDistributed LSTM. TimeDistributed then makes 30 copies of the LSTM layer with shared weights, each LSTM outputs one sentence vector, and the next LSTM is then applied to those 30 sentence vectors. Is that right?

The example below may be what you are looking for, or at least point you in the right direction. It is somewhat experimental on my part, but I believe the structure is correct. It was created in Google Colab with TensorFlow 2.0. The first part is there to make the processing reproducible; the rest illustrates the general idea of using TimeDistributed together with masking and padding. By the way, I believe this is similar to the idea suggested by @El Sheikh (first comment above). Note: I used SimpleRNN here, but I believe the idea applies to LSTMs as well. I hope this helps move you in the right direction.

%tensorflow_version 2.x
import numpy as np
import tensorflow as tf
import random as rn

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.

np.random.seed(42)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.

rn.seed(12345)

# Force TensorFlow to use single thread.
# Multiple threads are a potential source of non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/

session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1,
                                        inter_op_parallelism_threads=1)

# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/set_random_seed

tf.compat.v1.set_random_seed(1234)

sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
tf.compat.v1.keras.backend.set_session(sess)

# The code above here is provided to make the below reproducible each time you
# run.

#
# Main code follows:

from tensorflow import keras
from tensorflow.keras import layers

# Input structure
#                Sentence1                   .....         SentenceM
#    Word11  Word21   Word31  ..... Wordn11          Word11   ....  WordnM1
#    Word12  Word22   Word32        Wordn12          Word12         WordnM2
#    Word13  Word23   Word33        Wordn13          Word13         WordnM3

# example parameters
word_vec_dimension = 3   # dimension of the embedding
sentence_representation = 4 # dimensionality of sentence vector

#
# This represents a single test document.
# Each row is a sentence and the words are represented by 3-dimensional
# integer vectors.
#
raw_inputs = [ [ [1, 5, 7], [2, 6, 7] ], 
               [ [9, 6, 3], [1, 8, 2], [4, 5, 9], [8, 2, 1] ],
               [ [1, 6, 2], [4, 2, 9] ],
               [ [2, 6, 2], [8, 2, 9] ],
               [ [3, 6, 2], [2, 2, 9], [1, 6, 2] ],

]

print(raw_inputs)
# Create the model
#
# Allow for variable number of words per sentence and variable number of 
# sentences:
# Input shape(num_samples, [SentenceCount], [WordCount], word_vector_dim)
# 
# Note:  Using None for Sentence Count, and None for Word count to allow
# for variable sequences length in both these dimensions.
#
inputs = keras.Input(shape=(None, None, word_vec_dimension), name='inputlayer')
x = tf.keras.layers.Masking(mask_value=0.0)(inputs)  # Force RNNs to ignore timesteps with zero vectors.
x = tf.keras.layers.TimeDistributed(layers.SimpleRNN(sentence_representation, 
                                                     use_bias=False, 
                                                     activation=None), 
                                                     name='TD1')(x)

outputs = x
# more layers here if needed:

model = tf.keras.Model(inputs=inputs, outputs=outputs, name='Sentiment')
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
model.summary()

# Set up fitting calls
import numpy as np

# document 1
x_train = raw_inputs # use the dummy document for testing
# Set zeros in locations where there is no data to indicate mask to RNN's so
# they ignore that timestep.
padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(x_train, 
                                                              padding='post')

print(x_train)
# Insert a dummy dimension 1 to represent the sample dimension.
padded_inputs = np.expand_dims(padded_inputs,axis=0)/1.0  # Make float type
print(padded_inputs)
print(padded_inputs.shape)

y_train = np.array([[ 1.0, 2.0, 3.0, 4.0 ]])
print(y_train.shape)

# Train model:
model.fit(padded_inputs,y_train,epochs=1)

print('get_weights:')
print(model.get_layer(name='TD1').get_weights())

print('get_predictions:')
print(model.predict(padded_inputs))
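
Since the answer notes that the idea should also carry over to LSTMs, here is a hedged sketch of that substitution, reusing the variables defined in the block above and stacking a document-level LSTM on top of the per-sentence vectors. The layer widths and the sigmoid head are assumptions, not part of the original answer:

# Variant of the model above with LSTMs in place of SimpleRNN (assumed sizes).
inputs = keras.Input(shape=(None, None, word_vec_dimension), name='inputlayer')
x = tf.keras.layers.Masking(mask_value=0.0)(inputs)          # ignore zero-padded timesteps
# Shared LSTM applied to each sentence -> one vector per sentence.
x = tf.keras.layers.TimeDistributed(
        layers.LSTM(sentence_representation), name='sentence_lstm')(x)
# Document-level LSTM over the sequence of sentence vectors.
x = layers.LSTM(8, name='document_lstm')(x)                  # 8 units is an arbitrary choice
outputs = layers.Dense(1, activation='sigmoid')(x)           # placeholder classification head

lstm_model = keras.Model(inputs=inputs, outputs=outputs, name='DocumentClassifier')
lstm_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
lstm_model.summary()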

I created this model earlier this year, please take a look.

I'm afraid this is not the answer I am looking for, and your model is different from the one in the paper. I want to feed each sentence of a document into an LSTM to obtain a sentence representation. If I have a document with 5 sentences, then I need 5 LSTMs in the first layer, so that at each time step each LSTM processes one word vector.

@jalilasadi Just to clarify, are you saying that each sentence position should map to a specific LSTM? In other words, the first sentence of a document would always be fed into the first LSTM, the second sentence into the second LSTM, and so on. From the paper it is not clear to me that the design really requires that. Another interpretation could be a single LSTM network (with N outputs) that every sentence is run through, creating a sequence of sentence representations that is then fed into the higher-level LSTM network. Does that view make sense? I hope this helps.

@ad2004 I just want one LSTM for every sentence in the document! Note that my samples are documents, so if I feed the first sample to the model, my input looks like (1, sentences, words). I have a pre-trained word embedding matrix that I use as the embedding layer. This embedding layer produces a vector of size 300, so after it my input looks like (1, sentences, words, 300), which cannot be fed into a normal LSTM, because a normal LSTM expects an input shape like (samples, steps, features).
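
On that last point (the 4-D output of a pre-trained embedding), a hedged sketch of one way to wire a fixed embedding matrix into the TimeDistributed stack so that the (samples, sentences, words, 300) tensor never has to reach a plain LSTM directly. Here embedding_matrix is a random stand-in for the skip-gram vectors, and all sizes are assumed:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE, EMBEDDING_DIM = 20000, 300                        # assumed sizes
embedding_matrix = np.random.rand(VOCAB_SIZE, EMBEDDING_DIM)  # stand-in for the trained vectors

doc_input = keras.Input(shape=(30, 200), dtype='int32')       # (sentences, words), padded
# Frozen embedding initialised from the pre-trained matrix:
# (batch, 30, 200) -> (batch, 30, 200, 300)
embedding = layers.Embedding(
    VOCAB_SIZE, EMBEDDING_DIM,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    trainable=False)
x = layers.TimeDistributed(embedding)(doc_input)
# The sentence-level LSTM sees (words, 300) for each of the 30 sentences,
# so only 3-D slices ever reach an individual LSTM.
x = layers.TimeDistributed(layers.LSTM(100))(x)
x = layers.LSTM(100)(x)                                       # document representation
doc_output = layers.Dense(1, activation='sigmoid')(x)         # placeholder head

keras.Model(doc_input, doc_output).summary()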