Python: How do I put multiple LSTMs in the same layer?


I am new to Keras. I am trying to implement this model for document classification, and I want to use LSTMs to obtain sentence representations. I have already trained word vector representations separately on my dataset with a skip-gram model. Now, after splitting each document into sentences, each sentence into words, and mapping each word to its integer index in the dictionary, each document in my samples looks like this:
[[54,32,13],[21,43,2]...[28,1,9]]
I want to feed each sentence into an LSTM to obtain a sentence vector, then feed the sentence vectors into a different LSTM at a higher layer to obtain a document representation, and finally apply classification to that. My problem is with the first layer: how should I feed every sentence into its own LSTM at the same time (so that at each time step, each LSTM is applied to the word vector of its own sentence)?
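
For reference, a minimal sketch of how nested integer lists like the one above could be padded into a fixed-size (documents, sentences, words) array before being fed to such a model. The maximum lengths of 30 sentences and 200 words are taken from the description below, and pad_document is a hypothetical helper, not something from the original post:

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_SENTENCES = 30   # padded sentence count per document (as described below)
MAX_WORDS = 200      # padded word count per sentence

def pad_document(doc):
    # Pad one document (a list of variable-length word-index lists)
    # to a fixed (MAX_SENTENCES, MAX_WORDS) integer matrix.
    sents = pad_sequences(doc, maxlen=MAX_WORDS, padding='post')
    padded = np.zeros((MAX_SENTENCES, MAX_WORDS), dtype='int32')
    padded[:min(len(sents), MAX_SENTENCES)] = sents[:MAX_SENTENCES]
    return padded

docs = [[[54, 32, 13], [21, 43, 2], [28, 1, 9]]]   # one toy document
x = np.stack([pad_document(d) for d in docs])
print(x.shape)                                     # (1, 30, 200)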

EDIT: I just used TimeDistributed and it seems to work, but I am not sure it does what I want. I used the TimeDistributed wrapper around the embedding layer and around the first LSTM layer, and the model I implemented is very simple.
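
A minimal sketch of what such a TimeDistributed Embedding + LSTM stack could look like; the vocabulary size, LSTM widths, and classification head below are assumptions for illustration, not values from the original post:

from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 20000      # hypothetical vocabulary size
EMBEDDING_DIM = 300     # matches the skip-gram vectors described above
MAX_SENTENCES = 30
MAX_WORDS = 200

doc_input = keras.Input(shape=(MAX_SENTENCES, MAX_WORDS), dtype='int32')
# Look up a 300-dimensional vector for every word of every sentence:
# (batch, 30, 200) -> (batch, 30, 200, 300)
x = layers.TimeDistributed(layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM))(doc_input)
# One shared LSTM encodes each sentence into a single vector:
# (batch, 30, 200, 300) -> (batch, 30, 100)
x = layers.TimeDistributed(layers.LSTM(100))(x)
# A second LSTM reads the 30 sentence vectors and builds a document
# representation: (batch, 30, 100) -> (batch, 100)
x = layers.LSTM(100)(x)
# Placeholder classification head (number of classes is assumed).
doc_output = layers.Dense(5, activation='softmax')(x)

model = keras.Model(doc_input, doc_output)
model.summary()

Note that because TimeDistributed reuses a single set of weights, the same sentence encoder is applied at all 30 sentence positions rather than 30 independent LSTMs being trained.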

Is my understanding of the network correct? My interpretation is:
My input to the embedding layer is (documents, sentences, words). I pad each document to 30 sentences and each sentence to 200 words. I have 20,000 documents, so my input shape is (20000, 30, 200). After feeding this into the network, it first passes through the embedding layer, where each word vector has length 300. So after applying the embedding layer to the first document of shape (1, 30, 200), I get (1, 30, 200, 300), which becomes the input to the TimeDistributed LSTM. TimeDistributed then makes 30 copies of the LSTM layer with shared weights, each LSTM outputs one sentence vector, and the next LSTM is then applied to those 30 sentence vectors. Is that right?

The example below may be what you are looking for, or at least point you in the right direction. It is somewhat experimental on my part, but I believe the structure is correct. It was created in Google Colab with TensorFlow 2.0. The first part is there to make the processing reproducible; the rest illustrates the general idea of using TimeDistributed together with masking and padding. By the way, I believe this is similar to the idea suggested by @El Sheikh (first comment above). Note: I used SimpleRNN here, but I believe the idea applies to LSTMs as well. I hope this helps move you in the right direction.

%tensorflow_version 2.x
import numpy as np
import tensorflow as tf
import random as rn

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.

np.random.seed(42)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.

rn.seed(12345)

# Force TensorFlow to use single thread.
# Multiple threads are a potential source of non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/

session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1,
                                        inter_op_parallelism_threads=1)

# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/set_random_seed

tf.compat.v1.set_random_seed(1234)

sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
tf.compat.v1.keras.backend.set_session(sess)

# The code above here is provided to make the below reproducible each time you
# run.

#
# Main code follows:

from tensorflow import keras
from tensorflow.keras import layers

# Input structure
#                Sentence1                   .....         SentenceM
#    Word11  Word21   Word31  ..... Wordn11          Word11   ....  WordnM1
#    Word12  Word22   Word32        Wordn12          Word12         WordnM2
#    Word13  Word23   Word33        Wordn13          Word13         WordnM3

# example parameters
word_vec_dimension = 3   # dimension of the embedding
sentence_representation = 4 # dimensionality of sentence vector

#
# This represents a single test document.
# Each row is a sentence and the words are represented by 3-dimensional
# integer vectors.
#
raw_inputs = [ [ [1, 5, 7], [2, 6, 7] ], 
               [ [9, 6, 3], [1, 8, 2], [4, 5, 9], [8, 2, 1] ],
               [ [1, 6, 2], [4, 2, 9] ],
               [ [2, 6, 2], [8, 2, 9] ],
               [ [3, 6, 2], [2, 2, 9], [1, 6, 2] ],

]

print(raw_inputs)
# Create the model
#
# Allow for variable number of words per sentence and variable number of 
# sentences:
# Input shape(num_samples, [SentenceCount], [WordCount], word_vector_dim)
# 
# Note:  Using None for Sentence Count, and None for Word count to allow
# for variable sequences length in both these dimensions.
#
inputs = keras.Input(shape=(None, None, word_vec_dimension), name='inputlayer')
x = tf.keras.layers.Masking(mask_value=0.0)(inputs)  # Force RNNs to ignore timesteps with zero vectors.
x = tf.keras.layers.TimeDistributed(layers.SimpleRNN(sentence_representation, 
                                                     use_bias=False, 
                                                     activation=None), 
                                                     name='TD1')(x)

outputs = x
# more layers here if needed:

model = tf.keras.Model(inputs=inputs, outputs=outputs, name='Sentiment')
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
model.summary()

# Set up fitting calls
import numpy as np

# document 1
x_train = raw_inputs # use the dummy document for testing
# Set zeros in locations where there is no data to indicate mask to RNN's so
# they ignore that timestep.
padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(x_train, 
                                                              padding='post')

print(x_train)
# Insert a dummy dimension 1 to represent the sample dimension.
padded_inputs = np.expand_dims(padded_inputs,axis=0)/1.0  # Make float type
print(padded_inputs)
print(padded_inputs.shape)

y_train = np.array([[ 1.0, 2.0, 3.0, 4.0 ]])
print(y_train.shape)

# Train model:
model.fit(padded_inputs,y_train,epochs=1)

print('get_weights:')
print(model.get_layer(name='TD1').get_weights())

print('get_predictions:')
print(model.predict(padded_inputs))
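
Since the answer notes that the idea should also carry over to LSTMs, here is a hedged sketch of that substitution, reusing the variables defined in the block above and stacking a document-level LSTM on top of the per-sentence vectors. The layer widths and the sigmoid head are assumptions, not part of the original answer:

# Variant of the model above with LSTMs in place of SimpleRNN (assumed sizes).
inputs = keras.Input(shape=(None, None, word_vec_dimension), name='inputlayer')
x = tf.keras.layers.Masking(mask_value=0.0)(inputs)          # ignore zero-padded timesteps
# Shared LSTM applied to each sentence -> one vector per sentence.
x = tf.keras.layers.TimeDistributed(
        layers.LSTM(sentence_representation), name='sentence_lstm')(x)
# Document-level LSTM over the sequence of sentence vectors.
x = layers.LSTM(8, name='document_lstm')(x)                  # 8 units is an arbitrary choice
outputs = layers.Dense(1, activation='sigmoid')(x)           # placeholder classification head

lstm_model = keras.Model(inputs=inputs, outputs=outputs, name='DocumentClassifier')
lstm_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
lstm_model.summary()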

I created this model earlier this year, please take a look.

I'm afraid this is not the answer I am looking for, and your model is different from the one in the paper. I want to feed each sentence of a document into an LSTM to obtain a sentence representation. If I have a document with 5 sentences, then I need 5 LSTMs in the first layer, so that at each time step each LSTM processes one word vector.

@jalilasadi Just to clarify, are you saying that each sentence position should map to a specific LSTM? In other words, the first sentence of a document would always be fed into the first LSTM, the second sentence into the second LSTM, and so on. From the paper it is not clear to me that the design really requires that. Another interpretation could be a single LSTM network (with N outputs) that every sentence is run through, creating a sequence of sentence representations that is then fed into the higher-level LSTM network. Does that view make sense? I hope this helps.

@ad2004 I just want one LSTM for every sentence in the document! Note that my samples are documents, so if I feed the first sample to the model, my input looks like (1, sentences, words). I have a pre-trained word embedding matrix that I use as the embedding layer. This embedding layer produces a vector of size 300, so after it my input looks like (1, sentences, words, 300), which cannot be fed into a normal LSTM, because a normal LSTM expects an input shape like (samples, steps, features).
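
On that last point (the 4-D output of a pre-trained embedding), a hedged sketch of one way to wire a fixed embedding matrix into the TimeDistributed stack so that the (samples, sentences, words, 300) tensor never has to reach a plain LSTM directly. Here embedding_matrix is a random stand-in for the skip-gram vectors, and all sizes are assumed:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE, EMBEDDING_DIM = 20000, 300                        # assumed sizes
embedding_matrix = np.random.rand(VOCAB_SIZE, EMBEDDING_DIM)  # stand-in for the trained vectors

doc_input = keras.Input(shape=(30, 200), dtype='int32')       # (sentences, words), padded
# Frozen embedding initialised from the pre-trained matrix:
# (batch, 30, 200) -> (batch, 30, 200, 300)
embedding = layers.Embedding(
    VOCAB_SIZE, EMBEDDING_DIM,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    trainable=False)
x = layers.TimeDistributed(embedding)(doc_input)
# The sentence-level LSTM sees (words, 300) for each of the 30 sentences,
# so only 3-D slices ever reach an individual LSTM.
x = layers.TimeDistributed(layers.LSTM(100))(x)
x = layers.LSTM(100)(x)                                       # document representation
doc_output = layers.Dense(1, activation='sigmoid')(x)         # placeholder head

keras.Model(doc_input, doc_output).summary()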