Python: Getting a sentence embedding by averaging all word embeddings in TensorFlow?

python tensorflow

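Below is my code, which splits an input tensor of type tf.string and extracts the embedding of each word using a pre-trained GloVe model. However, I am getting unwanted errors from my cond implementation. Is there a cleaner way to get the embeddings of all the words in a string tensor?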

    # Take out the words
    target_words = tf.string_split([target_sentence], delimiter=" ")

    # Tensorflow parallel while loop variable, condition and body
    i = tf.constant(0, dtype=tf.int32)
    cond = lambda self, i: tf.less(x=tf.cast(i, tf.int32), y=tf.cast(tf.shape(target_words)[0], tf.int32))
    sentence_mean_embedding = tf.Variable([], trainable=False)

    def body(i, sentence_mean_embedding):
        sentence_mean_embedding = tf.concat(1, tf.nn.embedding_lookup(params=tf_embedding, ids=tf.gather(target_words, i)))
        return sentence_mean_embedding

    embedding_sentence = tf.reduce_mean(tf.while_loop(cond, body, [i, sentence_mean_embedding]))

There is a cleaner way to handle this: build the input as a dataset, map each word to an id with a lookup table, and look up all the ids of a sentence in one call.

First, create your own tf.Dataset (I assume we have two sentences with arbitrary labels):

    sentence = tf.constant(['this is first sentence', 'this is second sentence'])
    labels = tf.constant([1, 0])
    dataset = tf.data.Dataset.from_tensor_slices((sentence, labels))

Second, create a vocab.txt file in which each line number maps to the same index in GloVe. For example, if the first vocabulary entry in GloVe is "absent", then the first line of vocab.txt should be "absent", and so on. For simplicity, assume our vocab.txt contains the following words:

    first
    is
    test
    this
    second
    sentence

Then define a table whose job is to convert each word to a specific id:

    table = tf.contrib.lookup.index_table_from_file(vocabulary_file="vocab.txt", num_oov_buckets=1)
    # Split each sentence into words, then map every word to its id.
    dataset = dataset.map(lambda x, y: (tf.string_split([x]).values, y))
    dataset = dataset.map(lambda x, y: (tf.cast(table.lookup(x), tf.int32), y))
    dataset = dataset.batch(1)

Finally, convert each sentence to an embedding with tf.nn.embedding_lookup. In the first line below, embedding is the pre-trained GloVe matrix loaded beforehand:

    glove_weights = tf.get_variable('embed', shape=embedding.shape, initializer=tf.constant_initializer(embedding), trainable=False)
    iterator = dataset.make_initializable_iterator()
    x, y = iterator.get_next()
    # x holds all word ids of the sentence, so one lookup returns every word vector.
    embedding = tf.nn.embedding_lookup(glove_weights, x)
    # Average over the word axis to get one vector per sentence.
    sentence = tf.reduce_mean(embedding, axis=1)

Complete code in eager mode:

    import tensorflow as tf

    tf.enable_eager_execution()

    sentence = tf.constant(['this is first sentence', 'this is second sentence'])
    labels = tf.constant([1, 0])
    dataset = tf.data.Dataset.from_tensor_slices((sentence, labels))

    table = tf.contrib.lookup.index_table_from_file(vocabulary_file="vocab.txt", num_oov_buckets=1)
    dataset = dataset.map(lambda x, y: (tf.string_split([x]).values, y))
    dataset = dataset.map(lambda x, y: (tf.cast(table.lookup(x), tf.int32), y))
    dataset = dataset.batch(1)

    # Randomly initialized weights stand in for the GloVe matrix here.
    glove_weights = tf.get_variable('embed', shape=(10000, 300), initializer=tf.truncated_normal_initializer())

    for x, y in dataset:
        embedding = tf.nn.embedding_lookup(glove_weights, x)
        sentence = tf.reduce_mean(embedding, axis=1)
        print(sentence.shape)

ids: A Tensor with type int32 or int64 containing the ids to be looked up in params. I had overlooked the fact that ids is a tensor holding many ids (a list), so I was trying to loop over the sentence one word at a time. Passing the whole sentence, split with tf.string_split and converted to ids, is enough.
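To make the point about ids concrete, here is a minimal plain-Python sketch of what the lookup-then-average pipeline computes, with no TensorFlow involved. Only the six vocabulary words come from the vocab.txt example; the embedding values and the names WORD_TO_ID, OOV_ID, EMBEDDINGS, sentence_to_ids and sentence_embedding are made up for illustration. Placing the single out-of-vocabulary id right after the vocabulary reflects my understanding of num_oov_buckets=1, so treat that detail as an assumption.

```python
# Plain-Python illustration of "map words to ids, look up all vectors, average".
# The vocabulary mirrors the vocab.txt example; the embedding values are made up.

VOCAB = ["first", "is", "test", "this", "second", "sentence"]
WORD_TO_ID = {word: idx for idx, word in enumerate(VOCAB)}
OOV_ID = len(VOCAB)  # assumed: one out-of-vocabulary bucket placed after the vocabulary

# Toy embedding table: one 2-dimensional row per id, plus one row for OOV_ID.
EMBEDDINGS = [[float(i), float(2 * i)] for i in range(len(VOCAB) + 1)]


def sentence_to_ids(sentence):
    """Split on whitespace and map every word to its id in one pass."""
    return [WORD_TO_ID.get(word, OOV_ID) for word in sentence.split()]


def sentence_embedding(sentence):
    """Look up the vectors for all ids at once, then average per dimension."""
    vectors = [EMBEDDINGS[i] for i in sentence_to_ids(sentence)]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


ids = sentence_to_ids("this is first sentence")
mean_vector = sentence_embedding("this is first sentence")
print(ids, mean_vector)
```

The whole list of ids goes into the lookup in one step, which is the role tf.nn.embedding_lookup plays in the answer; no per-word loop with a running result is needed.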