Python: Embedding vectors not updated when using TensorFlow for window classification


python tensorflow deep-learning


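I am trying to implement a window-based classifier with TensorFlow.

The word embedding matrix is called word_vecs and is randomly initialized (I also tried Xavier initialization). The ind variable is a vector of indices of word vectors in that matrix.

The first layer is the concatenation of config['window_size'] (5) word vectors.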

word_vecs = tf.Variable(tf.random_uniform([len(words), config['embed_size']], -1.0, 1.0),dtype=tf.float32)
ind = tf.placeholder(tf.int32,  [None, config['window_size']])
x = tf.concat(1,tf.unpack(tf.nn.embedding_lookup(word_vecs, ind),axis=1))
W0 = tf.Variable(tf.random_uniform([config['window_size']*config['embed_size'], config['hidden_layer']]))
b0 = tf.Variable(tf.zeros([config['hidden_layer']]))
W1 = tf.Variable(tf.random_uniform([config['hidden_layer'], out_layer]))
b1 = tf.Variable(tf.zeros([out_layer]))
y0 = tf.nn.tanh(tf.matmul(x, W0) + b0)
y1 = tf.nn.softmax(tf.matmul(y0, W1) + b1)
y_ = tf.placeholder(tf.float32, [None, out_layer])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y1), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(0.5).minimize(cross_entropy)

This is how I run the graph:

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(config['iterations'] ):
    r = random.randint(0,len(sentences)-1)
    inds=generate_windows([w for w,t in sentences[r]])
    #inds now contains an array of n rows on window_size columns
    ys=[one_hot(tags.index(t),len(tags)) for w,t in sentences[r]]
    #ys now contains an array of n rows on output_size columns
    sess.run(train_step, feed_dict={ind: inds, y_: ys})

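The dimensions work out and the code runs.

However, the accuracy is close to zero, and I suspect the word vectors are not being updated properly.

How can I make TensorFlow propagate updates from the concatenated window form back to the word vectors?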

You need to initialize the weight matrices to random values. Right now, because they are initialized to zero, y1 is always 0.

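Your embeddings are created with tf.Variable, which is trainable by default, so they will be updated. The problem is more likely in the way you compute the loss. Look at the following lines: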
y1 = tf.nn.softmax(tf.matmul(y0, W1) + b1)
y_ = tf.placeholder(tf.float32, [None, out_layer])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y1), reduction_indices=[1])) 

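Here you are computing the softmax function, which converts scores into probabilities.

If the denominator becomes too large or too small, this function can break down numerically. To avoid this instability, a small epsilon is usually added, which keeps the computation stable.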
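The stabilization described above can be sketched in plain Python. This is only an illustration, under the assumption that the "epsilon" refers to the usual constant shift of the scores (here, subtracting the maximum score); the function names softmax_naive and softmax_shifted are made up for this example and are not TensorFlow APIs.

```python
import math

def softmax_naive(scores):
    # Direct definition: exp(s_i) / sum_j exp(s_j).
    # math.exp overflows once a score is large enough.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_shifted(scores):
    # Subtract the largest score before exponentiating. The shift cancels
    # between numerator and denominator, so the probabilities are unchanged,
    # but every exponent is now <= 0 and cannot overflow.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]

# On moderate scores both versions agree.
same = all(abs(a - b) < 1e-12
           for a, b in zip(softmax_naive(small), softmax_shifted(small)))

# On large scores the naive version fails, the shifted one does not.
try:
    softmax_naive(large)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

stable_large = softmax_shifted(large)
```

Because the shift cancels, softmax_shifted(large) gives the same probabilities as softmax_shifted(small): the two score lists differ only by a constant.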

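Even after adding epsilon, the value of the softmax function stays the same. If you do not handle this yourself, the gradients may not update properly because of vanishing or exploding gradients.
Instead of these three lines of code, use TensorFlow's built-in version:
tf.nn.sparse_softmax_cross_entropy_with_logits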

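Note that this function computes the softmax internally, so pass it the raw logits.
It is recommended over computing the loss by hand. It expects integer class indices as labels rather than one-hot vectors; for one-hot labels like those in the question, tf.nn.softmax_cross_entropy_with_logits is the matching function. You can use it as follows:
y1 = tf.matmul(y0, W1) + b1  # raw logits, no softmax applied here
y_ = tf.placeholder(tf.int32, [None])  # class indices such as tags.index(t), not one-hot rows
cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y1, labels=y_))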
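To make the difference between index labels and one-hot labels concrete, here is a rough pure-Python sketch of what a fused softmax cross-entropy computes from logits. It is not TensorFlow's implementation; the helper names log_softmax, sparse_cross_entropy, and one_hot_cross_entropy are hypothetical and chosen only for this illustration.

```python
import math

def log_softmax(logits):
    # log-sum-exp with a max shift, so large logits do not overflow
    shift = max(logits)
    log_sum = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return [z - log_sum for z in logits]

def sparse_cross_entropy(logits, label):
    # label is an integer class index
    return -log_softmax(logits)[label]

def one_hot_cross_entropy(logits, one_hot):
    # one_hot is a list with 1.0 at the true class and 0.0 elsewhere
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(one_hot, log_probs))

logits = [2.0, -1.0, 0.5]
loss_sparse = sparse_cross_entropy(logits, 0)
loss_dense = one_hot_cross_entropy(logits, [1.0, 0.0, 0.0])

# Taking the log of an explicit softmax would hit log(0) here;
# working from the logits keeps the loss finite.
extreme = sparse_cross_entropy([1000.0, 0.0, -1000.0], 2)
```

Both label formats give the same loss for the same example. The only difference is how the true class is encoded, which is why the placeholder's dtype and shape must match the function being called.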

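Your algorithm is a good start, but I am fairly confident this approach will not work. In fact, the word-to-vector trick only started to work after approximate estimation methods were found to be applicable to NLP, for example the techniques known as importance sampling and noise-contrastive estimation (NCE).
So why doesn't the direct approach work? I think that to solve the task, the model has to pick exactly the right answer out of a large vocabulary, say 80,000 words. Choosing 1 out of 80,000 is too hard to optimize, and in most cases the gradient tells the model almost nothing.
Update:
I forgot to mention that the main reason for using approximations is the performance problem of the straightforward method when the output layer is large. At every iteration step, for every example, the loss has to be computed over every output unit (e.g. 80,000), so optimization takes a very long time.
How do you implement a proper word2vec with sampling and NCE loss? It is easy. According to the tutorial, the loss function looks like this: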
loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases, inputs=embed,
                           labels=train_labels, num_sampled=num_sampled, num_classes=vocabulary_size))

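The main idea is that we only need a small number m of negative samples and 1 positive sample, where m is much smaller than the actual vocabulary size.
TensorFlow also has
You can read more about the math behind these methods in the online book (I. Goodfellow et al.).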
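The idea of scoring one positive class against m sampled negatives can be sketched in plain Python as follows. This is a deliberately simplified illustration, not the algorithm behind tf.nn.sampled_softmax_loss: it draws negatives uniformly and omits the sampling-probability correction that real implementations typically apply. The function name and all sizes below are assumptions made for the example.

```python
import math
import random

def sampled_softmax_loss(hidden, weights, biases, true_class, num_sampled, rng):
    """Cross-entropy over the true class plus num_sampled random negatives."""
    num_classes = len(weights)
    # Draw negatives uniformly from all classes except the true one.
    candidates = [c for c in range(num_classes) if c != true_class]
    negatives = rng.sample(candidates, num_sampled)
    classes = [true_class] + negatives
    # Only num_sampled + 1 logits are computed, not num_classes.
    logits = [sum(h * w for h, w in zip(hidden, weights[c])) + biases[c]
              for c in classes]
    # Stable log-sum-exp; the true class sits at position 0.
    shift = max(logits)
    log_sum = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum - logits[0]

rng = random.Random(0)
vocabulary_size, embed_size, num_sampled = 1000, 8, 16  # hypothetical sizes
weights = [[rng.uniform(-1, 1) for _ in range(embed_size)]
           for _ in range(vocabulary_size)]
biases = [0.0] * vocabulary_size
hidden = [rng.uniform(-1, 1) for _ in range(embed_size)]

loss = sampled_softmax_loss(hidden, weights, biases,
                            true_class=42, num_sampled=num_sampled, rng=rng)
```

Each step touches 17 output rows instead of 1,000, which is where the speedup for large vocabularies comes from.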
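Thanks for your answer. I have now initialized the Ws and bs with tf.random_uniform between 0 and 1, and the results did not improve. Any idea why?
Thanks for your answer. I am actually training a POS tagger (not word2vec) with a window-based model. Does NCE loss apply here as well?
Oh, that clarifies it. I am not sure; NCE and sampling underestimate the full softmax. If the number of labels is relatively small, using a regular softmax is preferred. Your code looks workable now :) For debugging, it may help to monitor the gradients and activations using summary ops. Are you feeding the model the "right" data? :)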