TensorFlow loss function doesn't decrease in an RNN when the number of input steps varies


Update: I have rewritten the question and removed unnecessary code. Thank you for the kind and prompt comments.

I want to do sequence prediction with an RNN in TensorFlow. Specifically, I use a set of sentences as the training set for a TensorFlow-based RNN. For one example, i.e. one sentence, the output (label) is the last word of the sentence and the input is the left part of the sentence.
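As a minimal illustration of this setup (using one of the training sentences from the code below, and jieba for word segmentation):

import jieba

words = list(jieba.cut("大家讨厌狗"))   # ['大家', '讨厌', '狗']
inputs, target = words[:-1], words[-1]  # inputs: ['大家', '讨厌'], target: '狗'
print(inputs, target)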

I train on several sentences. After training, I feed the training-set inputs back in to predict the next (last) word.

When the sentences (sequences) all have the same length, the model works well. No problem.

When the sentences (sequences) have different lengths, I use TensorFlow's sequence_length parameter to enable variable-length sequence prediction, like this:

tf.nn.dynamic_rnn(cell, X, dtype=tf.float32, sequence_length=seq_length)
But it doesn't work. The loss stays at a relatively high value and never gets close to 0, and the predictions are poor.
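For context, a minimal self-contained sketch (assuming TensorFlow 1.x) of how tf.nn.dynamic_rnn behaves when sequence_length is given; according to its documentation, the outputs at time steps past each sequence's true length are zero vectors:

import numpy as np
import tensorflow as tf

# One example with 3 time steps and 4 features, but a true length of only 2.
x = tf.placeholder(tf.float32, [None, 3, 4])
lens = tf.placeholder(tf.int32, [None])
cell = tf.nn.rnn_cell.BasicRNNCell(2)
outs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32, sequence_length=lens)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    o = sess.run(outs, {x: np.random.rand(1, 3, 4).astype(np.float32), lens: [2]})
    print(o[0, 2])  # step index 2 is past the true length of 2, so this is all zeros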

The code is as follows:

import jieba
import numpy as np
import tensorflow as tf

# sentences for training
sentences = ["大家讨厌狗", "我讨厌蜘蛛", "他喜欢狗", "他语文好", "我喜欢猫", "他名字是小明"]

word_list = []
for sen in sentences:
    sent = list(jieba.cut(sen))
    print(sent)
    word_list += sent  # split Chinese sentences to words
word_list = list(set(word_list))  # remove the duplicated words

# word:id
word_dict = {w: i for i, w in enumerate(word_list)}  # word2id
print(word_dict)

# id:word
number_dict = {i: w for i, w in enumerate(word_list)}  # id2word

# total number of words
n_class = len(word_dict)
print(n_class)


n_hidden = 10           # number of units in hidden layer


# get batch of the input and output
def make_batch(sentences):
    input_batch = []
    target_batch = []

    inputs = []
    targets = []

    for sen in sentences:
        word = list(jieba.cut(sen))
        input = [word_dict[n] for n in word[:-1]]
        target = word_dict[word[-1]]

        inputs.append(input)
        targets.append(target)

    input_length = [len(x) for x in inputs]
    print(input_length)
    max_input_length = max(input_length)

    tot = 0
    for input, target in zip(inputs, targets):
        # one-hot encoding
        x = np.eye(n_class)[input]

        # pad the shorter inputs with all-zero rows so every input has the same number of steps
        if input_length[tot] < max_input_length:
            for k in range(max_input_length-input_length[tot]):
                x = np.append(x, [np.zeros(n_class)], axis=0)

        input_batch.append(x)
        target_batch.append(np.eye(n_class)[target])
        tot += 1

    return input_batch, target_batch, input_length, max_input_length


input_batch, target_batch, input_length, max_input_length = make_batch(sentences)
n_step = max_input_length             # time step (series length)

# helper that decodes the one-hot inputs X back to words (for inspection)
def transferX(input_batch):
    for i in range(len(input_batch)):
        L1 = list(input_batch[i])
        for x in L1:
            for j, e in enumerate(list(x)):
                if e != 0:
                    print(number_dict[j], end=' ')
            print()
    print('######################################')

# helper that decodes the one-hot targets Y back to words (for inspection)
def transferY(target_batch):
    for i in range(len(target_batch)):
        L2 = list(target_batch[i])
        for j, e in enumerate(L2):
            if e != 0:
                print(number_dict[j], end='  ')
    print('######################################')


# transferX(input_batch)
# transferY(target_batch)

# set placeholders for input X and output Y
X = tf.placeholder(tf.float32, [None, n_step, n_class])  # [batch_size, n_step, n_class]
Y = tf.placeholder(tf.float32, [None, n_class])  # [batch_size, n_class]

# set weights and biases
W = tf.Variable(tf.random_normal([n_hidden, n_class]))
b = tf.Variable(tf.random_normal([n_class]))

seq_length = tf.placeholder(tf.int32, [None])

# define the hidden layer which has n_hidden units
cell = tf.nn.rnn_cell.BasicRNNCell(n_hidden)

outputs_, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32, sequence_length=seq_length)
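# outputs_: [batch_size, n_step, n_hidden]; states: final hidden state, [batch_size, n_hidden]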

# outputs = contrib.layers.fully_connected(inputs=outputs_, num_outputs=n_class, activation_fn=None)
outputs1 = tf.transpose(outputs_, [1, 0, 2])  # [n_step, batch_size, n_hidden]  2*3*5
outputs = outputs1[-1]  # [batch_size, n_hidden]             3*5

# define the model
model = tf.matmul(outputs, W) + b  # model : [batch_size, n_class]           3*7

# define cross entropy as the cost
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=model, labels=Y))
# set Adam as the optimizer
optimizer = tf.train.AdamOptimizer(0.001).minimize(cost)

# the prediction result
prediction = tf.cast(tf.argmax(model, 1), tf.int32) # get the class id with the highest probability
# prediction = model  # get all the classes and the corresponding probabilities

# get a session of tensorflow
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

seq_length_batch = np.array(input_length)
print('the real length of each sample', seq_length_batch)


# training
for epoch in range(5000):
    _, loss = sess.run([optimizer, cost], feed_dict={X: input_batch, Y: target_batch, seq_length: seq_length_batch})
    if (epoch + 1) % 1000 == 0:
        print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss))

# use the trained model to predict the next word (the last word of a sentence).
# The input is the input of the training set.
result = sess.run(prediction, feed_dict={X: input_batch, seq_length: seq_length_batch})

# We should get the exact right word, hopefully. But we don't.
print(result)
print([number_dict[x] for x in result])
Thanks for your time :)

Update: when I change the maximum number of training steps in the code above from 5000 to 50000, the results are as follows. You can see that the loss keeps decreasing, but unacceptably slowly.

Loading model cost 0.627 seconds.
['大家', '讨厌', '狗']
Prefix dict has been built succesfully.
['我', '讨厌', '蜘蛛']
['他', '喜欢', '狗']
['他', '语文', '好']
['我', '喜欢', '猫']
['他', '名字', '是', '小明']
{'大家': 0, '语文': 2, '小明': 4, '是': 6, '讨厌': 8, '喜欢': 9, '我': 10, '名字': 12, '他': 11, '猫': 5, '狗': 7, '好': 3, '蜘蛛': 1}
13
[2, 2, 2, 2, 2, 3]
2019-12-03 09:01:28.925616: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
the real length of each sample [2 2 2 2 2 3]
Epoch: 1000 cost = 1.449701
Epoch: 2000 cost = 1.278301
Epoch: 3000 cost = 1.219589
Epoch: 4000 cost = 1.191665
Epoch: 5000 cost = 1.175023
Epoch: 6000 cost = 1.163936
Epoch: 7000 cost = 1.156038
Epoch: 8000 cost = 1.150135
Epoch: 9000 cost = 1.145562
Epoch: 10000 cost = 1.141918
Epoch: 11000 cost = 1.138948
Epoch: 12000 cost = 1.136482
Epoch: 13000 cost = 1.134402
Epoch: 14000 cost = 1.132625
Epoch: 15000 cost = 1.131089
Epoch: 16000 cost = 1.129749
Epoch: 17000 cost = 1.128569
Epoch: 18000 cost = 1.127523
Epoch: 19000 cost = 1.126589
Epoch: 20000 cost = 1.125750
Epoch: 21000 cost = 1.124992
Epoch: 22000 cost = 1.124305
Epoch: 23000 cost = 1.123678
Epoch: 24000 cost = 1.123104
Epoch: 25000 cost = 1.122577
Epoch: 26000 cost = 1.122091
Epoch: 27000 cost = 1.121641
Epoch: 28000 cost = 1.121225
Epoch: 29000 cost = 1.120837
Epoch: 30000 cost = 1.120475
Epoch: 31000 cost = 1.120138
Epoch: 32000 cost = 1.119821
Epoch: 33000 cost = 1.119524
Epoch: 34000 cost = 1.119245
Epoch: 35000 cost = 1.118982
Epoch: 36000 cost = 1.118734
Epoch: 37000 cost = 1.118499
Epoch: 38000 cost = 1.118277
Epoch: 39000 cost = 1.118066
Epoch: 40000 cost = 1.117866
Epoch: 41000 cost = 1.117676
Epoch: 42000 cost = 1.117495
Epoch: 43000 cost = 1.117323
Epoch: 44000 cost = 1.117159
Epoch: 45000 cost = 1.117001
Epoch: 46000 cost = 1.116851
Epoch: 47000 cost = 1.116707
Epoch: 48000 cost = 1.116570
Epoch: 49000 cost = 1.116438
Epoch: 50000 cost = 1.116311
[7 7 7 7 7 4]
['狗', '狗', '狗', '狗', '狗', '小明']

Process finished with exit code 0
In addition, I removed the sequence_length parameter and padded the shorter sentences with zeros. The loss easily drops close to 0 and the model's predictions are accurate. The results are as follows:

Loading model cost 0.601 seconds.
Prefix dict has been built succesfully.
['大家', '讨厌', '狗']
['我', '讨厌', '蜘蛛']
['他', '喜欢', '狗']
['他', '语文', '好']
['我', '喜欢', '猫']
['他', '名字', '是', '小明']
{'蜘蛛': 0, '喜欢': 3, '狗': 2, '名字': 4, '猫': 6, '是': 5, '小明': 7, '大家': 8, '讨厌': 9, '好': 10, '他': 1, '语文': 11, '我': 12}
13
[2, 2, 2, 2, 2, 3]
2019-12-03 09:08:53.827053: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
the real length of each sample [2 2 2 2 2 3]
Epoch: 1000 cost = 0.066504
Epoch: 2000 cost = 0.028987
Epoch: 3000 cost = 0.018541
Epoch: 4000 cost = 0.013653
Epoch: 5000 cost = 0.010820
Epoch: 6000 cost = 0.008968
Epoch: 7000 cost = 0.007663
Epoch: 8000 cost = 0.006693
Epoch: 9000 cost = 0.005943
Epoch: 10000 cost = 0.005346
Epoch: 11000 cost = 0.004859
Epoch: 12000 cost = 0.004455
Epoch: 13000 cost = 0.004113
Epoch: 14000 cost = 0.003820
Epoch: 15000 cost = 0.003567
Epoch: 16000 cost = 0.003345
Epoch: 17000 cost = 0.003150
Epoch: 18000 cost = 0.002976
Epoch: 19000 cost = 0.002821
Epoch: 20000 cost = 0.002681
Epoch: 21000 cost = 0.002555
Epoch: 22000 cost = 0.002440
Epoch: 23000 cost = 0.002335
Epoch: 24000 cost = 0.002239
Epoch: 25000 cost = 0.002150
Epoch: 26000 cost = 0.002069
Epoch: 27000 cost = 0.001993
Epoch: 28000 cost = 0.001922
Epoch: 29000 cost = 0.001857
Epoch: 30000 cost = 0.001796
Epoch: 31000 cost = 0.001739
Epoch: 32000 cost = 0.001685
Epoch: 33000 cost = 0.001635
Epoch: 34000 cost = 0.001587
Epoch: 35000 cost = 0.001542
Epoch: 36000 cost = 0.001500
Epoch: 37000 cost = 0.001460
Epoch: 38000 cost = 0.001422
Epoch: 39000 cost = 0.001386
Epoch: 40000 cost = 0.001352
Epoch: 41000 cost = 0.001319
Epoch: 42000 cost = 0.001288
Epoch: 43000 cost = 0.001259
Epoch: 44000 cost = 0.001231
Epoch: 45000 cost = 0.001204
Epoch: 46000 cost = 0.001178
Epoch: 47000 cost = 0.001153
Epoch: 48000 cost = 0.001129
Epoch: 49000 cost = 0.001107
Epoch: 50000 cost = 0.001085
[ 2  0  2 10  6  7]
['狗', '蜘蛛', '狗', '好', '猫', '小明']

It works, but doing it this way is not reasonable, because padding with zeros without using the sequence_length parameter means the extra zeros are treated as ordinary values that carry real meaning: the number 0 represents a word. For example, when I do this, the sentence "大家讨厌狗" is treated as "大家讨厌?狗", where the question mark "?" stands for the word id2word[0].

Comments:

Please remove all unnecessary parts of the code, and translate or delete the non-English comments. Is the problem related to the line that uses the RNN's time_major==True parameter? Another point: the 0.01 learning rate for Adam raises an alarm in my head. Have you tried the default value?

@thushv89 I set the learning rate to 0.001 and to 0.0001, and the problem remains.

@dow tensory Why do you think the loss is not going down? I can see it decreasing. When using variable lengths, you should not expect the loss to drop in the same way, because the optimization now works differently (e.g., zeros are ignored).
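For reference, one pattern that is often used with variable-length sequences (not part of the original post) is to read the RNN output at each example's true last time step, index seq_length - 1, instead of the last padded step. A minimal sketch, assuming TensorFlow 1.x and reusing the variable names from the code above:

import tensorflow as tf

def last_relevant_output(outputs, seq_length):
    """Select outputs[i, seq_length[i] - 1, :] for every example i in the batch.

    outputs:    [batch_size, n_step, n_hidden] tensor returned by tf.nn.dynamic_rnn
    seq_length: [batch_size] int32 tensor holding the true length of every sequence
    """
    batch_size = tf.shape(outputs)[0]
    indices = tf.stack([tf.range(batch_size), seq_length - 1], axis=1)  # [batch_size, 2]
    return tf.gather_nd(outputs, indices)                               # [batch_size, n_hidden]

# Hypothetical usage, in place of the transpose / outputs1[-1] selection above:
# outputs = last_relevant_output(outputs_, seq_length)
# model = tf.matmul(outputs, W) + b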