Python TensorFlow LSTM for MNIST classification - understanding changes in the number of time steps
Context:
I am trying to understand RNNs, both the implementation (in TensorFlow) and the theory. As part of this, I wrote a simple LSTM that classifies MNIST handwritten digits using TensorFlow. For this, I used TensorFlow's dynamic_rnn with inputs of shape [batch_size, max_time_steps, num_inputs] (and time_major=False).
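With time_major=False, dynamic_rnn takes inputs laid out as [batch_size, max_time, num_inputs]; time_major=True would instead expect [max_time, batch_size, num_inputs]. The small NumPy sketch below (sizes are illustrative: a batch of 50 in the 28-step configuration) shows that the two layouts differ only by swapping the first two axes, which is also what tf.transpose(outputs, [1, 0, 2]) does to the RNN outputs in the code.

```python
import numpy as np

batch_size, max_time, num_inputs = 50, 28, 28

# time_major=False layout: [batch_size, max_time, num_inputs]
batch_major = np.arange(batch_size * max_time * num_inputs, dtype=np.float32).reshape(
    batch_size, max_time, num_inputs)

# time_major=True layout: [max_time, batch_size, num_inputs], obtained by swapping axes 0 and 1
time_major = np.transpose(batch_major, (1, 0, 2))

print(batch_major.shape)  # (50, 28, 28)
print(time_major.shape)   # (28, 50, 28)
# Example 7 at time step 3 is the same vector in both layouts
print(np.array_equal(batch_major[7, 3], time_major[3, 7]))  # True
```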
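When I feed the MNIST images into the model as 28 time steps with 28 pixels per time step (784 pixels in total), the model works well: it trains quickly and reaches high accuracy (about 1 minute per epoch, 98% accuracy, 128 hidden units).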
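Feeding an image as 28 time steps of 28 pixels means each time step is one row of the 28x28 image. The NumPy sketch below applies the same kind of reshape as the training loop in the code (batch_data.reshape((BATCH_SIZE, NUM_STEPS, NUM_INPUT))) to a random stand-in for a batch of flattened MNIST images; the same call with NUM_STEPS = 784 and NUM_INPUT = 1 gives one pixel per time step.

```python
import numpy as np

BATCH_SIZE = 2
rng = np.random.default_rng(0)
# Random stand-in for flattened 28x28 MNIST images, shape [BATCH_SIZE, 784]
flat = rng.random((BATCH_SIZE, 784)).astype(np.float32)

# Row-per-step layout: 28 time steps with 28 inputs each
rows = flat.reshape((BATCH_SIZE, 28, 28))
# Pixel-per-step layout: 784 time steps with 1 input each
pixels = flat.reshape((BATCH_SIZE, 784, 1))

# Time step t of the row layout holds pixels 28*t .. 28*t + 27 of the flat image
t = 5
print(np.array_equal(rows[0, t], flat[0, 28 * t:28 * (t + 1)]))  # True
# Time step 100 of the pixel layout holds exactly pixel 100
print(pixels[0, 100, 0] == flat[0, 100])  # True
```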
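However, if I feed the images into the model pixel by pixel, so that there are 784 time steps with an input of size 1 each, the model performs very poorly (about 30 minutes per epoch, 40% accuracy at best).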
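One common explanation for the gap is the length of the dependency: with 784 steps, information from the top of the image has to survive 28 times as many recurrent updates before it reaches the classifier at the final step, and the gradient has to travel back through all of them. The sketch below is a deliberately simplified illustration, not a model of the LSTM: for an assumed scalar linear recurrence h_t = w*h_{t-1} + x_t, the gradient dh_T/dh_0 is w**T, so with an assumed w = 0.9 the signal from the first step shrinks to about 0.05 over 28 steps but to about 1e-36 over 784 steps. LSTM gating is designed to mitigate this decay, not to remove it. The same factor of 28 in sequential updates per example is also plausibly behind much of the roughly 30x slowdown per epoch, since the time steps of an RNN cannot be computed in parallel.

```python
def gradient_through_time(w, num_steps):
    """Return dh_T/dh_0 for the toy scalar recurrence h_t = w * h_{t-1} + x_t.

    Each step multiplies the gradient by w (chain rule), so the result is w ** num_steps.
    This is an illustrative stand-in for an RNN, not an LSTM.
    """
    grad = 1.0
    for _ in range(num_steps):
        grad *= w  # dh_t/dh_{t-1} = w
    return grad

W_ASSUMED = 0.9  # assumed recurrent weight, chosen only for illustration
row_steps, pixel_steps = 28, 784

print("28 steps:", gradient_through_time(W_ASSUMED, row_steps))    # about 0.052
print("784 steps:", gradient_through_time(W_ASSUMED, pixel_steps))  # about 1.3e-36
print("sequential updates ratio:", pixel_steps // row_steps)        # 28
```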
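Questions:
- What exactly is going on here? Why does feeding the images pixel by pixel make the model perform so poorly? Is it because there is not enough context, or because there are too many time steps, or must there be something wrong with the model?
- Is there a way to change this so that the model works well when fed images pixel by pixel? I have heard of truncating backpropagation to a certain number of time steps, but the TensorFlow documentation is unclear on exactly how to do this, and I have not found a suitable guide.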
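Truncated backpropagation through time (BPTT) limits how many steps the gradient is propagated back from the loss. In TensorFlow 1.x, one common way to get it is to split each sequence into chunks, run dynamic_rnn on one chunk at a time, and feed the final state of one chunk in as the initial_state of the next, so that gradients stop at chunk boundaries. The NumPy sketch below shows only the gradient computation, for a plain tanh RNN rather than an LSTM, with an assumed squared-error loss on the final hidden state standing in for the classifier loss; rnn_forward, final_loss and truncated_grad_W are hypothetical helper names, not TensorFlow functions. A caveat: because the MNIST loss is only computed at the last time step, truncating to k steps means the gradient reaches only the last k pixels through the recurrence, so truncation would mainly make training faster and is unlikely to fix the accuracy on its own.

```python
import numpy as np

def rnn_forward(xs, W, U, b, h0):
    """Run a vanilla tanh RNN h_t = tanh(W h_{t-1} + U x_t + b); return [h_0, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(W @ hs[-1] + U @ x + b))
    return hs

def final_loss(xs, W, U, b, h0, target):
    """Assumed toy loss: 0.5 * ||h_T - target||^2 on the final hidden state only."""
    h_T = rnn_forward(xs, W, U, b, h0)[-1]
    return 0.5 * np.sum((h_T - target) ** 2)

def truncated_grad_W(xs, W, U, b, h0, target, k):
    """Gradient of final_loss w.r.t. W, backpropagated through at most the last k steps."""
    hs = rnn_forward(xs, W, U, b, h0)
    T = len(xs)
    dW = np.zeros_like(W)
    dh = hs[T] - target                # dL/dh_T
    for t in range(T, max(T - k, 0), -1):
        da = dh * (1.0 - hs[t] ** 2)   # back through tanh: h_t = tanh(a_t)
        dW += np.outer(da, hs[t - 1])  # a_t depends on W through W @ h_{t-1}
        dh = W.T @ da                  # gradient w.r.t. h_{t-1}; dropped once the loop stops
    return dW

# Small random problem (sizes are arbitrary)
rng = np.random.default_rng(0)
H, D, T = 4, 1, 12
W = rng.normal(scale=0.5, size=(H, H))
U = rng.normal(size=(H, D))
b = np.zeros(H)
xs = rng.normal(size=(T, D))
h0 = np.zeros(H)
target = rng.normal(size=H)

full = truncated_grad_W(xs, W, U, b, h0, target, k=T)   # k = T is ordinary full BPTT
trunc = truncated_grad_W(xs, W, U, b, h0, target, k=3)  # only the last 3 steps contribute

# Check the full-BPTT gradient against central finite differences
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(H):
    for j in range(H):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (final_loss(xs, Wp, U, b, h0, target)
                         - final_loss(xs, Wm, U, b, h0, target)) / (2 * eps)

print(np.allclose(full, numeric, atol=1e-6))  # True
print(np.abs(full - trunc).max())             # contribution of the 9 dropped steps
```

Setting k to the chunk length used when splitting the sequence gives the same per-chunk gradient that the chunked TensorFlow approach would compute for the final chunk.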
Code, in case it helps:
# ================ IMPORTS ================
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
# =========================================
# ================ CONFIG VARIABLES ================
# Have tried learning rates 1, 0.5, 0.01, 0.005, 0.0001, 0.00001 as well, this one
# was best so far
LEARNING_RATE = 0.001
BATCH_SIZE = 50
NUM_EPOCHS = 20
# Setting input and steps to these values is fast and accurate
#NUM_INPUT = 28
#NUM_STEPS = 28
# Setting input and steps to these is very slow and inaccurate
NUM_INPUT = 1
NUM_STEPS = 784
# Number of hidden layer features in the LSTM
NUM_HIDDEN = 128
FC_LAYER_UNITS = 100
# Number of classes to classify into
NUM_CLASSES = 10
# Defines how often the network's accuracy is printed to show the user
DISPLAY_EVERY = 50
# Early stopping threshold. The early stopping mechanism works by saving the model every time its accuracy on the test
# set is higher than any previous accuracy. If more than this number of evaluations (one every DISPLAY_EVERY batches)
# have passed since the model last improved, the model is deemed unable to achieve a higher accuracy and training is stopped.
MAX_STEPS_SINCE_SAVE = 10
# ==================================================
# ================ FUNCTIONS ================
def binarize(images, threshold=0.1):
    """
    Changes MNIST images into binary versions of themselves, where each pixel is either a 1 or a 0
    :param images: the images, as flat 1D arrays, to binarize
    :param threshold: pixels with a value strictly greater than this are set to 1, all others to 0
    :return: the binarized images as a float32 array
    """
    return (threshold < images).astype("float32")
def weight_variable(shape):
    """
    A function to create TensorFlow weight variables.
    :param shape: the dimensions of the variable to be created
    :return: a TensorFlow weight variable ready for training
    """
    variable = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(variable)
def bias_variable(shape):
    """
    A function to create a TensorFlow bias variable.
    :param shape: the dimensions of the variable to be created
    :return: a TensorFlow bias variable ready for training
    """
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
# ===========================================
# ================ MAIN ================
# ==== Graph Definition ====
# Read the MNIST data
mnist = input_data.read_data_sets("MNIST_Data", one_hot=True)
# Input to the LSTM
inputs = tf.placeholder(tf.float32, [None, NUM_STEPS, NUM_INPUT])
labels = tf.placeholder(tf.float32, [None, NUM_CLASSES])
seqlens = tf.placeholder(tf.int32, [None])
keep_prob = tf.placeholder(tf.float32)
# Define the LSTM cell, dropout wrapper and the dynamic rnn
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(NUM_HIDDEN, forget_bias=1.0)
lstm_cell = tf.nn.rnn_cell.DropoutWrapper(cell=lstm_cell, output_keep_prob=keep_prob)
outputs, states = tf.nn.dynamic_rnn(lstm_cell, inputs, dtype=tf.float32, sequence_length=seqlens)
# Get the final output
outputs = tf.transpose(outputs, [1, 0, 2])
last_rnn_output = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)
# Define a weight and bias variable
W = weight_variable([NUM_HIDDEN, FC_LAYER_UNITS])
b = bias_variable([FC_LAYER_UNITS])
# Linear layer for LSTM output
lstm_out = tf.matmul(last_rnn_output, W) + b
# Add a ReLU layer
activations = tf.nn.relu(lstm_out)
# Add a final affine transformation
W2 = weight_variable([FC_LAYER_UNITS, NUM_CLASSES])
b2 = bias_variable([NUM_CLASSES])
pred = tf.matmul(activations, W2) + b2
# Now we need a loss function and optimizer
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=labels))
opt = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(loss)
# Model evaluation
correct_predictions = tf.equal(tf.argmax(pred, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float32))
# A global variable initializer
init = tf.global_variables_initializer()
# ==== Training ====
saver = tf.train.Saver()
with tf.Session() as sess:
    # Keeps track of the number of evaluations since the model last achieved a winning accuracy. If this is greater
    # than a threshold, then the model is deemed to have achieved the highest possible accuracy and training is stopped.
    steps_since_save = 0
    # Keeps track of the highest accuracy yet achieved by the model.
    highest_accuracy = 0
    # Set when early stopping triggers, so that the epoch loop stops as well as the batch loop
    converged = False
    # Initialise variables
    sess.run(init)
    # Calculate how many batches we have
    total_batches = int(mnist.train.num_examples / BATCH_SIZE)
    for epoch in range(NUM_EPOCHS):
        for batch in range(total_batches):
            # Get a batch of data
            batch_data, batch_labels = mnist.train.next_batch(BATCH_SIZE)
            batch_data = binarize(batch_data)
            # Reshape the training data into a NUM_STEPS x NUM_INPUT image rather than a flat array
            batch_data = batch_data.reshape((BATCH_SIZE, NUM_STEPS, NUM_INPUT))
            seq_lens = [NUM_STEPS] * BATCH_SIZE
            # Run optimization
            sess.run(opt, feed_dict={inputs: batch_data, labels: batch_labels, seqlens: seq_lens, keep_prob: 0.5})
            if batch % DISPLAY_EVERY == 0:
                num_test = 1000
                # Test images
                test_data = binarize(mnist.test.images[0:num_test])
                # Reshape test images
                test_data = test_data.reshape((-1, NUM_STEPS, NUM_INPUT))
                seq_lens = [NUM_STEPS] * num_test
                # Run accuracy and loss
                test_acc, test_loss = sess.run([accuracy, loss], feed_dict={inputs: test_data, labels: mnist.test.labels[0:num_test], seqlens: seq_lens, keep_prob: 1})
                # Display the information
                print("\n\t\t-->> EPOCH ", epoch, ", BATCH ", batch, " <<--\n")
                print("--> Number of Hidden Units: ", NUM_HIDDEN)
                print("Accuracy: ", test_acc, ", Loss: ", test_loss)
                # Update the highest accuracy and save if we beat the previous highest accuracy.
                if test_acc > highest_accuracy:
                    print(">> New Highest Accuracy, Saving Model <<")
                    # SAVE_PATH is not defined in this snippet, so the actual save is commented out
                    #saver.save(sess, SAVE_PATH)
                    print(">> Model Saved <<")
                    highest_accuracy = test_acc
                    steps_since_save = 0
                else:
                    steps_since_save += 1
                # Model has fully trained, stop training
                if steps_since_save > MAX_STEPS_SINCE_SAVE:
                    print("\n\n**** MODEL CONVERGED, STOPPING EARLY ****")
                    converged = True
                    break
        # The break above only leaves the batch loop, so leave the epoch loop here too
        if converged:
            break
# ======================================
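The model takes the last RNN output by transposing outputs to time-major order and then gathering the final time index. The NumPy sketch below (np.take on axis 0 plays the role of tf.gather here) shows that this equals slicing the last time step directly, which in TensorFlow can be written outputs[:, -1, :] without the transpose. This only holds because every sequence in the code has the full length NUM_STEPS; with shorter sequences, dynamic_rnn zeroes the outputs past each sequence's length, and the h part of the returned final LSTM state would be the safer choice.

```python
import numpy as np

batch_size, num_steps, num_hidden = 3, 5, 4
# Stand-in for dynamic_rnn outputs with time_major=False: [batch_size, num_steps, num_hidden]
outputs = np.arange(batch_size * num_steps * num_hidden, dtype=np.float32).reshape(
    batch_size, num_steps, num_hidden)

# What the model code does: transpose to [num_steps, batch_size, num_hidden], then gather the last step
time_major = np.transpose(outputs, (1, 0, 2))
last_via_gather = np.take(time_major, time_major.shape[0] - 1, axis=0)

# Direct slice of the last time step, no transpose needed
last_via_slice = outputs[:, -1, :]

print(last_via_gather.shape)                            # (3, 4)
print(np.array_equal(last_via_gather, last_via_slice))  # True
```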