Batching and shuffling padded tf.train.SequenceExample inputs

I have some training examples for a sequence-to-sequence scenario, stored as tf.train.SequenceExample protos in one (or more) files written with a TFRecordWriter. I would like to read them, decode them, and feed them to my network in shuffled batches. I have been struggling with the documentation and a couple of tutorials, but I could not get anywhere with it. I am working on a self-contained example, shown below:
import random
import tensorflow as tf
from six.moves import xrange
MIN_LEN = 6
MAX_LEN = 12
NUM_EXAMPLES = 20
BATCH_SIZE = 3
PATH = 'ciaone.tfrecords'
MIN_AFTER_DEQUEUE = 10
NUM_THREADS = 2
SAFETY_MARGIN = 1
CAPACITY = MIN_AFTER_DEQUEUE + (NUM_THREADS + SAFETY_MARGIN) * BATCH_SIZE
def generate_example():
    # Fake examples which are just useful to have a quick visualization.
    # The input is a sequence of random numbers.
    # The output is made of those numbers from the input
    # sequence which are greater than or equal to the average.
    length = random.randint(MIN_LEN, MAX_LEN)
    input_ = [random.randint(0, 10) for _ in xrange(length)]
    avg = sum([1.0 * item for item in input_]) / len(input_)
    output = [item for item in input_ if item >= avg]
    return input_, output
def encode(input_, output):
    length = len(input_)
    example = tf.train.SequenceExample(
        context=tf.train.Features(
            feature={
                'length': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[length]))
            }),
        feature_lists=tf.train.FeatureLists(
            feature_list={
                'input': tf.train.FeatureList(
                    feature=[
                        tf.train.Feature(
                            int64_list=tf.train.Int64List(value=[item]))
                        for item in input_]),
                'output': tf.train.FeatureList(
                    feature=[
                        tf.train.Feature(
                            int64_list=tf.train.Int64List(value=[item]))
                        for item in output])
            }
        )
    )
    return example
def decode(example):
    context_features = {
        'length': tf.FixedLenFeature([], tf.int64)
    }
    sequence_features = {
        'input': tf.FixedLenSequenceFeature([], tf.int64),
        'output': tf.FixedLenSequenceFeature([], tf.int64)
    }
    ctx, seq = tf.parse_single_sequence_example(
        example, context_features, sequence_features)
    input_ = seq['input']
    output = seq['output']
    return input_, output
if __name__ == '__main__':
    # STEP 1. -- generate a dataset.
    with tf.python_io.TFRecordWriter(PATH) as writer:
        for _ in xrange(NUM_EXAMPLES):
            record = encode(*generate_example())
            writer.write(record.SerializeToString())

    with tf.Session() as sess:
        queue = tf.train.string_input_producer([PATH])
        reader = tf.TFRecordReader()
        _, value = reader.read(queue)
        input_, output = decode(value)
        # HERE I AM STUCK!
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        sess.run(tf.local_variables_initializer())
        sess.run(tf.global_variables_initializer())
        try:
            while True:
                pass  # do something...
        except tf.errors.OutOfRangeError as e:
            coord.request_stop(e)
        finally:
            coord.request_stop()
            coord.join(threads)
Can anyone suggest how to proceed? Thanks in advance.

Side request: any pointer to resources that would help me understand the TensorFlow input pipeline API better would be appreciated.

Answer: If you were dealing with Examples instead of SequenceExamples, you would only need to add a call to tf.train.shuffle_batch over your decoded tensors:
_, value = reader.read(queue)
input_, output = decode(value)
batch_input, batch_output = tf.train.shuffle_batch(
    [input_, output], batch_size=BATCH_SIZE, capacity=CAPACITY,
    min_after_dequeue=MIN_AFTER_DEQUEUE)
However, shuffle_batch requires the tensors being passed in to have static shapes, which is not the case here. For variable-shape tensors you can instead use tf.train.batch with dynamic_pad=True. This takes care of batching (and padding) for you, but it will not shuffle your examples. Unfortunately, shuffle_batch does not take a dynamic_pad argument.

There is a workaround: you can add a RandomShuffleQueue in front of the call to tf.train.batch:
inputs = decode(value)
dtypes = list(map(lambda x: x.dtype, inputs))
shapes = list(map(lambda x: x.get_shape(), inputs))
queue = tf.RandomShuffleQueue(CAPACITY, MIN_AFTER_DEQUEUE, dtypes)
enqueue_op = queue.enqueue(inputs)
qr = tf.train.QueueRunner(queue, [enqueue_op] * NUM_THREADS)
tf.add_to_collection(tf.GraphKeys.QUEUE_RUNNERS, qr)
inputs = queue.dequeue()
for tensor, shape in zip(inputs, shapes):
    tensor.set_shape(shape)

# Now you can use tf.train.batch with dynamic_pad=True, and the order in which
# it enqueues elements will be permuted because of the RandomShuffleQueue.
batch_input, batch_output = tf.train.batch(
    inputs, BATCH_SIZE, capacity=CAPACITY, dynamic_pad=True)
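To make concrete what dynamic_pad=True does within a batch, here is a TensorFlow-free sketch; the pad_batch helper is hypothetical (not part of any API) and only illustrates the behavior: every sequence in a batch is right-padded with zeros up to the length of the longest sequence in that same batch.

```python
def pad_batch(sequences, pad_value=0):
    # Mimic what tf.train.batch(..., dynamic_pad=True) does to one batch:
    # right-pad every sequence to the length of the longest one.
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]

batch = [[3, 1, 4, 1, 5], [9, 2], [6, 5, 3]]
padded = pad_batch(batch)
# -> [[3, 1, 4, 1, 5], [9, 2, 0, 0, 0], [6, 5, 3, 0, 0]]
```

Note that padding is per batch, not global: a batch of short sequences stays short, which is why the 'length' context feature is worth keeping around for masking the padded positions later.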
There is an example implementing this pattern in Google's Magenta project.

That is exactly what I was doing with Examples, but I still need to figure out how to handle SequenceExamples. Thanks for pointing out the GitHub issue!
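Putting the pieces together, the data flow of the queue-based workaround (shuffle first, then batch with padding, keeping the lengths around for masking) can be sketched in plain Python, independent of TensorFlow. Everything below is hypothetical illustration code, not any library's API:

```python
import random

def shuffle_pad_batches(examples, batch_size, seed=0):
    # Plain-Python sketch of the pipeline above: shuffle the examples
    # (standing in for the RandomShuffleQueue), then group them into
    # batches padded to each batch's longest sequence (standing in for
    # tf.train.batch with dynamic_pad=True).
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    batches = []
    for i in range(0, len(shuffled) - batch_size + 1, batch_size):
        batch = shuffled[i:i + batch_size]
        lengths = [len(seq) for seq in batch]   # the 'length' context feature
        max_len = max(lengths)
        padded = [list(seq) + [0] * (max_len - len(seq)) for seq in batch]
        batches.append((padded, lengths))       # keep lengths for masking
    return batches

examples = [[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]]
batches = shuffle_pad_batches(examples, batch_size=2)
```

Each batch comes out as a rectangular list of rows plus the true lengths, so downstream code can slice `row[:length]` to recover the unpadded sequence or build a loss mask.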