Python: Call order of TensorFlow Dataset methods in an input function

python tensorflow

tf.data.Dataset has many methods, such as batch(), shard(), shuffle(), prefetch(), and map(). When implementing an input pipeline, we usually call them in whatever order we like.

Does calling these methods in a different order have any effect on the program? For example, are the following two call sequences equivalent?

dataset = dataset.shuffle(buffer_size).batch(batch_size)

dataset = dataset.batch(batch_size).shuffle(buffer_size)

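Yes, there is a difference. You should almost always call shuffle() before batch(), because we want to shuffle individual records, not batches.

Transformations of a tf.data.Dataset are applied in the same order in which they are called.

Batching combines consecutive elements of its input into a single batched element in the output: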

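import tensorflow as tf
import numpy as np

# A dataset of the 19 integers 0, 1, ..., 18.
dataset = tf.data.Dataset.from_tensor_slices(np.arange(19))
# batch(5) groups consecutive elements; the final batch holds only 4.
for batch in dataset.batch(5):
  print(batch)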

Output:

tf.Tensor([0 1 2 3 4], shape=(5,), dtype=int64)
tf.Tensor([5 6 7 8 9], shape=(5,), dtype=int64)
tf.Tensor([10 11 12 13 14], shape=(5,), dtype=int64)
tf.Tensor([15 16 17 18], shape=(4,), dtype=int64)

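A minimal pure-Python sketch of the batching rule, assuming an ordered stream of elements. model_batch is a hypothetical helper, not TensorFlow's implementation. It reproduces the grouping printed for dataset.batch(5) and mimics the drop_remainder argument of Dataset.batch(), which discards a final partial batch.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def model_batch(items: Iterable[T], batch_size: int,
                drop_remainder: bool = False) -> List[List[T]]:
    """Hypothetical model of Dataset.batch(): group consecutive elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batches: List[List[T]] = []
    current: List[T] = []
    for x in items:
        current.append(x)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    # A final, shorter batch is kept unless drop_remainder is True.
    if current and not drop_remainder:
        batches.append(current)
    return batches


print(model_batch(range(19), 5))
# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18]]
print(model_batch(range(19), 5, drop_remainder=True))
# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]
```

Because batching only ever groups neighbours, any shuffling has to happen before this step if records are to be mixed across batches.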

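Next, we shuffle the records before batching them, as we would before feeding the data to a network. shuffle() fills a buffer with buffer_size elements, then randomly samples elements from that buffer, replacing each selected element with the next element from the input. For a perfect shuffle, the buffer size must be greater than or equal to the full size of the dataset.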

for batch in dataset.shuffle(5).batch(5):
  print(batch)

Output:

tf.Tensor([2 0 1 4 8], shape=(5,), dtype=int64)
tf.Tensor([ 9  3  7  6 11], shape=(5,), dtype=int64)
tf.Tensor([12 14 15  5 13], shape=(5,), dtype=int64)
tf.Tensor([17 18 16 10], shape=(4,), dtype=int64)

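As you can see, the records are mixed across batches, but not perfectly: because the buffer holds only 5 of the 19 elements, each output is drawn from a small window of nearby elements. For this example that is good enough.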
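Why is the result only approximately shuffled? Below is a simplified pure-Python model of the buffer behaviour described earlier, assuming uniform sampling from the buffer. buffered_shuffle and its seed argument are hypothetical; the sketch is not TensorFlow's implementation and does not reproduce its random numbers. It shows that with buffer_size=5 the k-th output can be at most input element k + 4, and the first output always comes from the first 5 elements.

```python
import itertools
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")
_END = object()  # sentinel marking an exhausted input iterator


def buffered_shuffle(items: Iterable[T], buffer_size: int,
                     seed: int = 0) -> Iterator[T]:
    """Hypothetical model of Dataset.shuffle(buffer_size) semantics."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    it = iter(items)
    # 1. Fill the buffer with the first buffer_size elements.
    buffer: List[T] = list(itertools.islice(it, buffer_size))
    while buffer:
        # 2. Sample one buffered element uniformly at random.
        i = rng.randrange(len(buffer))
        chosen = buffer[i]
        # 3. Replace it with the next input element, or shrink the
        #    buffer once the input is exhausted.
        nxt = next(it, _END)
        if nxt is _END:
            buffer[i] = buffer[-1]
            buffer.pop()
        else:
            buffer[i] = nxt
        yield chosen


out = list(buffered_shuffle(range(19), buffer_size=5, seed=1))
print(sorted(out) == list(range(19)))  # True: a permutation, nothing lost
print(all(x <= k + 4 for k, x in enumerate(out)))  # True: bounded window
firsts = {next(buffered_shuffle(range(19), 5, seed=s)) for s in range(200)}
print(firsts <= {0, 1, 2, 3, 4})  # True: first output is from the initial buffer
```

With buffer_size=19 (or more) the buffer holds the whole dataset, so each step draws uniformly from all remaining elements, which is why a full-size buffer gives a perfect shuffle.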

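However, if you apply these methods in the opposite order, you get unexpected results: batch() runs first, so shuffle() shuffles whole batches, not individual records.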

for batch in dataset.batch(5).shuffle(5):
  print(batch)

Output:

tf.Tensor([0 1 2 3 4], shape=(5,), dtype=int64)
tf.Tensor([5 6 7 8 9], shape=(5,), dtype=int64)
tf.Tensor([15 16 17 18], shape=(4,), dtype=int64)
tf.Tensor([10 11 12 13 14], shape=(5,), dtype=int64)
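To isolate the effect of call order, this sketch models both pipelines in plain Python, assuming a full-size (perfect) shuffle via random.Random.shuffle instead of a bounded buffer. batch and shuffled are hypothetical helpers, not tf.data APIs. With batch-then-shuffle, every output batch is still one of the original contiguous runs, which matches the output above.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def batch(items: Sequence[T], n: int) -> List[List[T]]:
    """Hypothetical helper: group consecutive elements, keep a short last batch."""
    return [list(items[i:i + n]) for i in range(0, len(items), n)]


def shuffled(items: Sequence[T], seed: int) -> List[T]:
    """Hypothetical helper: a perfect (full-buffer) shuffle of a copy."""
    out = list(items)
    random.Random(seed).shuffle(out)
    return out


data = list(range(19))
original_batches = batch(data, 5)

# shuffle() then batch(): individual records are mixed across batches.
shuffle_then_batch = batch(shuffled(data, seed=0), 5)

# batch() then shuffle(): only the order of whole batches changes.
batch_then_shuffle = shuffled(original_batches, seed=0)

print(shuffle_then_batch)
print(batch_then_shuffle)
# Every batch is still one of the original contiguous runs.
print(all(b in original_batches for b in batch_then_shuffle))  # True
```

In a real tf.data pipeline, the same reasoning leads to dataset.shuffle(buffer_size).batch(batch_size).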

Possible duplicate. – @zihaozhihao Yes, it is a duplicate, sorry. Still, this is a good question. – Yes, it is! Thanks for sharing :)