How does TensorFlow know which part of the data to assign to which subset?
The code snippets below are copied from the TensorFlow tutorial site. There are two code blocks, one for train_ds and one for val_ds. They are identical except for the subset= argument. I would like to know whether TensorFlow assigns the first 80% of the data to train_ds and the rest to val_ds. If not, how does TensorFlow know which part to assign to which subset? Thanks.
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,  # The same as above
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)
We can take a look at the source code.
It basically boils down to this function:
def get_training_or_validation_split(samples, labels, validation_split, subset):
    """Potentially restrict samples & labels to a training or validation split.
    Args:
        samples: List of elements.
        labels: List of corresponding labels.
        validation_split: Float, fraction of data to reserve for validation.
        subset: Subset of the data to return.
            Either "training", "validation", or None. If None, we return all of the
            data.
    Returns:
        tuple (samples, labels), potentially restricted to the specified subset.
    """
    if not validation_split:
        return samples, labels
    num_val_samples = int(validation_split * len(samples))
    if subset == 'training':
        print('Using %d files for training.' % (len(samples) - num_val_samples,))
        samples = samples[:-num_val_samples]
        labels = labels[:-num_val_samples]
    elif subset == 'validation':
        print('Using %d files for validation.' % (num_val_samples,))
        samples = samples[-num_val_samples:]
        labels = labels[-num_val_samples:]
    else:
        raise ValueError('`subset` must be either "training" '
                         'or "validation", received: %s' % (subset,))
    return samples, labels
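The training set uses the first part of the samples, and the validation set uses the last part. With validation_split=0.2, the last 20% of the sample list goes to validation and everything before it goes to training. As far as I can tell, the file list is shuffled using seed before this split is applied, so the two calls only produce non-overlapping subsets if they use the same seed, as they do in the snippet from the question.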
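To make the slicing concrete, here is a minimal sketch of the shuffle-then-slice idea on a toy list. It is not the Keras implementation: `toy_split` and the `img_*.jpg` file names are hypothetical, and the seeded shuffle with Python's `random` module only stands in for whatever shuffling the library does internally.

```python
import random


def toy_split(samples, validation_split, subset, seed):
    """Toy stand-in for shuffle-then-slice; NOT the actual Keras code."""
    shuffled = list(samples)
    # The same seed gives the same order on every call.
    random.Random(seed).shuffle(shuffled)
    # Assumes the result is at least 1; with 0 the slices below misbehave.
    num_val = int(validation_split * len(shuffled))
    if subset == "training":
        return shuffled[:-num_val]   # everything except the tail
    if subset == "validation":
        return shuffled[-num_val:]   # only the tail
    raise ValueError("subset must be 'training' or 'validation'")


files = [f"img_{i}.jpg" for i in range(10)]  # hypothetical file names

train_files = toy_split(files, 0.2, "training", seed=123)
val_files = toy_split(files, 0.2, "validation", seed=123)

print(len(train_files), len(val_files))    # sizes of the two subsets
print(set(train_files) & set(val_files))   # overlap between them
```

With 10 files and a split of 0.2, the training subset has 8 files, the validation subset has 2, and the overlap is empty because both calls shuffle with the same seed. If the two calls used different seeds, the tail of one ordering could contain files from the head of the other, and the subsets could overlap.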