Tensorflow: separating training and evaluation data in TFRecords

machine-learning tensorflow


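I have a .tfrecords file filled with labeled data. I want to use X% of it for training and the remaining (100 - X)% for evaluation/testing. Obviously, the two sets should not overlap. What is the best way to do this?

Below is the small piece of code I use to read the tfrecords. Is there a way to split the data into training and evaluation data? Am I going about this the wrong way?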

reader = tf.TFRecordReader()
files = tf.train.string_input_producer([TFRECORDS_FILE], num_epochs=num_epochs)

read_name, serialized_examples = reader.read(files)
features = tf.parse_single_example(
  serialized = serialized_examples,
  features={
      'image': tf.FixedLenFeature([], tf.string),
      'value': tf.FixedLenFeature([], tf.string),
  })
image = tf.decode_raw(features['image'], tf.uint8)
value = tf.decode_raw(features['value'], tf.uint8)

image, value = tf.train.shuffle_batch([image, value],
 enqueue_many = False,
 batch_size = 4,
 capacity  = 30,
 num_threads = 3,
 min_after_dequeue = 10)

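Although this question was asked more than a year ago, I recently ran into a similar problem.

I used tf.data.Dataset with a filter on a hash of the input. Here is an example: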

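import tensorflow as tf

# make_dataset is a hypothetical wrapper name; the feature spec is the one from the question.
def make_dataset(files, is_evaluation):
  dataset = tf.data.TFRecordDataset(files)

  # Hash each serialized record into one of 10 buckets. Bucket 0 (about 10%)
  # is held out for evaluation; the other nine buckets are used for training.
  # tf.strings.to_hash_bucket_fast is the current name of tf.string_to_hash_bucket_fast.
  if is_evaluation:
    dataset = dataset.filter(
      lambda r: tf.equal(tf.strings.to_hash_bucket_fast(r, 10), 0))
  else:
    dataset = dataset.filter(
      lambda r: tf.not_equal(tf.strings.to_hash_bucket_fast(r, 10), 0))

  # parse_single_example needs the feature spec, so it cannot be passed to map() bare.
  features = {
      'image': tf.io.FixedLenFeature([], tf.string),
      'value': tf.io.FixedLenFeature([], tf.string),
  }
  return dataset.map(lambda r: tf.io.parse_single_example(r, features))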
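
To see why a hash filter like the one above gives a stable split with no overlap, here is a minimal pure-Python sketch of the same idea. It is an illustration only: zlib.crc32 stands in for TensorFlow's hash (the two will not assign records to the same buckets), and the names bucket, is_eval and the fake records are made up for the example.

```python
import zlib

NUM_BUCKETS = 10  # same bucket count as the filter above


def bucket(record: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a serialized record to a bucket index that depends only on its bytes."""
    # crc32 is an assumption used for illustration; it is not TensorFlow's hash.
    return zlib.crc32(record) % num_buckets


def is_eval(record: bytes) -> bool:
    """Bucket 0 is the evaluation set; every other bucket is training."""
    return bucket(record) == 0


# Fake serialized records standing in for the contents of a .tfrecords file.
records = [f"example-{i}".encode() for i in range(1000)]

eval_set = [r for r in records if is_eval(r)]
train_set = [r for r in records if not is_eval(r)]

print(len(train_set), len(eval_set))
```

Because the bucket depends only on the record's bytes, every run puts a given record in the same split, and a record can never be in both. One consequence is that byte-identical duplicate records always land on the same side.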

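One drawback I have noticed so far is that the evaluation pipeline still reads every record and discards about nine in ten, so each evaluation pass may need to traverse roughly 10 times as much data as it actually uses. To avoid this, you may want to split the data into separate files during preprocessing.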
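
A rough sketch of such a one-time split during preprocessing follows. It is written against a generic writer object with a write(bytes) method so that it runs without TensorFlow; split_records, ListWriter and train_percent are hypothetical names, and zlib.crc32 is again only a stand-in hash.

```python
import zlib


def split_records(records, train_writer, eval_writer, train_percent=90):
    """Send each record to exactly one writer; return (n_train, n_eval)."""
    n_train = n_eval = 0
    for record in records:
        # Hash into 100 buckets so train_percent can be any whole percentage.
        if zlib.crc32(record) % 100 < train_percent:
            train_writer.write(record)
            n_train += 1
        else:
            eval_writer.write(record)
            n_eval += 1
    return n_train, n_eval


class ListWriter:
    """In-memory stand-in for a file writer, used only for this demo."""

    def __init__(self):
        self.records = []

    def write(self, record):
        self.records.append(record)


train_out, eval_out = ListWriter(), ListWriter()
records = [f"example-{i}".encode() for i in range(1000)]
n_train, n_eval = split_records(records, train_out, eval_out, train_percent=90)
print(n_train, n_eval)
```

With TensorFlow 2, the same function should work if the writers are tf.io.TFRecordWriter objects (one per output file) and the records come from tf.data.TFRecordDataset(path).as_numpy_iterator(). Training and evaluation then each read only their own file, with no filtering at run time.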