How do I pick specific records from a .tfrecords file in TensorFlow?


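My goal is to train a neural network for a fixed number of epochs or steps, and I would like each step to use a batch of a specific size taken from a .tfrecords file.

Currently, I am reading the file with this loop: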

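import numpy as np
import tensorflow as tf

i = 0
data = np.empty(shape=[x, y])  # x, y: placeholder dimensions

for serialized_example in tf.python_io.tf_record_iterator(filename):

    example = tf.train.Example()
    example.ParseFromString(serialized_example)

    # The protobuf field is called bytes_list (byte_list does not exist)
    Labels = example.features.feature['Labels'].bytes_list.value
    # Some more features here

    # Index with i: data[i-1] would write to the last row when i == 0
    data[i] = [Labels[0]]  # plus more features here

    if i == 3:
        break
    i = i + 1

print(data)  # do some stuff etc.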

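I'm a bit of a Python noob, and I suspect that creating "i" outside the loop and breaking out when it reaches a certain value is just a crude workaround.

Is there a way to read data from the file while specifying "I want the first 100 values of the byte list contained in the Labels feature", and then "I want the next 100 values"?

To clarify, what I am unfamiliar with is looping over a file in this manner, and I am not sure how to manipulate the loop.

Thanks.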

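Not possible. TFRecords is read as a stream; there is no random access.

A TFRecords file represents a sequence of (binary) strings. The format is not random access, so it is suitable for streaming large amounts of data but not suitable if fast sharding or other non-sequential access is desired.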
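For readers wondering about a lower-level route: the format has no built-in index, but each record in an uncompressed TFRecords file is length-prefixed (a uint64 length, a uint32 CRC of the length, the payload, and a uint32 CRC of the payload). The following is only a sketch under that assumption, not a TensorFlow API; `build_offset_index` and `read_record` are hypothetical helper names, and the checksums are skipped rather than verified. It scans the file once, remembers where each payload starts, and can then seek straight to record k.

```python
import struct

# Framing of one record in an uncompressed TFRecords file (little-endian):
#   uint64 length | uint32 masked CRC of length | payload | uint32 masked CRC of payload
_HEADER = struct.Struct("<QI")
_FOOTER_SIZE = 4


def build_offset_index(path):
    """Scan the file once and return a list of (payload_offset, payload_length)."""
    index = []
    with open(path, "rb") as f:
        while True:
            header = f.read(_HEADER.size)
            if not header:
                break  # clean end of file
            if len(header) < _HEADER.size:
                raise ValueError("truncated record header")
            # The CRC fields are ignored in this sketch (no integrity check).
            length, _length_crc = _HEADER.unpack(header)
            index.append((f.tell(), length))
            f.seek(length + _FOOTER_SIZE, 1)  # skip the payload and its CRC
    return index


def read_record(path, index, k):
    """Return the raw bytes of record k using a previously built index."""
    offset, length = index[k]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

The bytes returned by `read_record` are a serialized record, so they could be handed to `example.ParseFromString(...)` as in the question. Building the index still costs one sequential pass, which is consistent with the point above that the format itself is meant for streaming.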

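Expanding on the comment for archival purposes (although this is not an ideal solution to your problem). If you want to use enumerate() to break out of the loop at a certain iteration, you can do the following: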

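import numpy as np
import tensorflow as tf

n = 5  # 0-based index of the last record to read
data = np.empty(shape=[x, y])  # x, y: placeholder dimensions

for i, serialized_example in enumerate(tf.python_io.tf_record_iterator(filename)):

    example = tf.train.Example()
    example.ParseFromString(serialized_example)

    # The protobuf field is called bytes_list (byte_list does not exist)
    Labels = example.features.feature['Labels'].bytes_list.value
    # Some more features here

    # enumerate() starts at 0, so index with i rather than i-1
    data[i] = [Labels[0], Labels[1]]  # more features here

    if i == n:
        break

print(data)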
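The "first 100 values, then the next 100" pattern from the question can also be expressed without a manual counter by taking fixed-size slices from the iterator with itertools.islice. This is a minimal sketch: `iter_batches` is a hypothetical helper name, and a plain generator stands in for `tf.python_io.tf_record_iterator(filename)` so the snippet runs without a .tfrecords file.

```python
from itertools import islice


def iter_batches(records, batch_size):
    """Yield consecutive lists of up to batch_size items from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the underlying iterator is exhausted
        yield batch


# Stand-in for tf.python_io.tf_record_iterator(filename); any iterable works.
fake_records = ("record-%d" % i for i in range(250))
batch_sizes = [len(batch) for batch in iter_batches(fake_records, 100)]
print(batch_sizes)  # [100, 100, 50]
```

Access is still strictly sequential: each batch simply continues where the previous one stopped, and the last batch may be shorter than the rest.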

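Addressing this part of the question:

I would like each step to use a batch of a specific size from the .tfrecords file

As already mentioned, .tfrecords files are not designed for arbitrary access to the data. However, since you only need to keep pulling batches from the .tfrecords file, you are better off using the tf.data API to feed your model.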

Building a Dataset from .tfrecord files (adapted from an external guide):

filepath1 = '/path/to/file.tfrecord'
filepath2 = '/path/to/other_file.tfrecord'
dataset = tf.data.TFRecordDataset(filenames=[filepath1, filepath2])

From here, if you are using the tf.keras API, you can pass dataset as an argument to model.fit like so:

model.fit(x=dataset,
          batch_size=None,
          validation_data=some_other_dataset)
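Because batch_size is left as None when a Dataset is passed, the dataset itself has to yield batches of the desired size. The sketch below shows one way that might look, assuming TensorFlow 2.x and assuming each record holds a single bytes value under a feature named 'Labels'; `parse_example`, `make_dataset` and `BATCH_SIZE` are hypothetical names chosen for illustration.

```python
import tensorflow as tf

BATCH_SIZE = 100  # assumed batch size, matching the "100 values" in the question


def parse_example(serialized):
    """Decode one serialized tf.train.Example (assumed schema: one bytes value)."""
    features = {"Labels": tf.io.FixedLenFeature([], tf.string)}
    return tf.io.parse_single_example(serialized, features)


def make_dataset(filenames, batch_size=BATCH_SIZE, epochs=1):
    """Stream records in file order, group them into batches, repeat per epoch."""
    dataset = tf.data.TFRecordDataset(filenames)
    dataset = dataset.map(parse_example)
    dataset = dataset.batch(batch_size)  # the last batch of an epoch may be smaller
    return dataset.repeat(epochs)
```

Each element of the resulting dataset is a dict whose 'Labels' entry holds up to batch_size values. For model.fit, the map step would additionally need to return an (inputs, targets) pair that matches the model, which depends on features not shown in the question.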

Extra stuff

Here is an example that may help you make better sense of .tfrecord files; it is a bit better than the TensorFlow documentation.
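You could try using enumerate().

Thanks for your answer. I was wondering whether you could point me somewhere to learn more about this format. Mostly, I am just not sure what it is useful for. Is this simply not possible with the available API? Perhaps there is a lower-level solution?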