Tensorflow: Loading an unknown TFRecord dataset

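I was given a TFRecord data file

filename = 'train-00000-of-00001'

which contains images of unknown size and possibly other information. I know I can open the dataset with

dataset = tf.data.TFRecordDataset(filename)

How can I extract the images from this file so that I can save them as NumPy arrays?

I also don't know whether any other information, such as labels or resolution, is stored in the TFRecord file. How can I get this information, and how can I save it as NumPy arrays?

I normally work only with NumPy arrays and am not familiar with TFRecord data files.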

1.) How can I extract the images from this file to save them as NumPy arrays?

What you need is:

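import tensorflow as tf  # TensorFlow 1.x API: tf.python_io is not available as such in TF 2.x

record_iterator = tf.python_io.tf_record_iterator(path=filename)

for string_record in record_iterator:
  # Each record is one serialized tf.train.Example protocol buffer
  example = tf.train.Example()
  example.ParseFromString(string_record)

  print(example)

  # Exit after 1 iteration as this is purely demonstrative.
  break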

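2.) How can I get this information?

Here is the official documentation. I strongly recommend reading it, as it walks step by step through how to extract the values you are looking for.

Essentially, you have to convert the example into a dictionary. So if I wanted to find out what kind of information is in the tfrecord file, I would do this (in the context of the code for the first question):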

print(dict(example.features.feature).keys())
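In TensorFlow 2.x, where tf.python_io is no longer available, a similar inspection could look like the sketch below. It assumes eager execution; describe_features is a hypothetical helper that reports, for the first record only, each feature key, which of the three list types (bytes_list, float_list, int64_list) it uses, and how many values it holds.

```python
import tensorflow as tf

def describe_features(path):
    """Map each feature key in the first record to (list type, number of values)."""
    for raw in tf.data.TFRecordDataset(path).take(1):
        example = tf.train.Example.FromString(raw.numpy())
        summary = {}
        for key, feature in example.features.feature.items():
            # 'kind' is the oneof in tf.train.Feature: bytes_list, float_list or int64_list
            kind = feature.WhichOneof('kind')
            values = getattr(feature, kind).value if kind else []
            summary[key] = (kind, len(values))
        return summary
    return {}  # the file contains no records
```

A bytes_list entry with one value is usually an encoded image or a string; int64_list and float_list entries hold numbers such as labels or sizes.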

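3.) How can I save them as NumPy arrays?

I would build on the for loop mentioned above: in each iteration, extract the values you are interested in and append them to NumPy arrays. If needed, you can create a DataFrame from these arrays and save it as a CSV file.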
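A minimal sketch of that idea, assuming each record has already been turned into a plain dict mapping feature key to a list of values (for example with {k: list(getattr(f, f.WhichOneof('kind')).value) for k, f in example.features.feature.items()}, assuming every feature has a list set) and that all records share the same keys. records_to_arrays and save_arrays are hypothetical names. Values are gathered in Python lists first and converted once at the end, since appending to a NumPy array copies it every time.

```python
import numpy as np

def records_to_arrays(records):
    """Collect per-record feature dicts into one NumPy array per feature key."""
    columns = {}
    for record in records:
        for key, values in record.items():
            columns.setdefault(key, []).append(list(values))

    arrays = {}
    for key, rows in columns.items():
        scalar = all(len(row) == 1 for row in rows)
        has_bytes = any(isinstance(v, bytes) for row in rows for v in row)
        if scalar and not has_bytes:
            # e.g. one integer label per record -> shape (num_records,)
            arrays[key] = np.array([row[0] for row in rows])
        else:
            # Raw image bytes or variable-length lists: keep each entry as a Python object
            out = np.empty(len(rows), dtype=object)
            for i, row in enumerate(rows):
                out[i] = row[0] if scalar else row
            arrays[key] = out
    return arrays

def save_arrays(arrays, path):
    """Save all arrays into one .npz file; '/' in keys is replaced to keep member names flat."""
    np.savez(path, **{key.replace('/', '_'): value for key, value in arrays.items()})
```

Because of the object arrays, loading the archive back needs np.load(path, allow_pickle=True).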

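But...

It seems you may have multiple tfrecord files: names like train-00000-of-00001 follow the sharded naming commonly used for training datasets.

So for multiple TFRecord files you will need a double for loop: the outer loop iterates over the files, and for each file the inner loop iterates over all of its tf.Example records.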
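A sketch of that double loop in TensorFlow 2.x (eager mode), assuming the shards can be matched with a glob pattern such as train-*-of-*; iterate_examples is a hypothetical helper. tf.data.TFRecordDataset also accepts a list of filenames, so tf.data.TFRecordDataset(sorted(glob.glob(pattern))) can replace the outer loop when you don't need to know which file a record came from.

```python
import glob
import tensorflow as tf

def iterate_examples(pattern):
    """Yield (filename, tf.train.Example) for every record in every file matching pattern."""
    for path in sorted(glob.glob(pattern)):  # outer loop: one TFRecord file at a time
        for raw in tf.data.TFRecordDataset(path):  # inner loop: the records in that file
            yield path, tf.train.Example.FromString(raw.numpy())
```

Usage: for path, example in iterate_examples('train-*-of-*'): ... with the same per-example processing as in the single-file loop.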

Edit:

import io

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.python_io, tf.Session)
from PIL import Image

# Download the example image (get_file takes a local file name, then the URL)
cat_in_snow = tf.keras.utils.get_file(
    '320px-Felis_catus-cat_on_snow.jpg',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/320px-Felis_catus-cat_on_snow.jpg')

#------------------------------------------------------Convert to tfrecords
def _bytes_feature(value):
  """Returns a bytes_list from a string / byte."""
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def image_example(image_string):
  # Store the raw, still JPEG-encoded file bytes under the key 'image_raw'
  feature = {
      'image_raw': _bytes_feature(image_string),
  }
  return tf.train.Example(features=tf.train.Features(feature=feature))

with tf.python_io.TFRecordWriter('images.tfrecords') as writer:
    with open(cat_in_snow, 'rb') as f:
        image_string = f.read()
    tf_example = image_example(image_string)
    writer.write(tf_example.SerializeToString())
#------------------------------------------------------


#------------------------------------------------------Begin Operation
record_iterator = tf.python_io.tf_record_iterator(path='images.tfrecords')

for string_record in record_iterator:
  example = tf.train.Example()
  example.ParseFromString(string_record)

  print(example)

  # OPTION 1: convert bytes to arrays using PIL and IO
  # io.BytesIO wraps the bytes in a file-like object that Image.open can read
  example_bytes = dict(example.features.feature)['image_raw'].bytes_list.value[0]
  PIL_array = np.array(Image.open(io.BytesIO(example_bytes)))

  # OPTION 2: convert bytes to arrays using Tensorflow
  with tf.Session() as sess:
      TF_array = sess.run(tf.image.decode_jpeg(example_bytes, channels=3))

  break
#------------------------------------------------------


#------------------------------------------------------Compare results
# Number of pixel values on which the two decoders disagree
print((PIL_array.flatten() != TF_array.flatten()).sum())
print(np.array_equal(PIL_array, TF_array))

PIL_img = Image.fromarray(PIL_array, 'RGB')
PIL_img.save('PIL_IMAGE.jpg')

TF_img = Image.fromarray(TF_array, 'RGB')
TF_img.save('TF_IMAGE.jpg')
#------------------------------------------------------

Sources for the code above: the conversion of the bytes to np.array(), and the official TFRecord documentation.

Edit 2:

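Keep in mind that TFRecords are just a way of storing information so that TensorFlow models can read it efficiently.
I use PIL and io to convert the bytes into an image: io.BytesIO takes the bytes and wraps them in a file-like object that PIL.Image can read.
Yes, there is a pure TensorFlow way: tf.image.decode_jpeg, as in OPTION 2 above.
Yes, there are differences between the two methods when the two arrays are compared.
Which one should you choose? If you are concerned about accuracy, TensorFlow is not the best option, because, as explained elsewhere: "the default TensorFlow chooses for JPEG decoding is IFAST, trading image quality for speed". Credit for this information goes to that source.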
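If you want to stay within TensorFlow anyway, tf.image.decode_jpeg (also available as tf.io.decode_jpeg) takes a dct_method argument: its documentation says the default "" maps to a system-specific choice, while "INTEGER_ACCURATE" requests the more precise decoder. The sketch below assumes TF 2.x eager execution; jpeg_bytes_to_array is a hypothetical helper. It may narrow the gap to PIL, but identical output is not guaranteed.

```python
import tensorflow as tf

def jpeg_bytes_to_array(jpeg_bytes, accurate=True):
    """Decode JPEG bytes into an (H, W, 3) uint8 NumPy array with TensorFlow only (eager mode)."""
    image = tf.io.decode_jpeg(
        jpeg_bytes,
        channels=3,
        # '' maps to a system-specific default; 'INTEGER_ACCURATE' requests the more precise IDCT
        dct_method='INTEGER_ACCURATE' if accurate else '',
    )
    return image.numpy()
```

With the code from the edit, TF_array = jpeg_bytes_to_array(example_bytes) replaces the Session block, and the comparison lines then show how far it still is from PIL_array.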

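With your help I can now see that each example consists of several features, each with a key and a value. The keys are label, image/encoded, image/width and image/height. It is still not clear to me how to read this information and how to convert the encoded image into a matrix with dimensions WxHxC.
Yes, it is a bytes list.
Great! Now I can extract the image. However, I don't understand why it works. Why do I have to load PIL and io? What exactly are these libraries doing? Is there a pure TensorFlow way? I would like to be able to extract the image by doing something like image, label = sess.run(extract_image, feed_dict={path: filename}).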
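For the keys reported above, a pure TensorFlow 2.x sketch could parse each record with tf.io.parse_single_example and decode the image with tf.io.decode_image. The dtypes in FEATURE_DESCRIPTION are assumptions (label as an int64 scalar, image/encoded as one encoded JPEG/PNG string); if they don't match the file, parsing raises an error, and printing an example shows the real types. image/width and image/height can be added the same way with tf.io.FixedLenFeature([], tf.int64), though the decoded array already carries its shape. Note the result is (height, width, channels); np.transpose(image, (1, 0, 2)) gives W x H x C if you really need that order. parse_record and load_images are hypothetical names.

```python
import tensorflow as tf

# Assumed schema: key names from the comment above, dtypes are guesses
FEATURE_DESCRIPTION = {
    'label': tf.io.FixedLenFeature([], tf.int64),
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
}

def parse_record(serialized):
    features = tf.io.parse_single_example(serialized, FEATURE_DESCRIPTION)
    # decode_image handles JPEG/PNG/GIF/BMP; expand_animations=False keeps a 3-D (H, W, C) result
    image = tf.io.decode_image(features['image/encoded'], channels=3, expand_animations=False)
    return image, features['label']

def load_images(filename):
    """Return a list of (H, W, 3) uint8 arrays and a list of int labels from a TFRecord file."""
    dataset = tf.data.TFRecordDataset(filename).map(parse_record)
    images, labels = [], []
    for image, label in dataset:
        images.append(image.numpy())  # images may differ in size, so keep a list
        labels.append(int(label.numpy()))
    return images, labels
```

In eager mode, the sess.run(...) line from the comment becomes roughly image, label = next(iter(tf.data.TFRecordDataset(filename).map(parse_record))).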
Edited my answer.
I get DecodeError: Error parsing message, using the following record: !wget -O "/tmp/train.tfrecords" ""