Python reading a TFRecord: DecodeError: Error parsing message
I'm using Colab to work with TensorFlow Ranking. It uses wget to fetch a TFRecord file:
!wget -O "/tmp/train.tfrecords" "http://ciir.cs.umass.edu/downloads/Antique/tf-ranking/ELWC/train.tfrecords"
I'm using this code to try to look at the structure of the TFRecord:
import tensorflow as tf

for example in tf.compat.v1.python_io.tf_record_iterator("/tmp/train.tfrecords"):
    print(tf.train.Example.FromString(example))
    break
I get:
DecodeError: Error parsing message
In general, how do you inspect the structure of a TFRecord?
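A TFRecord file is only a sequence of framed byte strings; the file itself does not say which protobuf message each record holds. A generic first step is therefore to pull out the raw payloads. Each record is an 8-byte little-endian length, a 4-byte masked CRC32C of that length, the payload, and a 4-byte masked CRC32C of the payload. The sketch below is an assumption-laden illustration, not TensorFlow's reader: it skips CRC verification (real readers check it), and the function name `iter_tfrecord_payloads` is hypothetical.

```python
import struct
from typing import BinaryIO, Iterator


def iter_tfrecord_payloads(f: BinaryIO) -> Iterator[bytes]:
    """Yield the raw payload of each record in a TFRecord stream.

    Layout per record: uint64 length (little-endian), uint32 masked CRC32C
    of the length, `length` payload bytes, uint32 masked CRC32C of the
    payload. Both CRCs are read but NOT verified in this sketch.
    """
    while True:
        header = f.read(12)  # 8-byte length + 4-byte CRC of the length
        if not header:
            return  # clean end of file
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = f.read(length)
        footer = f.read(4)  # CRC of the payload, ignored here
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload
```

For example, `next(iter_tfrecord_payloads(open("/tmp/train.tfrecords", "rb")))` returns the first serialized record as bytes; which `FromString` can parse it depends entirely on the schema the writer used.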
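Second question: where can I find documentation for classes such as tf.train.Example?

I figured this out. The crux of the problem was that the records were serialized with a different schema: ExampleListWithContext rather than the basic tf.train.Example. Switching to the correct deserialization fixed it: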
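import tensorflow as tf
from tensorflow_serving.apis import input_pb2

filenames = ['/tmp/train.tfrecords']
raw_dataset = tf.data.TFRecordDataset(filenames)
for raw_record in raw_dataset.take(1):
    # Each record is a serialized ExampleListWithContext, not a tf.train.Example
    elwc = input_pb2.ExampleListWithContext.FromString(raw_record.numpy())
    print(elwc.context)
    for example in elwc.examples:
        print(example)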
Output:
features {
  feature {
    key: "query"
    value {
      bytes_list {
        value: "why do ..."
      }
    }
  }
  feature {
    key: "query_bert_encoder_outputs"
    value {
      float_list {
        ...
      }
    }
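When the schema is unknown, a serialized record (the bytes that .numpy() returns for an element of tf.data.TFRecordDataset) can still be inspected without any .proto file by walking the protobuf wire format, the same idea as `protoc --decode_raw`. A plain tf.train.Example has a single top-level field, `features` (field number 1), so a payload that shows other top-level field numbers hints that the record uses a different message type, such as ExampleListWithContext. This is a minimal sketch: it lists top-level fields only, handles wire types 0, 1, 2 and 5 (not the deprecated group types), and the function names are hypothetical.

```python
from typing import List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at `pos`; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def top_level_fields(buf: bytes) -> List[Tuple[int, int, int]]:
    """List (field_number, wire_type, size_in_bytes) for each top-level field.

    For varints, size is the encoded length; for length-delimited fields it is
    the declared payload length.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            start = pos
            _, pos = read_varint(buf, pos)
            size = pos - start
        elif wire_type == 1:  # fixed64
            size = 8
            pos += 8
        elif wire_type == 2:  # length-delimited: bytes, string or sub-message
            size, pos = read_varint(buf, pos)
            pos += size
        elif wire_type == 5:  # fixed32
            size = 4
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("field runs past end of buffer")
        fields.append((field_number, wire_type, size))
    return fields
```

To go one level deeper, slice out a length-delimited field and call `top_level_fields` on it again. Strings, bytes and nested messages all share wire type 2, so the result is a hint about the structure rather than a definitive parse.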