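How to catch and raise a TFRecord error inside tf.data.Dataset.map in TensorFlow < 2
Using tf.data.Dataset (TensorFlow < 2.0), I have run into a rare corrupted file that cannot be reshaped to the correct dimensions. Each TFRecord row stores the filename and dimensions of the image to be read. I would like to catch the error and print this filename so that the file can be deleted. Where should the try/except go in the parser so that the filename is raised?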
import tensorflow as tf

def _parse_fn(example):
    # Define features
    features = {
        "image/filename": tf.io.FixedLenFeature([], tf.string),
        "image/height": tf.io.FixedLenFeature([], tf.int64),
        "image/width": tf.io.FixedLenFeature([], tf.int64),
    }
    # Load one example and parse
    example = tf.io.parse_single_example(example, features)
    # Load image from file
    filename = tf.cast(example["image/filename"], tf.string)
    loaded_image = tf.read_file(filename)
    loaded_image = tf.image.decode_image(loaded_image, 3)
    # Reshape to known shape
    image_rows = tf.cast(example["image/height"], tf.int32)
    image_cols = tf.cast(example["image/width"], tf.int32)
    # Wrap in a try/except and report file failure
    try:
        loaded_image = tf.reshape(loaded_image,
                                  tf.stack([image_rows, image_cols, 3]),
                                  name="cast_loaded_image")
    except tf.errors.InvalidArgumentError as e:
        print("Image filename: {} yielded {}".format(filename, e))
The error is not caught and the filename is not printed:
File "/apps/tensorflow/1.14.0.cuda10.gpu/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError:
2 root error(s) found.
(0) Invalid argument: Input to reshape is a tensor with 480000 values, but the requested shape has 259518
[[{{node cast_loaded_image}}]]
[[IteratorGetNext]]
[[replica_3/retinanet/bn2c_branch2a/FusedBatchNorm/ReadVariableOp/_987]]
(1) Invalid argument: Input to reshape is a tensor with 480000 values, but the requested shape has 259518
[[{{node cast_loaded_image}}]]
[[IteratorGetNext]]
0 successful operations.
3 derived errors ignored.
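A likely explanation, given that the traceback passes through session.py (graph mode): tf.reshape inside _parse_fn only adds a node to the graph, and the InvalidArgumentError is raised later, when the session pulls elements from the iterator. A Python try/except around the graph-building call therefore never sees it. One possible workaround is to perform the size check inside the graph with tf.debugging.Assert and pass the filename tensor as assertion data, so that it should appear in the error message. The sketch below is an illustration under assumptions, not the asker's code: the helper name reshape_with_filename_check is hypothetical, and it assumes a decoded image with 3 channels.

```python
import tensorflow as tf

def reshape_with_filename_check(image, rows, cols, filename):
    """Reshape `image` to [rows, cols, 3], failing with `filename` in the message.

    Hypothetical helper: `rows` and `cols` are int32 scalars and `filename`
    is a scalar string tensor, as in the parser above.
    """
    expected = rows * cols * 3
    # The assertion runs as part of the graph, so it fires at the same time
    # the reshape would fail, but its message carries the filename.
    check = tf.debugging.Assert(
        tf.equal(tf.size(image), expected),
        ["Corrupt image file:", filename,
         "decoded size:", tf.size(image),
         "expected size:", expected],
    )
    with tf.control_dependencies([check]):
        return tf.reshape(image, tf.stack([rows, cols, 3]))
```

Inside _parse_fn, the try/except could then be replaced by `loaded_image = reshape_with_filename_check(loaded_image, image_rows, image_cols, filename)`. The error still surfaces at run time rather than at graph construction, but it should now name the offending file.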
A colleague suggested the following (I will accept it if no other answer comes in; I think it has some merit). One strategy is to use
dataset = dataset.apply(tf.data.experimental.ignore_errors())
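and apply a parse function that returns the filename of each TFRecord entry. After running this and comparing the number of results with the number of original records, you can find which images are corrupted. I am sure there are other solutions, but if you have access to the original set of images that was packed into the TFRecords, you can diff the two lists and get the missing filenames.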
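The comparison step can be sketched in plain Python, assuming both filename lists are already in hand: one from the original image set, and one collected by iterating over the dataset after ignore_errors() has dropped the failing records. The function name find_dropped_filenames is hypothetical. Filenames read back from TensorFlow usually arrive as bytes, so the sketch normalizes them to text first.

```python
def find_dropped_filenames(all_filenames, surviving_filenames):
    """Return the sorted filenames in `all_filenames` missing from `surviving_filenames`."""
    def as_text(name):
        # String tensors come back from TensorFlow as bytes; decode for comparison.
        return name.decode("utf-8") if isinstance(name, bytes) else name

    surviving = {as_text(name) for name in surviving_filenames}
    return sorted({as_text(name) for name in all_filenames} - surviving)


# Example: "b.jpg" was silently dropped by ignore_errors().
print(find_dropped_filenames(["a.jpg", "b.jpg", "c.jpg"], [b"a.jpg", b"c.jpg"]))
```

Note that ignore_errors() drops a record for any error in the pipeline, not only a failed reshape, so the resulting list is best treated as candidates to inspect.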