TensorFlow Object Detection API: error with a model based on "CenterNet Resnet50 V1 FPN 512x512"


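I am trying to use the TensorFlow Object Detection API with the CenterNet Resnet50 V1 FPN 512x512 model.

I am running TensorFlow in a Docker environment based on the tensorflow/tensorflow:2.5.0-gpu-jupyter image, with a recent checkout of the API sources at commit eb6687ac.

I have set up the directory structure and downloaded the pre-trained model: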

mkdir -p /workspace/pre-trained-models/downloads/ && cd /workspace/pre-trained-models/downloads/

wget http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_resnet50_v1_fpn_512x512_coco17_tpu-8.tar.gz

tar -zxvf centernet_resnet50_v1_fpn_512x512_coco17_tpu-8.tar.gz -C /workspace/pre-trained-models/

mkdir -p /workspace/models/my_centernet_resnet50_v1_fpn

cp /workspace/pre-trained-models/centernet_resnet50_v1_fpn_512x512_coco17_tpu-8/pipeline.config /workspace/models/my_centernet_resnet50_v1_fpn/
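
Before training, it may help to confirm that the extracted archive really contains the checkpoint the config will point at. The sketch below assumes the usual TF2 checkpoint layout, where a prefix such as ckpt-0 is stored as ckpt-0.index plus one or more ckpt-0.data-* shards. The helper name checkpoint_prefix_exists is hypothetical and not part of the Object Detection API.

```python
import glob
import os


def checkpoint_prefix_exists(prefix: str) -> bool:
    """Return True if a TF2-style checkpoint appears to exist at `prefix`.

    Assumption: the checkpoint is stored as `<prefix>.index` plus at least
    one `<prefix>.data-*` shard file.
    """
    has_index = os.path.isfile(prefix + ".index")
    has_data = bool(glob.glob(glob.escape(prefix) + ".data-*"))
    return has_index and has_data


if __name__ == "__main__":
    # Path taken from the commands above.
    prefix = (
        "/workspace/pre-trained-models/"
        "centernet_resnet50_v1_fpn_512x512_coco17_tpu-8/checkpoint/ckpt-0"
    )
    print(prefix, "->", checkpoint_prefix_exists(prefix))
```

This only checks that the files are present; it does not validate their contents.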

My pipeline.config is shown below.

Note that I am using use_bfloat16: true because I believe the RTX 3090 supports it. Without this line, I get the same error.

# CenterNet meta-architecture from the "Objects as Points" [1] paper
# with the ResNet-v2-101 backbone. The ResNet backbone has a few differences
# as compared to the one mentioned in the paper, hence the performance is
# slightly worse. This config is TPU compatible.
# [1]: https://arxiv.org/abs/1904.07850
#

model {
  center_net {
    num_classes: 1
    feature_extractor {
      type: "resnet_v1_50_fpn"
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 512
        max_dimension: 512
        pad_to_max_dimension: true
      }
    }
    object_detection_task {
      task_loss_weight: 1.0
      offset_loss_weight: 1.0
      scale_loss_weight: 0.1
      localization_loss {
        l1_localization_loss {
        }
      }
    }
    object_center_params {
      object_center_loss_weight: 1.0
      min_box_overlap_iou: 0.7
      max_box_predictions: 100
      classification_loss {
        penalty_reduced_logistic_focal_loss {
          alpha: 2.0
          beta: 4.0
        }
      }
    }
  }
}

train_config: {

  batch_size: 32
  num_steps: 250000

  data_augmentation_options {
    random_horizontal_flip {
    }
  }


  optimizer {
    adam_optimizer: {
      epsilon: 1e-7  # Match tf.keras.optimizers.Adam's default.
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 1e-3
          total_steps: 250000
          warmup_learning_rate: 2.5e-4
          warmup_steps: 5000
        }
      }
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false

  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "/workspace/pre-trained-models/centernet_resnet50_v1_fpn_512x512_coco17_tpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  use_bfloat16: true
}

train_input_reader: {
  label_map_path: "/workspace/image-data/oli-fish/training_data/train.pbtxt"
  tf_record_input_reader {
    input_path: "/workspace/image-data/oli-fish/training_data/train.tfrecord"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
}

eval_input_reader: {
  label_map_path: "/workspace/image-data/oli-fish/test_data/test.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/workspace/image-data/oli-fish/test_data/test.tfrecord"
  }
}
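
For reference, the cosine_decay_learning_rate block in the config describes a linear warmup followed by a cosine decay. The sketch below is a plain-Python illustration of that common schedule shape using the values from the config. It is an assumption about the general form, and the API's own implementation may differ in details.

```python
import math


def cosine_decay_with_warmup(step,
                             base_lr=1e-3,
                             total_steps=250000,
                             warmup_lr=2.5e-4,
                             warmup_steps=5000):
    """Illustrative learning rate at `step` (defaults copied from the config)."""
    if step < warmup_steps:
        # Linear ramp from warmup_lr up to base_lr.
        slope = (base_lr - warmup_lr) / warmup_steps
        return warmup_lr + slope * step
    # Fraction of the decay phase completed, clamped to 1.0 at the end.
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 2500, 5000, 127500, 250000):
        print(s, cosine_decay_with_warmup(s))
```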

My training data consists of a single class.
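
With a single class, the label map referenced by label_map_path should contain exactly one item, matching num_classes: 1 in the config. The sketch below builds such a label map and counts its items. The class name "fish" is a placeholder (the real name is not given in this post), and both helper names are hypothetical.

```python
import re


def build_label_map(class_names):
    """Return label map text with ids starting at 1 (0 is left for background)."""
    items = []
    for index, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (index, name))
    return "\n".join(items)


def count_label_map_items(label_map_text):
    """Count `item { ... }` entries in label map text."""
    return len(re.findall(r"^\s*item\s*\{", label_map_text, flags=re.MULTILINE))


if __name__ == "__main__":
    label_map = build_label_map(["fish"])  # placeholder class name
    print(label_map)
    print("items:", count_label_map_items(label_map))
```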

I run training with the following command:

python object_detection/model_main_tf2.py --model_dir=/workspace/models/my_centernet_resnet50_v1_fpn/ --pipeline_config_path=/workspace/models/my_centernet_resnet50_v1_fpn/pipeline.config

I get the following error:

/home/tensorflow/.local/lib/python3.6/site-packages/tensorflow_addons/utils/ensure_tf_install.py:67: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.3.0 and strictly below 2.5.0 (nightly versions are not supported).
 The versions of TensorFlow you are currently using is 2.5.0 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
  UserWarning,
WARNING:tensorflow:Collective ops is not configured at program startup. Some performance features may not be enabled.
W0517 15:55:17.669740 140455981164352 mirrored_strategy.py:379] Collective ops is not configured at program startup. Some performance features may not be enabled.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0517 15:55:17.837340 140455981164352 mirrored_strategy.py:369] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I0517 15:55:17.839460 140455981164352 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0517 15:55:17.839519 140455981164352 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.870279 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.871679 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.873107 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.873559 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.876967 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.878880 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.891481 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.891972 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.892784 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.893235 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
WARNING:tensorflow:From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py:546: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W0517 15:55:19.319097 140455981164352 deprecation.py:336] From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py:546: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['/workspace/image-data/oli-fish/training_data/train.tfrecord']
I0517 15:55:19.320800 140455981164352 dataset_builder.py:163] Reading unweighted datasets: ['/workspace/image-data/oli-fish/training_data/train.tfrecord']
INFO:tensorflow:Reading record datasets for input file: ['/workspace/image-data/oli-fish/training_data/train.tfrecord']
I0517 15:55:19.320896 140455981164352 dataset_builder.py:80] Reading record datasets for input file: ['/workspace/image-data/oli-fish/training_data/train.tfrecord']
INFO:tensorflow:Number of filenames to read: 1
I0517 15:55:19.320939 140455981164352 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0517 15:55:19.320975 140455981164352 dataset_builder.py:88] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W0517 15:55:19.322137 140455981164352 deprecation.py:336] From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0517 15:55:19.335709 140455981164352 deprecation.py:336] From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:206: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0517 15:55:24.661983 140455981164352 deprecation.py:336] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:206: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py:464: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0517 15:55:26.951461 140455981164352 deprecation.py:336] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py:464: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/backend.py:435: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn('`tf.keras.backend.set_learning_phase` is deprecated and '
Traceback (most recent call last):
  File "object_detection/model_main_tf2.py", line 113, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "object_detection/model_main_tf2.py", line 110, in main
    record_summaries=FLAGS.record_summaries)
  File "/home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py", line 597, in train_loop
    train_input, unpad_groundtruth_tensors)
  File "/home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py", line 395, in load_fine_tune_checkpoint
    fine_tune_checkpoint_type=checkpoint_type)
  File "/home/tensorflow/.local/lib/python3.6/site-packages/object_detection/meta_architectures/center_net_meta_arch.py", line 4155, in restore_from_objects
    supported_types))
ValueError: Checkpoint type "detection" not supported for CenterNetResnetV1FpnFeatureExtractor. Supported types are ['classification', 'fine_tune']
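
The ValueError names the two checkpoint types this feature extractor accepts. One possible workaround, not confirmed in this post, is to set fine_tune_checkpoint_type in pipeline.config to one of them. The sketch below edits the config text with a regular expression. The helper name is hypothetical, and which of the two values suits a given checkpoint is an assumption to verify against the API documentation.

```python
import re

# Copied from the error message above.
SUPPORTED_TYPES = ("classification", "fine_tune")


def set_fine_tune_checkpoint_type(config_text, new_type):
    """Return config text with fine_tune_checkpoint_type set to `new_type`."""
    if new_type not in SUPPORTED_TYPES:
        raise ValueError("unsupported checkpoint type: %r" % new_type)
    # Matches only the *_type field, not fine_tune_checkpoint or *_version.
    pattern = r'(fine_tune_checkpoint_type:\s*)"[^"]*"'
    updated, count = re.subn(pattern, r'\g<1>"%s"' % new_type, config_text)
    if count != 1:
        raise ValueError("expected exactly one fine_tune_checkpoint_type field")
    return updated


if __name__ == "__main__":
    snippet = (
        '  fine_tune_checkpoint_version: V2\n'
        '  fine_tune_checkpoint_type: "detection"\n'
    )
    print(set_fine_tune_checkpoint_type(snippet, "fine_tune"))
```

Editing the file by hand achieves the same result; the function only makes the change repeatable.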