TensorFlow: decoding the prediction output of a pre-trained model into human-readable labels

tensorflow machine-learning deep-learning

I am trying to use a pre-trained object detection model. I chose faster_rcnn_inception_resnet_v2_atrous_oidv4, which was trained on the Open Images dataset.

Here is my code:

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.import_meta_graph)

# Restore the pre-trained model.
sess = tf.Session()
# First, load the meta graph and restore the weights.
saver = tf.train.import_meta_graph('pretrained/faster_rcnn_inception_resnet_v2_atrous_oid_v4_2018_12_12/model.ckpt.meta')
saver.restore(sess, tf.train.latest_checkpoint('pretrained/faster_rcnn_inception_resnet_v2_atrous_oid_v4_2018_12_12/'))

# Look up the input placeholder and build the feed dict for new data.
# image_raw_feature is the batch of input images (defined elsewhere).
graph = tf.get_default_graph()
X = graph.get_tensor_by_name('image_tensor:0')
feed_dict = {X: image_raw_feature}

# Look up the output tensors we want to run.
num_detections = graph.get_tensor_by_name('num_detections:0')
detection_scores = graph.get_tensor_by_name('detection_scores:0')
detection_boxes = graph.get_tensor_by_name('detection_boxes:0')

x1, x2, x3 = sess.run(
    [num_detections, detection_scores, detection_boxes],
    feed_dict
)

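The outputs x1, x2, and x3 have shapes [4], [4, 100], and [4, 100, 4], respectively. The problem is that I don't know how to decode these results into human-readable labels. I guess the total number of object categories is 100, as the shape of x2 suggests? But that seems very small compared with what the dataset description says.

How can I decode the output into labels?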

According to the documentation, the output tensors should have the following shapes:

detection_boxes: [batch, max_detections, 4]
detection_scores: [batch, max_detections]
detection_classes: [batch, max_detections]
num_detections: [batch]

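Here, batch = 4 and max_detections = 100. The output contains every detection regardless of its confidence score, so you will probably want to choose a score threshold to filter out low-confidence detections. In addition, detection_boxes holds each box as ymin, xmin, ymax, xmax in normalized coordinates, so you need the image shape to obtain absolute coordinates.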
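The conversion from normalized to absolute coordinates can be sketched as follows. This is a minimal NumPy sketch, not part of the Object Detection API: the helper name to_absolute_boxes and the 480x640 image size are made up for illustration, and all images in the batch are assumed to share the same size.

```python
import numpy as np

def to_absolute_boxes(boxes, image_height, image_width):
    """Scale normalized [ymin, xmin, ymax, xmax] boxes to pixel coordinates."""
    boxes = np.asarray(boxes, dtype=np.float32)
    # y values scale with the image height, x values with the image width.
    scale = np.array([image_height, image_width, image_height, image_width],
                     dtype=np.float32)
    return boxes * scale  # broadcasts over any leading dimensions

# Hypothetical example: one normalized box on a 480x640 image.
example_boxes = np.array([[0.25, 0.5, 0.75, 1.0]])
abs_boxes = to_absolute_boxes(example_boxes, image_height=480, image_width=640)
print(abs_boxes)
```

Because the scaling broadcasts over the last axis, the same call should also work on a full [batch, max_detections, 4] array such as the detection_boxes output.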

For example, suppose you want all detections with a score above 0.5:

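# detection_scores and detection_boxes are the NumPy arrays returned by sess.run
final_boxes = []
for i in range(detection_boxes.shape[0]):  # one iteration per image in the batch
    keep = detection_scores[i] > 0.5       # boolean mask over the max_detections slots
    final_boxes.append(detection_boxes[i, keep])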

This gives you the detections with a confidence score above 0.5.
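To keep the class ids and scores together with the boxes, the same filtering can be extended as in the sketch below. It runs on synthetic NumPy arrays standing in for the sess.run results. The function name filter_detections is hypothetical, and the sketch assumes that detection_classes is fetched in the same way as the other outputs (presumably as 'detection_classes:0') and that slots beyond num_detections are padding.

```python
import numpy as np

def filter_detections(num_detections, scores, boxes, classes, threshold=0.5):
    """Return, per image, a list of (class_id, score, box) above the threshold."""
    results = []
    for i in range(scores.shape[0]):
        n = int(num_detections[i])        # number of valid slots for image i
        keep = scores[i, :n] > threshold  # boolean mask over the valid slots
        results.append([
            (int(c), float(s), b.tolist())
            for c, s, b in zip(classes[i, :n][keep],
                               scores[i, :n][keep],
                               boxes[i, :n][keep])
        ])
    return results

# Synthetic batch of 2 images with max_detections = 3 (made-up values).
num = np.array([2.0, 3.0])
scores = np.array([[0.9, 0.3, 0.0], [0.8, 0.6, 0.2]])
classes = np.array([[1.0, 52.0, 0.0], [2.0, 2.0, 7.0]])
boxes = np.zeros((2, 3, 4))

detections = filter_detections(num, scores, boxes, classes)
for image_index, dets in enumerate(detections):
    print(image_index, dets)
```

Each image ends up with its own variable-length list, which is why the results are kept as Python lists rather than one rectangular array.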

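In fact, I used 4 images as input.

What does max_detections mean, and why is it equal to 100? I thought it should be the number of classes (categories). For example, ImageNet has 1000 classes, so the detection scores should be [batch_size, 1000], shouldn't they? Since the model was trained on the Open Images dataset, which contains about 600 object categories, I don't understand why I get 100 here.

max_detections is the maximum number of objects that can be detected in a single image: if an image contains more than 100 objects, only 100 of them will be detected. max_detections has nothing to do with the number of classes. Even with 1000 classes, the detection scores still have shape [batch_size, max_detections] (e.g. [4, 100]). Having 1000 classes means that the class values range from 1 to 1000, so detection_classes could be [1, 2, 52, …], but the length of detection_classes is still 100.

That makes sense to me. How do I map the class indices (e.g. 1, 2, 52) to the corresponding labels? Does TensorFlow have a function for mapping labels, or do I have to create one? In Keras, for example, every pre-trained model comes with a function called decode_predictions, so I only need to call it to convert the output probabilities into labels. I ask because a dataset such as Open Images exists in several versions, and the number of classes changes between versions. So if no mapping function is available, it is hard to get the right labels (they don't say which version of the dataset was used for training).

There is a label.pbtxt file that maps class numbers to class names. During training, class names are mapped to class numbers; during inference, class numbers are mapped back to class names. This mapping is usually specific to the dataset (or its version), so if you use a pre-trained network you have to find and use the label.pbtxt file that corresponds to it. Example label.pbtxt files are published alongside the TensorFlow Object Detection API.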
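As a rough illustration of that lookup, the sketch below parses label-map text with a regular expression and maps class ids to names. It assumes the usual item { name / id / display_name } layout of these files. The function name parse_label_map and the two sample entries are made up for illustration and are not taken from the real Open Images label map. The Object Detection API also ships its own label-map helpers, which are probably the better choice when that package is installed.

```python
import re

def parse_label_map(pbtxt_text):
    """Map class id -> display name (falling back to name) from label-map text."""
    id_to_name = {}
    for block in re.findall(r"item\s*\{(.*?)\}", pbtxt_text, flags=re.DOTALL):
        id_match = re.search(r"\bid:\s*(\d+)", block)
        # Prefer the human-readable display_name; fall back to the raw name.
        name_match = (re.search(r'display_name:\s*"([^"]*)"', block)
                      or re.search(r'\bname:\s*"([^"]*)"', block))
        if id_match and name_match:
            id_to_name[int(id_match.group(1))] = name_match.group(1)
    return id_to_name

# Hypothetical label-map content; a real file would be read from disk instead.
sample = '''
item {
  name: "/m/hypothetical_1"
  id: 1
  display_name: "Person"
}
item {
  name: "/m/hypothetical_2"
  id: 2
  display_name: "Car"
}
'''

id_to_name = parse_label_map(sample)
# Class ids as they might appear in detection_classes; 52 is not in the sample map.
labels = [id_to_name.get(class_id, "unknown") for class_id in [1, 2, 52]]
print(labels)
```

With a real label map, the ids in detection_classes (cast to int) can be looked up in the same dictionary after score filtering.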