Decoding the output of Keras YOLO3


keras deep-learning neural-network computer-vision


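This may be a somewhat unorthodox question to ask on Stack Overflow.

I am studying the YOLO3 algorithm through a Keras implementation. On my local machine this implementation works well and detects objects accurately. It contains only a few methods, and I understand most of them. The exception is decode_netout().

Here is the complete method: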

import numpy as np

# _sigmoid and BoundBox are used below but are not defined in this snippet.
def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
    grid_h, grid_w = netout.shape[:2]
    nb_box = 3
    netout = netout.reshape((grid_h, grid_w, nb_box, -1))
    nb_class = netout.shape[-1] - 5
    boxes = []
    netout[..., :2]  = _sigmoid(netout[..., :2])
    netout[..., 4:]  = _sigmoid(netout[..., 4:])
    netout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]
    netout[..., 5:] *= netout[..., 5:] > obj_thresh
 
    for i in range(grid_h*grid_w):
        row = i // grid_w  # integer division, so row is a valid grid index
        col = i % grid_w
        for b in range(nb_box):
            # element at index 4 is the objectness score
            objectness = netout[int(row)][int(col)][b][4]
            # objectness is a scalar, so compare it to the threshold directly
            if objectness <= obj_thresh: continue
            # first 4 elements are x, y, w, and h
            x, y, w, h = netout[int(row)][int(col)][b][:4]
            x = (col + x) / grid_w # center position, unit: image width
            y = (row + y) / grid_h # center position, unit: image height
            w = anchors[2 * b + 0] * np.exp(w) / net_w # unit: image width
            h = anchors[2 * b + 1] * np.exp(h) / net_h # unit: image height
            # last elements are class probabilities
            classes = netout[int(row)][col][b][5:]
            box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
            boxes.append(box)
    return boxes
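
To make the coordinate arithmetic inside the loop concrete, here is a minimal sketch that decodes a single box by hand. All numbers (grid size, input size, anchor, raw outputs) are hypothetical values chosen for illustration, and the local sigmoid function is an assumed stand-in for the _sigmoid helper, which is not shown in the question.

```python
import numpy as np

def sigmoid(x):
    # logistic function, maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical values, chosen only for illustration
grid_h, grid_w = 13, 13          # grid size of one output scale
net_h, net_w = 416, 416          # network input size in pixels
anchor_w, anchor_h = 116, 90     # one anchor box, in pixels
row, col = 6, 4                  # grid cell making the prediction
tx, ty, tw, th = 0.2, -1.5, 0.1, 0.3   # raw network outputs for one box

# sigmoid keeps the offset in (0, 1), so the center stays inside cell (row, col)
x = (col + sigmoid(tx)) / grid_w   # center x as a fraction of image width
y = (row + sigmoid(ty)) / grid_h   # center y as a fraction of image height

# the anchor is scaled by exp(raw value), then normalised by the input size
w = anchor_w * np.exp(tw) / net_w
h = anchor_h * np.exp(th) / net_h

# corner coordinates, in the order passed to BoundBox in the method above
xmin, ymin, xmax, ymax = x - w / 2, y - h / 2, x + w / 2, y + h / 2
print(xmin, ymin, xmax, ymax)
```

Because the offset added to col and row is always strictly between 0 and 1, the decoded center cannot leave the cell that predicted it, whatever the raw values tx and ty are.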

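It passes the output through a sigmoid function, which squashes the values into the range 0 to 1 and so keeps the predicted center inside the grid cell that is making the prediction. I also think I more or less understand the rest of the code. But there is one line I do not understand at all. The line is: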

 netout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]

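Why is netout[..., 5:] multiplied by netout[..., 4] here, with the result assigned back to netout[..., 5:]?

I found a similar question, but the answers there discuss speeding the code up, not what the decode_netout() method does.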

That is why I am asking here: could someone walk me through this method and explain what is happening, especially on this line?
netout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]
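
As a mechanical illustration of the line in question, the sketch below runs it on a tiny hand-made tensor. The tensor contents and the threshold are made-up values, and the reading of the product as a per-class confidence (objectness times class score) is the usual YOLO interpretation rather than something the implementation states explicitly.

```python
import numpy as np

# Hypothetical tensor: 1x1 grid, 2 boxes, 5 box values + 3 class scores,
# assumed to have already passed through the sigmoid
netout = np.array([[[[0.5, 0.5, 0.0, 0.0, 0.9, 0.8, 0.1, 0.6],
                     [0.5, 0.5, 0.0, 0.0, 0.2, 0.9, 0.9, 0.9]]]])
obj_thresh = 0.5

objectness = netout[..., 4]                    # shape (1, 1, 2)
objectness_col = objectness[..., np.newaxis]   # shape (1, 1, 2, 1)

# broadcasting multiplies every class score of a box by that box's objectness
netout[..., 5:] = objectness_col * netout[..., 5:]

# the boolean mask zeroes out scores that do not exceed the threshold
netout[..., 5:] *= netout[..., 5:] > obj_thresh

print(netout[0, 0, :, 5:])
```

The np.newaxis is only there to make the shapes line up: netout[..., 4] has one value per box, and the extra trailing axis lets NumPy stretch that value across all class scores of the same box. In this example the first box keeps two of its scores (roughly 0.72 and 0.54), while the second box, whose objectness is low, ends up with all class scores set to zero.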