Computer vision: sorting detected text bounding-box coordinates by their order of appearance in the image

computer-vision

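I am using a text detection model that returns bounding-box coordinates. I converted the polygons to rectangles so that I can crop the text regions out of the image. The resulting bounding boxes come back shuffled, and I have not been able to sort them. As far as I can tell, the boxes are ordered by y3. However, when curved text appears on the same line, as in the image below, the order breaks, and I need to sort the boxes before passing them to the text extraction model.

Polygons converted to rectangles for cropping the text regions

The polygon bounding-box coordinates for this case are shown below, together with the detected text: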

146,36,354,34,354,82,146,84 "Australian"
273,78,434,151,411,201,250,129 "Collection"
146,97,250,97,250,150,146,150 "vine"
77,166,131,126,154,158,99,197 "Old"
242,215,361,241,354,273,235,248 "Valley"
140,247,224,219,234,250,?,277 "Eden"
194,298,306,296,307,324,194,325 "Shiraz"
232,406,?,402,364,421,233,426 "Vintage"
152,402,?,405,215,425,151,422 "2008"
124,470,209,480,207,500,122,490 "South"
227,481,387,472,389,?,228,503 "Australia"
222,562,312,564,311,585,?,583 "Gibson"
198,564,217,564,217,584,198,584 "by"
386,570,421,570,421,?,386,600 "750ml"
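The polygon-to-rectangle step can be sketched as follows. This is a minimal illustration, assuming each row holds eight comma-separated integers in the order x1,y1,x2,y2,x3,y3,x4,y4; the helper name `polygon_to_rect` is hypothetical and not part of the original post.

```python
def polygon_to_rect(coords):
    """Return (start_x, start_y, end_x, end_y) enclosing a flat
    x1,y1,x2,y2,x3,y3,x4,y4 sequence of polygon corners."""
    xs = coords[0::2]  # every even index is an x coordinate
    ys = coords[1::2]  # every odd index is a y coordinate
    return min(xs), min(ys), max(xs), max(ys)


# Nearly horizontal box ("Australian"): the rectangle is about as tall as the text.
flat = [int(v) for v in "146,36,354,34,354,82,146,84".split(",")]
print(polygon_to_rect(flat))    # (146, 34, 354, 84)

# Tilted box ("Collection"): the enclosing rectangle is much taller than the text.
tilted = [int(v) for v in "273,78,434,151,411,201,250,129".split(",")]
print(polygon_to_rect(tilted))  # (250, 78, 434, 201)
```

Note that the rectangle around the tilted word starts at y=78, above the bottom edge (y=84) of the word on the line before it. That vertical overlap is what makes a purely top-edge-based ordering unreliable here.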

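The expected output, however, is the coordinates sorted in the order in which the text appears: Australian -> Old -> vine -> Collection -> Eden -> Valley -> Shiraz -> 2008 -> Vintage -> South -> Australia -> by -> Gibson -> 750ml.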
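One possible way to approach this ordering is to group boxes into lines by how much their vertical ranges overlap, then sort each line left to right. The sketch below is only a heuristic under stated assumptions, not a method from the original post: the function names and the `min_overlap=0.5` threshold are hypothetical, and the sample data is limited to a few boxes from the listing above, already converted to enclosing rectangles `(startX, startY, endX, endY, text)`.

```python
def vertical_overlap_ratio(a, b):
    """Overlap of the two boxes' y-ranges, divided by the smaller box height."""
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    smaller = min(a[3] - a[1], b[3] - b[1])
    return max(overlap, 0) / smaller if smaller > 0 else 0.0


def sort_reading_order(rects, min_overlap=0.5):
    """Group rectangles into text lines, then read each line left to right."""
    lines = []
    # Visit boxes from top to bottom by vertical center.
    for box in sorted(rects, key=lambda b: (b[1] + b[3]) / 2):
        for line in lines:
            # Join the first line that contains a sufficiently overlapping box.
            if any(vertical_overlap_ratio(box, other) >= min_overlap
                   for other in line):
                line.append(box)
                break
        else:
            lines.append([box])  # no match: this box starts a new line
    ordered = []
    for line in lines:
        ordered.extend(sorted(line, key=lambda b: b[0]))
    return ordered


boxes = [
    (146, 34, 354, 84, "Australian"),
    (250, 78, 434, 201, "Collection"),
    (146, 97, 250, 150, "vine"),
    (77, 126, 154, 197, "Old"),
    (235, 215, 361, 273, "Valley"),
    (194, 296, 307, 325, "Shiraz"),
    (122, 470, 209, 500, "South"),
    (198, 564, 217, 584, "by"),
]
print([b[4] for b in sort_reading_order(boxes)])
# ['Australian', 'Old', 'vine', 'Collection', 'Valley', 'Shiraz', 'South', 'by']
```

The threshold is a tuning knob: too low and a tilted box can pull two lines together, too high and words on a strongly curved baseline may be split into separate lines.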

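What does this have to do with pandas? If it is unrelated to pandas, please remove the tag. It would also be easier to help if you posted your code.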

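import cv2
import pandas as pd

img_name = 'rre7'
orig = cv2.imread('CRAFT-pytorch/test/' + str(img_name) + '.jpg')
# each row of the result file holds one polygon: x1,y1,x2,y2,x3,y3,x4,y4
colnames = ['x1', 'y1', 'x2', 'y2', 'x3', 'y3', 'x4', 'y4']
df = pd.read_csv('result/res_' + str(img_name) + '.txt', header=None,
                 delimiter=',', names=colnames)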
rect = []
boxes = df.values
for i, (x1, y1, x2, y2, x3, y3, x4, y4) in enumerate(boxes):
    # axis-aligned rectangle that encloses the four polygon corners
    startX = min([x1, x2, x3, x4])
    startY = min([y1, y2, y3, y4])
    endX = max([x1, x2, x3, x4])
    endY = max([y1, y2, y3, y4])
    #print([startX,startY,endX,endY])
    rect.append([startX, startY, endX, endY])
# sort all rectangles by their top edge
rect.sort(key=lambda b: b[1])
print("After sorting")
print('\n')
# initially the line bottom is set to be the bottom of the first rect
# (each rect is [startX, startY, endX, endY], so index 3 is already the bottom edge)
line_bottom = rect[0][3]
line_begin_idx = 0
for i in range(len(rect)):
    # when a new box's top is below current line's bottom
    # it's a new line
    if rect[i][1] > line_bottom:
        # sort the previous line by their x
        rect[line_begin_idx:i] = sorted(rect[line_begin_idx:i], key=lambda b: b[0])
        line_begin_idx = i
    # regardless if it's a new line or not
    # always update the line bottom
    line_bottom = max(rect[i][3], line_bottom)
# sort the last line
rect[line_begin_idx:] = sorted(rect[line_begin_idx:], key=lambda b: b[0])
for i,(startX,startY, endX,endY) in enumerate(rect):
    roi = orig[startY:endY, startX:endX]   
    cv2.imwrite('gray/'+str(img_name)+'_'+str(i+1)+'.jpg',roi)
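
To see why this line-grouping pass struggles with tilted text, the loop can be wrapped in a function and run on a few rectangles without any image I/O. This is an illustrative sketch: the function name `sort_by_line_bottom` is hypothetical, it assumes each rectangle is `[startX, startY, endX, endY]` so that index 3 is the bottom edge, and the four sample rectangles are the enclosing rectangles of the first four polygons in the question, with the text appended for readability.

```python
def sort_by_line_bottom(rects):
    """Top-edge sort followed by line grouping on a running bottom edge."""
    rects = sorted(rects, key=lambda b: b[1])
    if not rects:
        return rects
    line_bottom = rects[0][3]
    line_begin = 0
    for i in range(len(rects)):
        # A box whose top lies below the running bottom starts a new line.
        if rects[i][1] > line_bottom:
            rects[line_begin:i] = sorted(rects[line_begin:i], key=lambda b: b[0])
            line_begin = i
        line_bottom = max(rects[i][3], line_bottom)
    rects[line_begin:] = sorted(rects[line_begin:], key=lambda b: b[0])
    return rects


sample = [
    [146, 34, 354, 84, "Australian"],
    [250, 78, 434, 201, "Collection"],
    [146, 97, 250, 150, "vine"],
    [77, 126, 154, 197, "Old"],
]
print([b[4] for b in sort_by_line_bottom(sample)])
# ['Old', 'Australian', 'vine', 'Collection']
```

The rectangle around "Collection" starts at y=78, which is above the bottom of "Australian" (y=84), so no new line is ever opened. The running bottom then grows to 201, all four boxes end up in one line, and the x sort puts "Old" first.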