Char's bounding box order of vertices (Google Cloud Platform, OCR, Google Cloud Vision)


The Google Vision API documentation states that the vertices of a detected character are always returned in the same order:

// The bounding box for the symbol.
// The vertices are in the order of top-left, top-right, bottom-right,
// bottom-left. When a rotation of the bounding box is detected the rotation
// is represented as around the top-left corner as defined when the text is
// read in the 'natural' orientation.
// For example:
//   * when the text is horizontal it might look like:
//      0----1
//      |    |
//      3----2
//   * when it's rotated 180 degrees around the top-left corner it becomes:
//      2----3
//      |    |
//      1----0
//   and the vertice order will still be (0, 1, 2, 3).

However, I sometimes see a different order of vertices. Here are two characters from the same image that have the same orientation:

[x:778 y:316  x:793 y:316  x:793 y:323  x:778 y:323 ]
0----1
|    |
3----2

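and a second character, whose coordinates appear at the end of this page.

Why is the order of the vertices not the same for both, and why does it differ from what the documentation describes?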

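This appears to be a bug in the Vision API. The workaround is to detect the image orientation and then rearrange the vertices into the correct order.

Unfortunately, the Vision API does not report the image orientation in its output, so I had to write code to detect it.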

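Horizontal versus vertical orientation can be detected by comparing character height and width: for upright text, a character's height is usually greater than its width.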
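The width/height comparison could be sketched roughly as follows. This is only a minimal sketch, assuming each character box has already been extracted from the response as a list of four (x, y) tuples; the names `box_size` and `is_rotated_sideways` and the simple majority vote are illustrative assumptions, not part of the Vision API.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Box = List[Point]  # four (x, y) vertices of one character (assumed input shape)


def box_size(box: Box) -> Tuple[int, int]:
    """Return (width, height) of the axis-aligned extent of a box."""
    xs = [x for x, _ in box]
    ys = [y for _, y in box]
    return max(xs) - min(xs), max(ys) - min(ys)


def is_rotated_sideways(boxes: List[Box]) -> bool:
    """Guess whether the text runs vertically in the image.

    Heuristic from the answer: an upright character is usually taller
    than it is wide, so if most boxes are wider than tall, the image is
    probably rotated by 90 or 270 degrees.
    """
    wider = 0
    taller = 0
    for box in boxes:
        width, height = box_size(box)
        if width > height:
            wider += 1
        elif height > width:
            taller += 1
    # Square boxes are ignored; ties are treated as "not sideways".
    return wider > taller
```

Voting over all characters rather than testing a single box keeps wide glyphs and punctuation from flipping the result on their own.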

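The next step is to detect the direction of the text. For example, when the image is oriented vertically, the text may run from top to bottom or from bottom to top.

Most characters in the output appear to come out in their natural reading order, so the text direction can be detected from statistics. For example:
Line 1 has Y coordinate 1000
Line 2 has Y coordinate 900
Line 3 has Y coordinate 950
Line 4 has Y coordinate 800
The Y coordinates mostly decrease from one line to the next, so we can tell that the image is rotated upside down.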
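The statistical check on line positions might look like the sketch below. It assumes the Y coordinate of each line has already been collected in the order the API returned the lines; the function name `looks_upside_down` and the pairwise majority vote are hypothetical choices for illustration.

```python
from typing import Sequence


def looks_upside_down(line_ys: Sequence[float]) -> bool:
    """Guess whether the image is rotated 180 degrees.

    line_ys holds one Y coordinate per text line, in reading order.
    In an upright image each line sits below the previous one (larger Y),
    so a majority of decreasing steps suggests an upside-down image.
    """
    increasing = 0
    decreasing = 0
    for prev, cur in zip(line_ys, line_ys[1:]):
        if cur > prev:
            increasing += 1
        elif cur < prev:
            decreasing += 1
    return decreasing > increasing


# The four lines from the example: two steps go up the image, one goes down.
print(looks_upside_down([1000, 900, 950, 800]))  # prints True
```

Counting steps instead of requiring a strictly monotonic sequence tolerates the occasional out-of-order line, such as line 3 in the example.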

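The four vertices then have to be reordered as points A to D (clockwise in image coordinates, where Y grows downward):
A-B-C-D:
A: min X, min Y
B: max X, min Y
C: max X, max Y
D: min X, max Y
and saved to a rectangle object.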
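A minimal sketch of the A-B-C-D reordering is shown here, assuming the vertices are available as (x, y) tuples. The `Rect` type and `to_rect` function are hypothetical names for the "rectangle object" mentioned above. Note that this normalization only makes sense for axis-aligned boxes and discards whatever rotation the original vertex order encoded.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[int, int]


class Rect(NamedTuple):
    """Axis-aligned rectangle with corners in A-B-C-D order."""
    a: Point  # min X, min Y
    b: Point  # max X, min Y
    c: Point  # max X, max Y
    d: Point  # min X, max Y


def to_rect(vertices: List[Point]) -> Rect:
    """Rebuild the corners from the X and Y extremes, ignoring input order."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    return Rect((min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y))


# The two boxes quoted in the question end up in the same corner order.
first = to_rect([(778, 316), (793, 316), (793, 323), (778, 323)])
second = to_rect([(857, 295), (857, 287), (874, 287), (874, 295)])
print(first)
print(second)
```

For the second box, whose vertices were returned starting at the bottom-left corner, `to_rect` yields A = (857, 287), B = (874, 287), C = (874, 295), D = (857, 295).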

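I had the same problem before. Using character width and height seems like a good idea. In my results the vertices were always correct, but the OCR itself did not find the correct orientation of the word. In my example I had images of simple text documents, but the images were rotated 90 degrees counterclockwise. Most words were detected correctly and their vertices were as expected. For some words, however, the vertices were rotated by 180 degrees, and the OCR really "read" the word upside down (and read it incorrectly).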

Second example from the question (the character whose vertices are returned in a different order):

[x:857 y:295  x:857 y:287  x:874 y:287  x:874 y:295 ]
1----2
|    |
0----3