Android: How to correctly extract text and boxRects from tess-two?

Tags: android, kotlin, tess-two

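I have a TessBaseAPI() object that returns results. I want to extract the words together with their bounding boxes, but I can't seem to get it to work.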

val Text = tesseract.getUTF8Text()
gives me the text.

val Words = tesseract.words.boxRects
gives me bounding boxes that I can loop over, but they don't match up with the output of getUTF8Text().

Looping over the objects returned by tesseract.getWords() and trying to convert their data to a string gives me gibberish:

val Words = tesseract.words
for (i in Words) {
    Log.i(TAG, i.data.toString())
}
I found a really ugly workaround: calling .getHOCRText and running regular expressions over the generated output to get the text and the boxes.

import java.util.regex.Pattern

// hOCR markup for page 0
val result = tesseract.getHOCRText(0)

// Box coordinates: whatever sits between "title='bbox " and "; x_wconf"
val boxPattern = Pattern.compile("(?<=title='bbox ).*?(?=; x_wconf)")
val boxMatch = boxPattern.matcher(result)
while (boxMatch.find()) {
    Log.i(TAG, boxMatch.group())
}

// Word text: whatever sits between "'>" and the closing "</span>"
val textPattern = Pattern.compile("(?<='>).*?(?=<\\/span>)")
val textMatch = textPattern.matcher(result)
while (textMatch.find()) {
    Log.i(TAG, textMatch.group())
}
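If the hOCR route is used anyway, the two passes can be folded into one so that each box stays paired with its word. The following is only a sketch, assuming each word is emitted as `<span class='ocrx_word' ... title='bbox x0 y0 x1 y1; x_wconf NN' ...>text</span>`; `HocrWord` and `parseHocrWords` are names made up for illustration and are not part of tess-two.

```kotlin
// Hypothetical holder for one recognized word; not a tess-two type.
data class HocrWord(
    val text: String,
    val left: Int,
    val top: Int,
    val right: Int,
    val bottom: Int,
    val confidence: Int
)

// Assumed word markup:
// <span class='ocrx_word' ... title='bbox L T R B; x_wconf C' ...>text</span>
val WORD_SPAN = Regex(
    """<span class='ocrx_word'[^>]*title='bbox (\d+) (\d+) (\d+) (\d+); x_wconf (\d+)'[^>]*>(.*?)</span>"""
)

// Matches any tag nested inside the word span, such as <strong> or <em>.
val INNER_TAG = Regex("<[^>]+>")

// Decodes the handful of entities an hOCR writer typically escapes.
fun decodeEntities(s: String): String =
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&#39;", "'")
        .replace("&amp;", "&") // last, so "&amp;lt;" becomes "&lt;" and not "<"

// Extracts every word span, keeping the box and the text from the same match.
fun parseHocrWords(hocr: String): List<HocrWord> =
    WORD_SPAN.findAll(hocr).map { m ->
        val (l, t, r, b, conf, raw) = m.destructured
        HocrWord(
            text = decodeEntities(raw.replace(INNER_TAG, "")),
            left = l.toInt(),
            top = t.toInt(),
            right = r.toInt(),
            bottom = b.toInt(),
            confidence = conf.toInt()
        )
    }.toList()
```

It would be called as `parseHocrWords(tesseract.getHOCRText(0))`. Because text and coordinates come out of the same regex match, they cannot drift out of step the way two independent loops can. It is still regex over markup, so it will break if the hOCR layout differs from the assumed one.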
So, how do I correctly extract the text and boxRects from tess-two?

I solved it.

// As before
val tesseract = TessBaseAPI()
tesseract.init("/storage/emulated/0/com.ubft/", "eng")
tesseract.setImage(bm)

// Call utF8Text. Otherwise iterator returns null
tesseract.utF8Text

// Initiate an iterator
val iterator = tesseract.getResultIterator()

iterator.begin()
do {
    val text = iterator.getUTF8Text(TessBaseAPI.PageIteratorLevel.RIL_TEXTLINE)
    val boundingBox = iterator.getBoundingRect(TessBaseAPI.PageIteratorLevel.RIL_TEXTLINE)

    // Do what you want with the result...

} while (iterator.next(TessBaseAPI.PageIteratorLevel.RIL_TEXTLINE))

iterator.delete()
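The TessBaseAPI.PageIteratorLevel value selects the kind of text structure that is returned: paragraph (RIL_PARA), line (RIL_TEXTLINE), word (RIL_WORD), or character (RIL_SYMBOL). To get words with their boxes, as asked in the question, pass RIL_WORD to all three calls instead of RIL_TEXTLINE.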