How do I calculate a confidence score for an OCR system in Python?


I am working on an OCR project and would like to know how to calculate a confidence score for my OCR system.

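I have images of digital multimeters. The device display in each image shows measurement results, and I want to recognize these values. However, based on my research, I am not sure which OCR confidence calculation technique suits my system.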

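As far as I know, OCR confidence scores can be calculated per character, per word, and per sentence. The latter two are in fact built on the character-level confidence scores. In my case, a character-level calculation may be wrong or insufficient.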
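To make the three levels concrete, here is a minimal sketch of how word and sentence scores could be built from character scores by plain averaging. The function names and the choice of the arithmetic mean are assumptions for illustration; an actual OCR engine may aggregate differently.

```python
from statistics import mean


def word_confidence(char_confs):
    """Word score as the mean of its per-character confidences (assumed rule)."""
    return mean(char_confs) if char_confs else 0.0


def sentence_confidence(words):
    """Sentence score as the mean of its word scores (assumed rule)."""
    scores = [word_confidence(w) for w in words if w]
    return mean(scores) if scores else 0.0


# Hypothetical per-character confidences for the two words "40.245" and "V".
reading = [[0.9, 0.9, 0.9, 0.9, 0.9, 0.9], [0.7]]
print(word_confidence(reading[0]), sentence_confidence(reading))
```

Because every character contributes equally, this kind of aggregation has no notion of which character matters more.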

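For example, I have the text "40.245 V" and get two different recognition results, such as "40.247V" and "70.245V". If I am not mistaken, both results would have the same or similar confidence scores. However, the "40.247V" prediction is acceptable in my case, while "70.245V" is not.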
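The concern can be illustrated with a small sketch. The per-character confidences below are invented for the example (one uncertain character in each reading), and `plain_confidence` is a hypothetical helper, not part of any OCR library. The unweighted mean comes out the same for both readings, even though their numeric errors differ by several orders of magnitude.

```python
from statistics import mean


def plain_confidence(chars):
    """Unweighted mean of per-character confidences."""
    return mean(conf for _, conf in chars)


# Hypothetical confidences: exactly one uncertain character per reading.
last_digit_wrong = list(zip("40.247V", [0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.9]))
first_digit_wrong = list(zip("70.245V", [0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]))

truth = 40.245  # the value actually shown on the display
for chars in (last_digit_wrong, first_digit_wrong):
    text = "".join(ch for ch, _ in chars)
    value = float(text.rstrip("V"))
    # Print the reading, its plain confidence, and its absolute numeric error.
    print(text, round(plain_confidence(chars), 3), round(abs(value - truth), 3))
```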

Do you know how to calculate a confidence score for this case?

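When calculating the confidence, you can take a weighted average of the character confidences that gives more weight to the leading characters and less to the trailing ones: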

#include <cctype>    // isdigit
#include <iostream>  // cout, endl
#include <utility>   // pair
#include <vector>    // vector

using namespace std;

double getWeightedConfidence(vector<pair<char /* character */, double /*confidence of that character */>> word) {
    if (word.empty()) {
        return 1.0;
    }
    
    double confidence = 0;
    
    if (isdigit(static_cast<unsigned char>(word[0].first))) {
        // okay it is a number
        
        double weight = 1;
        double sumOfWeights = 0;
        for (const auto &c : word) {
            confidence += c.second * weight;
            sumOfWeights += weight;
            weight /= 10; // decay by any factor you like, depending on how much less valuable you consider each digit compared with the previous one
        }
        
        confidence /= sumOfWeights;
    } else {
        // not a number - just calculate a normal average
        for (const auto &c : word) {
            confidence += c.second;
        }
        
        confidence /= word.size();
    }
    
    return confidence;
}

int main() {
    
    vector<pair<char, double>> number_with_first_digit_wrong;
    number_with_first_digit_wrong.emplace_back('7', 0.1);
    number_with_first_digit_wrong.emplace_back('4', 0.9);
    number_with_first_digit_wrong.emplace_back('6', 0.9);
    number_with_first_digit_wrong.emplace_back('2', 0.9);
    number_with_first_digit_wrong.emplace_back('.', 0.9);
    number_with_first_digit_wrong.emplace_back('9', 0.9);
    
    vector<pair<char, double>> number_with_last_digit_wrong;
    number_with_last_digit_wrong.emplace_back('7', 0.9);
    number_with_last_digit_wrong.emplace_back('4', 0.9);
    number_with_last_digit_wrong.emplace_back('6', 0.9);
    number_with_last_digit_wrong.emplace_back('2', 0.9);
    number_with_last_digit_wrong.emplace_back('.', 0.9);
    number_with_last_digit_wrong.emplace_back('9', 0.1);
    
    
    cout << getWeightedConfidence(number_with_first_digit_wrong) << " " << getWeightedConfidence(number_with_last_digit_wrong) << endl;
    
    return 0;
}
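
Since the question is about Python, the same idea can be sketched there as well. This is a rough port of the C++ above, assuming the per-character confidences are available as `(character, confidence)` pairs; the name `weighted_confidence` and the `decay` parameter are introduced here for illustration only.

```python
def weighted_confidence(chars, decay=0.1):
    """Position-weighted mean of per-character confidences.

    chars: list of (character, confidence) pairs, in reading order.
    decay: factor applied to the weight after each character (assumed 0.1,
           matching the division by 10 in the C++ version).
    """
    if not chars:
        return 1.0
    if chars[0][0] not in "0123456789":
        # Not a number: fall back to a plain average.
        return sum(conf for _, conf in chars) / len(chars)
    weight, total, weight_sum = 1.0, 0.0, 0.0
    for _, conf in chars:
        total += conf * weight
        weight_sum += weight
        weight *= decay  # each later character counts less
    return total / weight_sum


# Same sample data as the C++ main(): one low-confidence character per reading.
first_digit_wrong = list(zip("7462.9", [0.1, 0.9, 0.9, 0.9, 0.9, 0.9]))
last_digit_wrong = list(zip("7462.9", [0.9, 0.9, 0.9, 0.9, 0.9, 0.1]))
print(weighted_confidence(first_digit_wrong), weighted_confidence(last_digit_wrong))
```

With these weights, an uncertain first digit pulls the score down far more than an uncertain last digit. Note that, as in the C++ version, the decimal point consumes a weight step like any other character.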
