TensorFlow: Can YOLO (any version) be trained on a single class where the images contain text data? (Finding the regions of equations)


tensorflow keras deep-learning


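I am wondering whether YOLO (any version, preferably one tuned for accuracy rather than speed) can be trained on text data. What I am trying to do is find the regions in a text image where any equation is present.
For example, I want to find the 2 gray regions of interest so that I can outline the equations and eventually crop them separately.
I am asking because, first, I have not found anywhere that YOLO is used for text data. Second, how can we customize it for low-resolution images that differ from (416, 416)? All of my images are either cropped or horizontal, mostly in a (W = 2H) format.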
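On the (W = 2H) point: YOLOv3 downsamples its input by a factor of 32, so the network width and height are normally chosen as multiples of 32, but they do not have to be equal. Below is a minimal sketch of how one might pick a non-square input size that roughly keeps the image aspect ratio. The helper name `network_input_size` and its defaults are hypothetical, chosen only for illustration; the resulting size would still have to be set both in the config file and in the blob `size` argument.

```python
def network_input_size(img_w, img_h, short_side=416, stride=32):
    """Return (net_w, net_h), both multiples of `stride`,
    roughly preserving the aspect ratio of the source image.

    `short_side` is the target length of the shorter image side
    (an assumed default, not something prescribed by YOLO).
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    scale = short_side / min(img_w, img_h)

    def snap(value):
        # Round to the nearest multiple of `stride`, never below one stride.
        return max(stride, int(round(value * scale / stride)) * stride)

    return snap(img_w), snap(img_h)


print(network_input_size(1200, 600))  # (832, 416)
```

With a W = 2H image this gives an 832 x 416 input instead of squashing the page into a square.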
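I have already implemented YOLO-V3 for text data, but using OpenCV, which basically runs on the CPU. I want to train the model from scratch.
Please help. Any of Keras, TensorFlow or PyTorch would do.
Here is the code I used for the OpenCV implementation: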

import cv2
import numpy as np

# PATH (model directory) and img (a BGR image, e.g. from cv2.imread) are assumed to be defined earlier.
height, width = img.shape[:2]

# Build the model. NOTE: this will only use the CPU.
net = cv2.dnn.readNet(PATH + "yolov3.weights", PATH + "yolov3.cfg")
layer_names = net.getLayerNames()  # all layer names (254 layers in the network)
# Names of the 3 output layers; flatten() copes with both the (N, 1) and (N,) return shapes.
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]

# The blob is a NumPy array of shape (1, 3, 416, 416). If you change the size, change it in the config file too.
# Pixels are scaled by 1/255, the image is resized, a mean of 0 is subtracted, and BGR is swapped to RGB.
blob = cv2.dnn.blobFromImage(image=img, scalefactor=0.00392, size=(416, 416), mean=(0, 0, 0), swapRB=True)
net.setInput(blob)
outs = net.forward(output_layers)  # list of 3 arrays, one per output layer

class_ids = []    # class id of each kept box
confidences = []  # confidence score of each kept box
boxes = []        # kept boxes as [x, y, w, h]
for out in outs:              # one output layer at a time
    for detection in out:     # one detection at a time
        scores = detection[5:]            # per-class probabilities (80 classes)
        class_id = np.argmax(scores)      # dominant class
        confidence = scores[class_id]
        if confidence > 0.1:              # keep only boxes whose class probability exceeds 0.1
            # Box centre and size, scaled from relative to pixel coordinates
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Top-left corner of the rectangle
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)
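The loop above usually leaves several overlapping boxes per region, so before cropping the equations one would typically apply non-maximum suppression. OpenCV offers `cv2.dnn.NMSBoxes` for this; the sketch below is a plain NumPy/Python version, so the logic is visible. The function names and the threshold values are assumptions made for illustration, not part of the original code.

```python
import numpy as np


def iou_xywh(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_thr=0.1, iou_thr=0.4):
    """Greedy non-maximum suppression; returns kept indices, best score first."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou_xywh(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


def crop_regions(img, boxes, keep):
    """Crop the kept boxes out of the image, clipping them to the image bounds."""
    img_h, img_w = img.shape[:2]
    crops = []
    for i in keep:
        x, y, w, h = boxes[i]
        x0, y0 = max(0, x), max(0, y)
        x1, y1 = min(img_w, x + w), min(img_h, y + h)
        if x1 > x0 and y1 > y0:
            crops.append(img[y0:y1, x0:x1])
    return crops
```

With the `boxes` and `confidences` lists built above, `crop_regions(img, boxes, nms(boxes, confidences))` would return one image array per detected region.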

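As an object detector, YOLO can only be used to detect specific text, not to detect any arbitrary text that might be present in an image.
For example, YOLO can be trained to do text-based logo detection.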

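"I want to find the 2 gray regions of interest in this image so that I can outline and eventually crop the equations separately."
Your problem statement involves detecting any equation (math formula) present in the image, so it cannot be done with YOLO alone. I think mathpix is similar to your use case. They would be using an Optical Character Recognition (OCR) system that has been trained and fine-tuned for their use case.
Ultimately, to do something similar to mathpix, an OCR system customized for your use case is what you need. There will not be any ready-made solution out there. You will have to build one.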
Proposed methods:

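Note: Tesseract cannot be used as-is, because it is a pre-trained model trained to read any character. You can refer to the second paper to train Tesseract for your use case.
To get an idea of OCR, you can read about it.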
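Edit:
So the idea is to build your own OCR that detects whatever constitutes an equation or math formula, rather than detecting every character. You need a dataset in which the locations of the equations are labeled. Basically, you are looking for regions that contain math symbols (such as summation, integral, etc.).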
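As a rough illustration of what "a dataset in which the equations are labeled" could look like on disk, the sketch below converts one pixel-space box into a line of the Darknet/YOLO text label format (`class x_center y_center width height`, all normalized to the image size). This format is only one possible choice, assumed here because the question is about YOLO; the helper name `to_yolo_label` and the single class id 0 for "equation" are hypothetical.

```python
def to_yolo_label(box, img_w, img_h, class_id=0):
    """Convert a pixel box (x, y, w, h), with (x, y) the top-left corner,
    into one 'class xc yc w h' label line with values normalized to [0, 1]."""
    x, y, w, h = box
    xc = (x + w / 2) / img_w
    yc = (y + h / 2) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"


# One labeled equation region in an 800 x 400 page image.
print(to_yolo_label((100, 50, 200, 100), 800, 400))
# 0 0.250000 0.250000 0.250000 0.250000
```

One such line per equation region, in a text file per image, is enough to describe a single-class dataset.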
Some tutorials on training your own OCR:

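So the idea is to follow these tutorials to learn how to train and build an OCR for any use case, then read the research papers I mentioned above and apply some of the basic ideas I gave above to build an OCR for your use case.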

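Actually, that is exactly what I want to do. I know about mathpix and I want to build something like it. Do you know how their model finds the regions where equations are present? Tesseract does not work on my data, but Mathpix does.
@Deshwal Edited the answer to address your query. Of course, mathpix has not disclosed its code or its full internals; note that it is a paid product. I have given you directions on how to go about building something like it.
Can you point me to a tutorial for a paper, or to any OCR solution? It is hard for me to implement an architecture directly from research papers.
@Deshwal Edited the answer. Read the Edit section of my answer.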