OpenCV Tesseract OCR cannot capture '[' and ']'

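I am trying to extract the text portion of an image using Tesseract OCR and OpenCV in Python. I have attached a simple image below. I created this image in Paint, so there is no noise and no need for preprocessing.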

Scenario 1:

import pytesseract
from PIL import Image

# testScreenshot (path to the image) and tessdata_dir_config are assumed to be defined earlier
plainText = pytesseract.image_to_string(Image.open(testScreenshot), lang='tur', config=tessdata_dir_config)
print(plainText)

Output:

İtestöü)

Scenario 2:

import pytesseract
from PIL import Image

# Same image and config as Scenario 1; only the language changes
plainText = pytesseract.image_to_string(Image.open(testScreenshot), lang='eng', config=tessdata_dir_config)
print(plainText)

Output:

[testou]

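Even so, I still cannot correctly capture this very simple text. If I change the language setting to English (Scenario 2), it captures the brackets but loses the Turkish characters, which is acceptable. However, the result with the Turkish setting (Scenario 1) is not acceptable because the brackets are missing. Any suggestions?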

tesseract v5.0.0-alpha.20200328
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
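
Tesseract accepts several languages joined with "+" (for example "tur+eng"), which is the closest built-in way to let the Turkish and English models both contribute to one result. The sketch below has not been verified against the image in this question, so it may or may not recover the brackets. build_config and ocr_tur_eng are hypothetical helper names, and "--psm 7" (treat the image as a single text line) is an assumption based on the one-line test image.

```python
from typing import Optional


def build_config(psm: int = 7, tessdata_dir: Optional[str] = None) -> str:
    """Build a pytesseract config string (hypothetical helper)."""
    # --psm 7: treat the image as a single line of text (assumption: one-line image).
    parts = [f"--psm {psm}"]
    if tessdata_dir:
        # Quote the path, as in the tessdata_dir_config example from the pytesseract README.
        parts.append(f'--tessdata-dir "{tessdata_dir}"')
    return " ".join(parts)


def ocr_tur_eng(image_path: str, tessdata_dir: Optional[str] = None) -> str:
    """OCR an image with the Turkish and English models loaded together (hypothetical helper)."""
    # Imported lazily so build_config() works even where pytesseract/Pillow are not installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        # "tur+eng" needs both tur.traineddata and eng.traineddata in the tessdata directory.
        return pytesseract.image_to_string(
            img, lang="tur+eng", config=build_config(7, tessdata_dir)
        )
```

Usage would be print(ocr_tur_eng("test.png")). The tessdata path is passed through the same --tessdata-dir option that tessdata_dir_config already uses.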

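It probably comes down to the training data for the different languages. It is normal for it to work better with English. Maybe you could first detect the letters using Turkish and then detect the symbols using the English dataset.

@YunusTemurlenk Thanks for your comment. I have installed the Turkish training data file. To me, that means it should be able to extract letters, numbers and symbols, at least from a clean, noise-free image. I don't think so; as I explained in the main question, that part is acceptable. It might be related to some setting, but so far I have not found a way. For the given minimal example your suggestion looks fine, but it is not worth trying on my actual problem.

Is this specific to the "]" symbol, or does it also fail to detect other symbols?

It works fine for some symbols, but brackets and similar symbols are problematic!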
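
The idea of taking the letters from the Turkish model and the symbols from the English model could also be approximated as post-processing: run OCR twice (lang='tur' and lang='eng') and merge the two strings. This is a minimal sketch, not a pytesseract feature. merge_ocr is a hypothetical helper, and it assumes both outputs line up character for character apart from the misread positions. It aligns the strings with the standard library's difflib.SequenceMatcher and prefers the English character only where that character is a bracket.

```python
from difflib import SequenceMatcher

# Characters to trust from the English result (assumption: brackets/parentheses only).
BRACKETS = set("[](){}")


def merge_ocr(tur_text: str, eng_text: str) -> str:
    """Merge two OCR results: letters from tur_text, brackets from eng_text (hypothetical helper)."""
    out = []
    matcher = SequenceMatcher(None, tur_text, eng_text, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        tur_seg, eng_seg = tur_text[i1:i2], eng_text[j1:j2]
        if tag in ("equal", "delete"):
            # Identical text, or text only the Turkish model produced: keep it.
            out.append(tur_seg)
        elif tag == "insert":
            # Text only the English model produced: keep just the brackets.
            out.append("".join(c for c in eng_seg if c in BRACKETS))
        elif len(tur_seg) == len(eng_seg):
            # Same-length disagreement: compare position by position.
            out.append("".join(e if e in BRACKETS else t for t, e in zip(tur_seg, eng_seg)))
        else:
            # Lengths differ, so positions cannot be paired; fall back to the Turkish text.
            out.append(tur_seg)
    return "".join(out)


print(merge_ocr("İtestöü)", "[testou]"))  # → [testöü]
```

Because it only trusts brackets from the English pass, this does nothing for other misread symbols, and it would need checking against real screenshots before relying on it.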