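What causes pytesseract to read either the top or the bottom line of a two-line image, depending on whether the image was opened with OpenCV or Pillow?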

2024-05-07 • Q&A

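Edit: Thanks to Nathancy. I had forgotten to preprocess the image, and doing so fixes the reading problem. I am still wondering what makes Tesseract read only the top or only the bottom line of the unprocessed image (same image, two different results).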

Original question:
I have an image containing two lines of text: (image: random test image for pytesseract)

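When I open the image with PIL Image in Python (IDLE, Python 3.6) and extract the string with pytesseract, only the last (bottom) line is extracted correctly. The top line comes out as gibberish (see the code section below).
However, when I open the same image with OpenCV and extract the string with pytesseract, only the first (top) line is extracted correctly, and the second (bottom) line comes out as gibberish (again, see the code section below).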

Here is the code:

>>> from PIL import Image,ImageFilter
>>> import pytesseract
>>> pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
>>> import cv2

>>> img = Image.open(r"C:\Users\user\MyImage.png")
>>> img2 = cv2.imread(r"C:\Users\user\MyImage.png",cv2.IMREAD_COLOR)


>>> print(pytesseract.image_to_string(img2))
Pet Sock has 448/600 HP left
A ae eee PER eats ae

>>> print(pytesseract.image_to_string(img))
Le TL
JHE has 329/350 HP left.
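One difference between the two inputs that may be relevant, though nothing in this thread confirms it as the cause: cv2.imread returns pixels in BGR order, while PIL's Image.open yields RGB, and pytesseract wraps a NumPy array with Image.fromarray without reordering channels. The sketch below only illustrates how a channel swap changes the gray level of a colored pixel. The pure red pixel is a made-up example (the actual colors in the test image are not known), and the BT.601 luma weights are an assumption about how a consumer might convert to grayscale.

```python
def luma(r, g, b):
    """Grayscale value using the common BT.601 weights (an assumption here)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_from_rgb(pixel_rgb):
    """Gray level when the channel order is interpreted correctly."""
    r, g, b = pixel_rgb
    return luma(r, g, b)


def gray_from_bgr_read_as_rgb(pixel_bgr):
    """Gray level when BGR data is mistaken for RGB.

    The consumer treats slot 0 as red and slot 2 as blue,
    although slot 0 actually holds blue and slot 2 holds red.
    """
    assumed_r, g, assumed_b = pixel_bgr
    return luma(assumed_r, g, assumed_b)


# Hypothetical pixel: pure red text, as Pillow would store it (RGB).
red_rgb = (255, 0, 0)
# The same pixel as OpenCV would store it (BGR).
red_bgr = red_rgb[::-1]

correct = gray_from_rgb(red_rgb)
swapped = gray_from_bgr_read_as_rgb(red_bgr)
print(f"interpreted correctly: {correct:.1f}")
print(f"BGR mistaken for RGB:  {swapped:.1f}")
```

If the two text lines have different colors, a shift like this could push one line or the other across whatever threshold Tesseract applies internally. To compare like with like, the OpenCV array can be converted first with cv2.cvtColor(img2, cv2.COLOR_BGR2RGB) before it is passed to pytesseract.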

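When I run pytesseract.image_to_boxes on both img and img2, it reports the same bounding box at some positions but with different letters (only the two output lines that share a box are shown here):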

>>> print(pytesseract.image_to_boxes(img2))
A 4 6 10 16 0

>>> print(pytesseract.image_to_boxes(img))
J 4 6 10 16 0
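For reading these rows: each line of image_to_boxes output follows Tesseract's box format, which is the character, then left, bottom, right, top, and the page number, with coordinates measured from the bottom-left corner of the image. The sketch below parses such rows and flips them to the top-left origin that OpenCV and Pillow use. The helper name parse_boxes and the image height of 40 pixels are hypothetical, chosen only for illustration.

```python
def parse_boxes(box_text, image_height):
    """Parse Tesseract box rows into dicts with a top-left origin.

    Each row is expected to look like: <char> <left> <bottom> <right> <top> <page>,
    where bottom and top are distances from the bottom edge of the image.
    """
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed rows
        char = parts[0]
        left, bottom, right, top, page = map(int, parts[1:])
        boxes.append({
            "char": char,
            "left": left,
            "right": right,
            # Flip the y axis so that 0 is the top edge of the image.
            "top": image_height - top,
            "bottom": image_height - bottom,
            "page": page,
        })
    return boxes


# The two rows quoted in the question; the height of 40 is a made-up value.
rows = "A 4 6 10 16 0\nJ 4 6 10 16 0"
for box in parse_boxes(rows, image_height=40):
    print(box)
```

If this reading of the format is right, both rows describe a glyph only 6 to 16 pixels above the bottom edge, which would fit the first character of the bottom line in each output: "A" in the garbled line from img2 and "J" in "JHE" from img.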

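When I run pytesseract.image_to_data on both img and img2, it reports very high confidence (95+) for the correctly read line and very low confidence (below 30) for the garbled line.
(image: Excel table output of image_to_data)
Edit: the Excel tables are for img2 and img, respectively.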
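The per-line confidence comparison can also be done without Excel. The sketch below averages word confidences per text line from the dictionary that pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT) returns. The helper name mean_conf_per_line is hypothetical, and the sample dictionary is made-up data shaped like that output, not the real result for this image.

```python
from collections import defaultdict


def mean_conf_per_line(data):
    """Average the word-level confidence for every (block, paragraph, line)."""
    per_line = defaultdict(list)
    rows = zip(data["block_num"], data["par_num"], data["line_num"],
               data["conf"], data["text"])
    for block, par, line, conf, text in rows:
        conf = float(conf)  # conf may arrive as a string or a number
        if conf < 0 or not text.strip():
            continue  # rows for blocks/lines carry conf -1 and no text
        per_line[(block, par, line)].append(conf)
    return {key: sum(vals) / len(vals) for key, vals in per_line.items()}


# Made-up sample in the shape of Output.DICT (only the keys used above).
sample = {
    "block_num": [1, 1, 1, 1, 1, 1],
    "par_num":   [1, 1, 1, 1, 1, 1],
    "line_num":  [1, 1, 1, 1, 2, 2],
    "conf":      ["-1", "96", "95", "97", "20", "30"],
    "text":      ["", "Pet", "Sock", "has", "A", "ae"],
}

for line_key, conf in sorted(mean_conf_per_line(sample).items()):
    print(line_key, round(conf, 1))
```

With real output, a large gap between the averages of the two lines would reproduce the 95+ versus below-30 split described above.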

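I played with the psm config value (I tried every value). Settings 5, 7, 8, 9, 10, and 13 only produced more garbage, and 0 and 2 gave errors; otherwise the results were no different from the default (which I believe is 3).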
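A sweep like this can be scripted. In the sketch below, the helper name try_psm_modes is hypothetical, and the OCR function is passed in as a parameter so that the loop itself does not depend on a Tesseract install; in practice that parameter would be pytesseract.image_to_string. As far as I know, Tesseract documents mode 0 as orientation and script detection only and mode 2 as not implemented, which would explain why those two values fail rather than return text.

```python
def try_psm_modes(image, ocr, modes=range(14)):
    """Run `ocr` once per page segmentation mode and collect the results.

    `ocr` is any callable with the shape ocr(image, config=...), for example
    pytesseract.image_to_string. Failures are recorded instead of raised so
    that one bad mode does not stop the sweep.
    """
    results = {}
    for psm in modes:
        try:
            results[psm] = ocr(image, config=f"--psm {psm}")
        except Exception as exc:  # pytesseract raises TesseractError here
            results[psm] = f"error: {exc}"
    return results


# Usage with the real library (requires Tesseract to be installed):
#   import pytesseract
#   for psm, text in try_psm_modes(img, pytesseract.image_to_string).items():
#       print(psm, repr(text))
```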

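I must be making some rookie mistake, but I cannot understand why this happens. If anyone could point me in the right direction, that would be great.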

The image is just a random one that I picked as a suitable OCR test. There is no purpose behind it other than trying out pytesseract.

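import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load the image, convert it to grayscale, then binarize with Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# OCR the binarized image as a single uniform block of text (--psm 6)
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)

# Show the binarized image for a visual check
cv2.imshow('thresh', thresh)
cv2.waitKey()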
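The cv2.THRESH_OTSU flag used above tells OpenCV to pick the threshold from the image histogram rather than use the number passed in. As a rough illustration of that idea, here is a minimal pure-Python sketch that chooses the cut maximizing the between-class variance and then applies an inverted binary threshold. It is not OpenCV's implementation, the function names are hypothetical, and the pixel values are made up.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that best splits pixels into <= t and > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue  # no pixels in the lower class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break  # every pixel is already in the lower class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor.
        score = weight_low * weight_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binary_inv(pixels, t):
    """Inverted binary threshold: values above t become 0, the rest 255."""
    return [0 if p > t else 255 for p in pixels]


# Made-up data: dark pixels around 10 and bright pixels around 200.
gray = [10] * 50 + [200] * 50
t = otsu_threshold(gray)
print(t, binary_inv([10, 200], t))
```

For a clearly bimodal image such as dark text on a light background, any cut between the two clusters separates text from background, which is why this kind of preprocessing tends to make the OCR input less sensitive to how the image was loaded.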

