
TextRecognize options and Tesseract page mode options.

Posted 11 years ago
I saw that Mathematica uses Tesseract for its OCR function (you can check in /Applications/Mathematica.app/SystemFiles/Converters/Tesseract/tessdata).
I recently discovered that Tesseract has some nice options, called page modes, that assume certain text orientations. You can see these options in this SE answer.
Can I control the page mode with some undocumented Mathematica option?
If not, it would be nice to have this connection in a future Mathematica version.
POSTED BY: Rodrigo Murta
4 Replies

If I run something like the following in version 11.1.0

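(* Collect the shared libraries bundled alongside the TesseractTools` package *)
libs = FileNames["*.dylib" | "*.so*", 
 FileNameJoin[{ParentDirectory[DirectoryName[FindFile["TesseractTools`"]]], "LibraryResources", $SystemID}]];
(* Search each library's raw contents for an embedded "tesseract x.y.z" version string *)
Map[Function[StringCases[ReadString[#], "tesseract " ~~ (DigitCharacter | ".") ..]], libs]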

on my Mac and Linux machines, it seems like version 3.04.00 is being used. Not sure about Windows.

POSTED BY: Ilian Gachevski
Undocumented functionality is usually not documented for very good reasons; see all the caveats listed in this SE discussion.

That being out of the way, after some spelunking, e.g. inspecting the output of
TextRecognize; Information["*`*TextRecognize*"]
it seems an option along the lines of "SegmentationMode" -> 6 might do something (but try it at your own risk).
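
A minimal sketch of what such a call might look like. The test image built with Rasterize is purely hypothetical, and since "SegmentationMode" is undocumented, it may be ignored, renamed, or removed in other versions. In Tesseract itself, page segmentation mode 6 means "assume a single uniform block of text".

```wolfram
(* Hypothetical test image: a short block of rendered text *)
img = Rasterize[Style["A single block\nof sample text", 40], "Image"];

(* Undocumented option, assumed from the spelunking above; may not exist in every version *)
TextRecognize[img, "SegmentationMode" -> 6]
```

Comparing the result against a plain TextRecognize[img] call on the same image is one way to check whether the option has any effect in a given version.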
POSTED BY: Ilian Gachevski
Risk accepted. It worked perfectly.
Thanks!
POSTED BY: Rodrigo Murta
Posted 7 years ago

Is it possible to find out which version of Tesseract Mathematica uses under the hood?

POSTED BY: Alexey Popkov