Written by James McDonald

August 25, 2008

Just made a spectacularly unsuccessful attempt to use Tesseract OCR. Here is a sample:

"|I11;XII;"1n -¤:2;;2:¤ LIEEEEEEE.-::2;;;: nz ’‘·* *--2..
::2;;: "=I;;;. :1:;*-- ::2;£2|a:‘¤I;;;.XXi.X ·¤:;1;;¤· X;|:;t1X
'!EEEXI|· XXXIIEX--EXIIXX-.:21:;. -·¤a.;¤¤a..a·· u:11XIIXX·¤:ii1:¤. uszizez;
1||:;‘tz|XX u:;;;se 11|:t’·· .::22z¤nX;1|:;‘tt|;X .I:11XEIXX
;|g;g;;;;* {ii, *·¤a..¤·* u;;;;:: ;n·":nX "XEE" ¤l§X,X..s**'
XXXII .,,, ¤|ZXXXZ|l .. X"‘i..i"*i..?"`XXII;"tnX.:z;;2||.i‘|i;1. -¤:;;;!: ::222u.X

That was before I Googled and found a link to a helpful HowtoForge page.

Once I followed that, tesseract spat out surprisingly accurate text _and_ punctuation. I didn't use the suggested ImageMagick convert tool, though, because contrary to the howto, GIMP v2.4.6 produced a usable TIFF just fine. One thing I did notice: I had to use Image ==> Flatten Image to get rid of the alpha channel before the save as TIFF option would work.
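For anyone who would rather script the flatten-then-OCR step than click through GIMP, here is a rough Python sketch. It is not what I actually ran: it assumes the ImageMagick `convert` and `tesseract` commands are both on the PATH, it flattens onto a white background (GIMP uses whatever the current background colour is), and the function names and the `scan.png` file name are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def flatten_command(src, dst):
    """Build an ImageMagick call that composites the image onto white
    and switches the alpha channel off before writing the output file.
    White is an assumption; pick whatever background suits the scan."""
    return ["convert", str(src), "-background", "white",
            "-flatten", "-alpha", "off", str(dst)]


def ocr_command(tiff, output_base):
    """Build the tesseract call. tesseract appends .txt to output_base."""
    return ["tesseract", str(tiff), str(output_base)]


def ocr_image(src):
    """Flatten src into a TIFF beside it, run tesseract, return the text."""
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    src = Path(src)
    tiff = src.with_name(src.stem + "-flat.tif")    # hypothetical naming
    output_base = src.with_name(src.stem + "-ocr")  # tesseract adds .txt

    # check=True raises CalledProcessError if either tool fails.
    subprocess.run(flatten_command(src, tiff), check=True)
    subprocess.run(ocr_command(tiff, output_base), check=True)

    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        print(ocr_image(sys.argv[1]))
```

Save it as a script and pass an image path as the first argument, e.g. `scan.png`. The two command builders are kept separate from the function that runs them so the exact arguments can be checked or tweaked without either tool installed.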

Sometimes it’s not only the tools but also the technique.
