Written by James McDonald

August 25, 2008

Just made a spectacularly unsuccessful attempt to use Tesseract OCR. Here is a sample:

"|I11;XII;"1n -¤:2;;2:¤ LIEEEEEEE.-::2;;;: nz ’‘·* *--2..
::2;;: "=I;;;. :1:;*-- ::2;£2|a:‘¤I;;;.XXi.X ·¤:;1;;¤· X;|:;t1X
'!EEEXI|· XXXIIEX--EXIIXX-.:21:;. -·¤a.;¤¤a..a·· u:11XIIXX·¤:ii1:¤. uszizez;
1||:;‘tz|XX u:;;;se 11|:t’·· .::22z¤nX;1|:;‘tt|;X .I:11XEIXX
;|g;g;;;;* {ii, *·¤a..¤·* u;;;;:: ;n·":nX "XEE" ¤l§X,X..s**'
XXXII .,,, ¤|ZXXXZ|l .. X"‘i..i"*i..?"`XXII;"tnX.:z;;2||.i‘|i;1. -¤:;;;!: ::222u.X

That was before I Googled and found a link to a helpful HowtoForge page.

Once I followed that, Tesseract spat out surprisingly accurate text _and_ punctuation. I didn’t use the suggested ImageMagick convert tool, though, because contrary to the howto, GIMP v2.4.6 produced a usable TIFF just fine. One thing I did notice: I had to use Image ==> Flatten Image to get rid of the alpha channel before the Save As TIFF option would work.
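For anyone who would rather script this than click through GIMP, here is a rough sketch of the ImageMagick route the howto suggested (not the GIMP route I actually took). It assumes the `convert` and `tesseract` binaries are on the PATH; the function names and the `scan.png` file name are made up for illustration, and I have not checked which options the howto itself used.

```python
import subprocess
from pathlib import Path


def build_ocr_commands(image_path, lang="eng"):
    """Return the (convert, tesseract) command lists for one image."""
    src = Path(image_path)
    tiff = src.with_suffix(".tif")
    out_base = src.with_suffix("")  # tesseract appends ".txt" itself

    # Composite any transparency onto white, then switch the alpha channel
    # off: roughly the command-line counterpart of GIMP's "Flatten Image".
    convert_cmd = [
        "convert", str(src),
        "-background", "white", "-flatten", "-alpha", "off",
        str(tiff),
    ]
    # Usage is: tesseract <image> <output base> [-l <language>]
    tesseract_cmd = ["tesseract", str(tiff), str(out_base), "-l", lang]
    return convert_cmd, tesseract_cmd


def run_ocr(image_path, lang="eng"):
    """Flatten the image to TIFF, run Tesseract, and return the text."""
    convert_cmd, tesseract_cmd = build_ocr_commands(image_path, lang)
    subprocess.run(convert_cmd, check=True)
    subprocess.run(tesseract_cmd, check=True)
    return Path(tesseract_cmd[2] + ".txt").read_text(encoding="utf-8")
```

Calling `run_ocr("scan.png")` should leave `scan.tif` and `scan.txt` next to the original and return the recognised text. Building the commands in a separate function makes it easy to print them and check them before anything actually runs.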

Sometimes it’s not only the tools but the technique as well.

