Written by James McDonald

August 25, 2008

Just made a spectacularly unsuccessful attempt to use Tesseract OCR. Here is a sample:

"|I11;XII;"1n -¤:2;;2:¤ LIEEEEEEE.-::2;;;: nz ’‘·* *--2..
::2;;: "=I;;;. :1:;*-- ::2;£2|a:‘¤I;;;.XXi.X ·¤:;1;;¤· X;|:;t1X
'!EEEXI|· XXXIIEX--EXIIXX-.:21:;. -·¤a.;¤¤a..a·· u:11XIIXX·¤:ii1:¤. uszizez;
1||:;‘tz|XX u:;;;se 11|:t’·· .::22z¤nX;1|:;‘tt|;X .I:11XEIXX
;|g;g;;;;* {ii, *·¤a..¤·* u;;;;:: ;n·":nX "XEE" ¤l§X,X..s**'
XXXII .,,, ¤|ZXXXZ|l .. X"‘i..i"*i..?"`XXII;"tnX.:z;;2||.i‘|i;1. -¤:;;;!: ::222u.X

That was before I Googled and found a link to a helpful HowtoForge page.

Once I followed that, Tesseract spat out surprisingly accurate text _and_ punctuation. I didn’t use the suggested ImageMagick convert tool, though, because contrary to the howto, GIMP v2.4.6 produced a usable TIFF just fine. One thing I did notice was that I had to use Image ==> Flatten Image to get rid of the alpha channel before the save-as-TIFF option would work.
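
For anyone who would rather script the flatten-then-OCR steps than click through GIMP, here is a rough Python sketch. It is not what I actually ran. It assumes ImageMagick's convert and the tesseract command are both on the PATH, and the function names and file names (scan.png and so on) are made up for illustration. The "-alpha off" flag needs a reasonably recent ImageMagick, and newer releases may prefer the magick command over convert.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def flatten_command(src: str, dst: str) -> List[str]:
    """Build an ImageMagick call that composites the image onto white
    and then switches the alpha channel off before writing dst."""
    return ["convert", src, "-background", "white", "-flatten",
            "-alpha", "off", dst]


def ocr_command(tiff: str, out_base: str) -> List[str]:
    """Build the tesseract call; tesseract appends .txt to out_base."""
    return ["tesseract", tiff, out_base]


def run_ocr(src: str, out_base: str) -> Path:
    """Flatten src into a sibling TIFF, OCR it, return the text file path."""
    source = Path(src)
    # Hypothetical naming scheme: scan.png -> scan-flat.tif
    tiff = str(source.with_name(source.stem + "-flat.tif"))
    for cmd in (flatten_command(src, tiff), ocr_command(tiff, out_base)):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(cmd[0] + " not found on PATH")
        subprocess.run(cmd, check=True)
    return Path(out_base + ".txt")
```

Calling run_ocr("scan.png", "scan") should leave the recognised text in scan.txt, assuming both tools are installed. If the result still looks like the sample above, the image itself (resolution, colour depth) is probably the next thing to check.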

Sometimes it’s not only the tools but also the technique.
