Character Recognition

#############################

Problem

I had to extract the text of a letter written in French.

Solution

First, scan the page and save it as an image (JPG, for instance). Then install gscan2pdf, which also installs the character recognition engine Tesseract. Finally, start Synaptic and install the French language data for Tesseract (the package is called tesseract-ocr-fra).

This video tutorial explains how to use gscan2pdf: http://www.youtube.com/watch?v=UjjogfWfWsQ

In short:

1. Import the image.
2. Choose Tools → OCR and set OCR Engine to Tesseract.
3. Click Start OCR.
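
The same recognition can probably be done without the GUI by calling the tesseract command-line tool directly. Below is a minimal Python sketch of that idea, not the method used above. It assumes tesseract and the tesseract-ocr-fra package are installed; the function names and the default output name "letter" are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="fra"):
    """Return the argument list for: tesseract IMAGE OUTPUT_BASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base="letter", lang="fra"):
    """Run Tesseract on a scanned image and return the recognized text.

    Both the function name and the default output name are hypothetical.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract appends ".txt" to the output base name.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

For example, ocr_image("letter.jpg") would run "tesseract letter.jpg letter -l fra" and return the contents of letter.txt, where letter.jpg stands for the scanned page.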

Links

Tesseract

Page last modified on 2010 September 15, 15:59