HOME - Recent Changes - Search:

Academic Work


Personal

* pot de départ


dblp


(:twitter:)

-----

[ edit | logout ]
[ help | sandbox | passwd ]

Character Recognition

#############################

Problem

I had to extract the text of a letter written in French.

Solution

First, scan the page and save it as an image (JPG for instance). Then install gscan2pdf. It will install the character recognition engine tesseract too. Start Synaptic and install the French language extension too (the package is called tesseract-ocr-fra).

Here is a nice tutorial that explains how to use gscan2pdf: http://www.youtube.com/watch?v=UjjogfWfWsQ .

In short:

  • import the image
  • ToolsOCR. OCR Engine: Tesseract.
  • Start OCR

Links

Cloud City


anime | bash | blogs | bsd | c/c++ | c64 | calc | comics | convert | cube | del.icio.us | digg | east | eBooks | egeszseg | elite | firefox | flash | fun | games | gimp | google | groovy | hardware | hit&run | howto | java | javascript | knife | lang | latex | liferay | linux | lovecraft | magyar | maths | movies | music | p2p | perl | pdf | photoshop | php | pmwiki | prog | python | radio | recept | rts | scala | scene | sci-fi | scripting | security | shell | space | súlyos | telephone | torrente | translate | ubuntu | vim | wallpapers | webutils | wikis | windows


Blogs and Dev.

* Ubuntu Incident
* Python Adventures
* me @ GitHub


Places

Debrecen | France | Hungary | Montreal | Nancy


Notes

full circle | km


Hobby Projects

* Jabba's Codes
* PmWiki
* Firefox
* PHP
* JavaScript
* Scriptorium
* Tutorials
* me @ GitHub


Quick Links


[ edit ]

View - Edit - History - Attach - Print *** Report - Recent Changes - Search
Page last modified on 2010 September 15, 15:59