Monday, June 30, 2008

Optical Character Recognition (OCR)

Everything is going digital. Everything from diaries to business forms is transitioning to a digital format, but some people are still light-years behind schedule. Chances are, you still have a huge pile of papers that would be much more useful if you could "search" through it and find what you want. There is a way to do this, and it is called optical character recognition (OCR).

The technical definition of OCR is

the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text.
In other words, it is the ability of a machine to read text from an image, handwriting included. There used to be only one piece of software that I could recommend, but another one was just released... To see some things I have done with OCR, take a look at the All-State roster that I scanned and transferred to a digital copy (that page is also the most visited page on this blog by far, with a total of 199 visits since the beginning of the year!).
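To make the definition a bit more concrete, here is a toy sketch of the basic idea: compare each scanned character against known letter shapes and pick the closest one. This is only an illustration and is not how Document Imaging or Evernote actually work. The tiny 3x5 "font" and every name in it (`TEMPLATES`, `recognize_glyph`, and so on) are made up for this example.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each glyph is a 3x5 bitmap written as rows of '#' (ink) and '.' (blank).
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def pixel_differences(a, b):
    """Count the pixels where two same-sized bitmaps disagree."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize_glyph(bitmap):
    """Return the template letter with the fewest differing pixels."""
    return min(
        TEMPLATES,
        key=lambda letter: pixel_differences(TEMPLATES[letter], bitmap),
    )


def recognize_text(glyphs):
    """Turn a sequence of glyph bitmaps into a plain string."""
    return "".join(recognize_glyph(glyph) for glyph in glyphs)


# A "scanned" word; the first letter has one smudged pixel in its middle row.
scanned = [
    ("#..", "#..", "#.#", "#..", "###"),  # smudged L
    ("###", "#.#", "#.#", "#.#", "###"),  # clean O
    ("###", ".#.", ".#.", ".#.", ".#."),  # clean T
]

if __name__ == "__main__":
    print(recognize_text(scanned))
```

The smudged first letter still comes out right because it is closer to the "L" shape than to any other template, which is roughly why OCR can cope with slightly messy scans.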

1.  Microsoft Office Document Imaging - This piece of software is a hidden gem in the Office suite. It's located under the Microsoft Office folder in the Start menu.
Start -> All Programs -> Microsoft Office -> Microsoft Office Tools -> Microsoft Office Document Imaging
When you first run it, it might ask you for the installation CD. There is a way to bypass this problem, but I can't really help you since I forgot the error code... but it's solvable :). Anyway, to scan the document, you can use either the manufacturer's software or Windows' built-in scanner wizard.

To use the built-in feature, open Control Panel, look for Scanners and Cameras, and select the scanner that you want to use.



Whichever software you use, the one thing you must remember is to save the scan as ".TIF", or else Document Imaging will not work.
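If you are not sure whether a scan really got saved as a TIFF (and not, say, a JPEG that was just renamed), a quick check is possible with a few lines of Python. This is a minimal sketch that only looks at the first four bytes of the file, which in a classic TIFF are "II" or "MM" followed by the number 42. The function name `looks_like_tiff` is my own invention, and it does not prove that Document Imaging will accept the file.

```python
import struct
import sys


def looks_like_tiff(path):
    """Return True if the file starts with a classic TIFF header."""
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        return False
    byte_order = header[:2]
    if byte_order == b"II":    # little-endian ("Intel") byte order
        fmt = "<H"
    elif byte_order == b"MM":  # big-endian ("Motorola") byte order
        fmt = ">H"
    else:
        return False
    # The two bytes after the byte-order mark must encode the number 42.
    (magic,) = struct.unpack(fmt, header[2:4])
    return magic == 42


if __name__ == "__main__":
    # Usage: python check_tiff.py scan1.tif scan2.tif
    for name in sys.argv[1:]:
        verdict = "looks like a TIFF" if looks_like_tiff(name) else "NOT a TIFF"
        print(f"{name}: {verdict}")
```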



Scan the image and open the .TIF file with Document Imaging. Now here comes the magical part. You can click the two buttons indicated below (red then green) or click Tools -> Recognize Text using OCR... then export to Word.



Now voila! The scanned document is in a Word file! The only problem is that the formatting is all messed up... but now it's searchable!
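Once the text is out of the image, searching a whole pile of documents at once becomes easy. Here is a rough sketch, assuming you saved each recognized document as a plain-text ".txt" file in one folder (Word can save as plain text). The folder layout and the function name `search_text_files` are hypothetical, just for illustration.

```python
import sys
from pathlib import Path


def search_text_files(folder, keyword):
    """Yield (file name, line number, line) for each case-insensitive match."""
    needle = keyword.lower()
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps odd characters from stopping the search.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                yield path.name, number, line.strip()


if __name__ == "__main__":
    # Usage: python search_scans.py <folder> <keyword>
    if len(sys.argv) == 3:
        for name, number, line in search_text_files(sys.argv[1], sys.argv[2]):
            print(f"{name}:{number}: {line}")
```

The search ignores upper and lower case, which helps because OCR output is not always consistent about capitalization.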

2. Evernote - Evernote is an amazing piece of software. As of today, it is out of beta and the final version is available to the public. I think the video below can describe Evernote a LOT better than I can with words and pictures (Screencast.com is being stupid and is going to charge users now...), so here you go. (LINK)



There are also other options, like the uber-expensive Adobe Acrobat (but it sucks too...) and other software, but these are the only ones I have experience with.

-runiteking1

Got comments? Post them below!
