

Secrets of the paperless office: optimizing OCR

Joe Kissell | July 11, 2013
OCR software converts document scans into searchable PDFs. But what settings and software will get you the most accurate results while using the least hard-disk space? Joe Kissell's results may surprise you.

So, where's the sweet spot?
All these results boil down to this: for the best compromise between file size and OCR accuracy, scan at 300 dpi in grayscale with medium compression. If color plays an essential role in the original document, switch to color but leave every other setting the same. Avoid scanning in black and white, even if your documents are plain text on white paper.
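The recommendation above amounts to a small decision rule. The Python sketch below encodes it and, for comparison, computes the uncompressed size of one scanned page in each mode. The page size (US Letter, 8.5 x 11 inches) and the names `ScanSettings`, `recommended_settings`, and `raw_page_bytes` are illustrative assumptions, not part of any scanner or OCR product's API. Real PDF sizes depend heavily on compression, so these numbers show only the raw data a scanner starts with.

```python
from dataclasses import dataclass

# Bits stored per pixel in each scan mode, before any compression.
BITS_PER_PIXEL = {"bw": 1, "grayscale": 8, "color": 24}


@dataclass(frozen=True)
class ScanSettings:
    # Hypothetical settings record; not a real scanner API.
    dpi: int
    mode: str
    compression: str


def recommended_settings(color_essential: bool) -> ScanSettings:
    """Rule of thumb from the article: 300 dpi, medium compression,
    grayscale unless color matters in the original, never black and white."""
    mode = "color" if color_essential else "grayscale"
    return ScanSettings(dpi=300, mode=mode, compression="medium")


def raw_page_bytes(width_in: float, height_in: float, dpi: int, mode: str) -> int:
    """Uncompressed size of one scanned page in bytes (partial bytes rounded up)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    bits = pixels * BITS_PER_PIXEL[mode]
    return (bits + 7) // 8


if __name__ == "__main__":
    # Assumed example page: US Letter, 8.5 x 11 inches, scanned at 300 dpi.
    for mode in ("bw", "grayscale", "color"):
        print(mode, raw_page_bytes(8.5, 11, 300, mode))
```

Running it prints 1051875 bytes for black and white, 8415000 for grayscale, and 25245000 for color. Color carries three times the raw data of grayscale, which helps explain why grayscale is the default unless color matters; black and white is smallest of all, yet the article still advises against it.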

If you have a choice of OCR engines, avoid Acrobat Pro (especially version XI) despite its smaller file sizes. FineReader offers superior accuracy, which matters every time you search or copy text from your digitized documents. If you use an embedded version of FineReader that lets you adjust compression and downsampling, such as the one included in DEVONthink Pro Office, you will also avoid problems with inconsistent file sizes. With any tool that lets you control downsampling (remember, this happens after text recognition), set it to 150 dpi and choose a compression quality of about 50 percent.
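Downsampling from 300 dpi to 150 dpi halves the resolution in each direction, so the stored image keeps only a quarter of its pixels. As a rough illustration, assuming a simple box filter that averages each block of source pixels (FineReader, Acrobat, and other tools may use different resampling methods), here is a Python sketch that operates on an 8-bit grayscale page held as a list of rows. The function name `downsample` is hypothetical, and the sketch does not model the separate compression-quality setting.

```python
def downsample(pixels: list[list[int]], src_dpi: int = 300, dst_dpi: int = 150) -> list[list[int]]:
    """Reduce resolution with a box filter: each output pixel is the rounded
    average of an f x f block of input pixels, where f = src_dpi // dst_dpi.
    Trailing rows/columns that do not fill a whole block are dropped.
    Hypothetical helper for illustration; real OCR tools may resample differently."""
    if src_dpi % dst_dpi:
        raise ValueError("this sketch only handles integer reduction factors")
    f = src_dpi // dst_dpi
    n = f * f
    height = len(pixels) // f * f
    width = len(pixels[0]) // f * f if pixels else 0
    out = []
    for y in range(0, height, f):
        row = []
        for x in range(0, width, f):
            total = sum(pixels[y + dy][x + dx] for dy in range(f) for dx in range(f))
            row.append((total + n // 2) // n)  # integer average, rounded to nearest
        out.append(row)
    return out


if __name__ == "__main__":
    # A tiny 2 x 4 "page": one checkerboard block and one flat block.
    print(downsample([[0, 255, 10, 10], [255, 0, 10, 10]]))  # [[128, 10]]
```

In a real pipeline this step would touch only the page image, after the OCR engine has already built its text layer from the full-resolution scan, so the recognized text is unaffected. For a US Letter page, it turns a 2550 x 3300 pixel scan into 1275 x 1650 pixels.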

 


Sign up for CIO Asia eNewsletters.