
OCR issues/functionality

Teresa_Klemann
Champ in-the-making

I am looking for a company in the Buffalo, NY area that uses OCR for the scanning and indexing of invoices. We are having some issues here with setting up large numbers of vendors. Invoices coming through are not legible when they are scanned in, and for some batches of 60-100 documents OCR takes over an hour to run. Other vendors have batches of over 300 documents and OCR runs in minutes.

9 REPLIES

The version number does help. But I meant, are you specifically using Advanced Capture with configured templates to process these documents?

Teresa_Klemann
Champ in-the-making

I believe so. We are setting up templates by vendor. And to answer JeffH: it is not a virtual machine but a standalone server. I am sorry, I just clarified this with our IT department.

OK, so there could be a number of variables in play, but a lot of this could depend on the order of the template form matching, especially given the large number of templates that can result from configuring a template per vendor. Automated Indexing/Advanced Capture in version 11 offers the ability to run an analysis of how common certain vendors are and move their templates to the top of the comparison order, so the most common vendors can be matched more efficiently. This is described on page 204 of the 11.0 Advanced Capture MRG. There are additional improvements in processing efficiency in versions 12 and 13 as well.
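To illustrate why the comparison order matters, here is a minimal Python sketch. It is not OnBase code and does not reflect how Advanced Capture matches forms internally; it assumes a simple model in which templates are tried one after another and the first match wins. The vendor names and counts are made up for the example.

```python
from collections import Counter


def order_by_frequency(vendor_history):
    """Return vendor template names, most frequently seen vendor first."""
    return [vendor for vendor, _ in Counter(vendor_history).most_common()]


def average_comparisons(template_order, vendor_history):
    """Average number of templates tried per document.

    Assumes templates are tried in order and the first match wins,
    which is a simplification made for this illustration.
    """
    position = {name: i + 1 for i, name in enumerate(template_order)}
    return sum(position[v] for v in vendor_history) / len(vendor_history)


# Hypothetical history: 100 invoices from three made-up vendors.
history = ["Acme"] * 70 + ["Globex"] * 20 + ["Initech"] * 10

rare_first = ["Initech", "Globex", "Acme"]
common_first = order_by_frequency(history)

print(average_comparisons(rare_first, history))    # 2.6
print(average_comparisons(common_first, history))  # 1.4
```

With only three templates the difference is small, but under the same model the gap grows with the number of vendor templates, which is why reordering can matter when there is one template per vendor.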

In terms of image quality, a minimum of 300 dpi is recommended for the OCR engine to have a good chance of recognizing characters on a document image. That is something to check at the scanning level, since you say documents are not legible at scan time. Is this true of the physical document, or only of the image once it is scanned in? If the document is hard to read with the human eye from the start, you may want to speak to your vendor(s) about providing higher quality documents. If a person has trouble reading the document, then an OCR engine will most likely not be able to read it either.
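One quick way to check a scan against the 300 dpi recommendation is to divide the image's pixel dimensions by the paper size. The sketch below assumes US Letter pages (8.5 x 11 inches) and uses hypothetical function names; it is plain arithmetic, not an OnBase or scanner API.

```python
def effective_dpi(pixels, inches):
    """Resolution along one axis, in dots per inch."""
    return pixels / inches


def meets_minimum(width_px, height_px, width_in=8.5, height_in=11.0,
                  minimum_dpi=300):
    """True if both axes reach the minimum dpi (page size is an assumption)."""
    lowest = min(effective_dpi(width_px, width_in),
                 effective_dpi(height_px, height_in))
    return lowest >= minimum_dpi


print(meets_minimum(2550, 3300))  # True: a Letter page scanned at 300 dpi
print(meets_minimum(1700, 2200))  # False: the same page scanned at 200 dpi
```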

Steve_MacWillia
Champ on-the-rise

Hello Teresa

Konica Minolta supports OnBase solutions for scanning invoices, and we have offices all over the US. We support using Hyland ICAP for scanning invoices where you need to extract header/footer information from an invoice. Speed depends on the number and type of servers doing the OCR work. The question is how many invoices you want to process per hour; you then have to size the OCR servers for that amount of work.
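As a rough sketch of that sizing arithmetic (the numbers and the function name are hypothetical, and it assumes work is spread evenly across identical OCR workers with no overhead):

```python
import math


def ocr_workers_needed(invoices_per_hour, seconds_per_invoice):
    """Minimum number of parallel OCR workers to keep up with the load.

    Assumes identical workers and perfectly even distribution,
    which real deployments will only approximate.
    """
    busy_seconds_per_hour = invoices_per_hour * seconds_per_invoice
    return math.ceil(busy_seconds_per_hour / 3600)


# Hypothetical load: 1,000 invoices per hour at 12 seconds of OCR each.
print(ocr_workers_needed(1000, 12))  # 4
```

The per-invoice time is the number to measure first: a batch of 100 that takes an hour works out to about 36 seconds per document.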

 

If you are trying to do line item extraction from an invoice and your invoices are coming from lots of different suppliers (meaning you don't have 70% of your invoices coming from 50 designated suppliers), then I would caution you against using the Advanced Capture module as an approach.

We have lots of experience with this and would also suggest Fujitsu scanners for the image capture work, to enhance the characters. 300 DPI is not as important as the cleanness of the characters. I would rather have a clean, despeckled invoice at 240 or even 200 dpi than a noisy invoice at 300 dpi. Many Fujitsu scanners come with VRS, and Kodak scanners do too. We have had really good luck with the newer Fujitsu scanners that do not rely on VRS and have their own image clean-up tools built into the scanner driver. This avoids the VRS middleware requirement and gets us a clean image to work with.

I suspect, though, that your issue is not the quality of the image but the license you have used to process your invoices. In many cases, using Advanced Capture and having to build a template for every vendor is a never-ending configuration issue.

I hope this helps.  Steve at Konica Minolta

The DPI recommendation comes directly from the OCR engine vendor. Steve, you are correct that, given a choice, a clean image is better than a high DPI image. A higher DPI simply allows the OCR engine to make better and possibly faster decisions on individual characters, because the edges of characters are sharper and more defined as the engine analyzes each pixel on the document.

We do offer a new product in OnBase 13 called Intelligent Capture for AP, or ICAP as Steve mentioned. It focuses on high volume vendor solutions, to avoid creating templates for each vendor's invoices, and it scales with processing volume and required turnaround time thanks to our new multi-threaded, 64-bit OCR server, called the Data Capture Server. In its current iteration, the product can extract header and footer information from invoices with minimal upfront configuration. Our team is continuing to enhance the product by adding features for refreshed OnBase 13 builds (which will be included in any future 13 service pack releases), in addition to longer term enhancements in future releases (e.g. OnBase 14).
