Choice of OCR

dranakan
Champ on-the-rise
Hello,

I am evaluating several OCR products to integrate with Alfresco. The goal is to extract certain fields from a paper document (an invoice, for example). The OCR should generate a PDF plus a second file containing the extracted values (name=bob, numberInvoice=23423, …). The products I am testing are:
- Kofax
- eCopy
- Iris capture
- Adobe Capture

I am looking for the cheapest option. Do you use another OCR product in the same context?

At the moment I am working with Adobe Capture, but I am not able to export the data to a separate file with the values (name=bob, numberInvoice=23423, …). Can someone explain how to do this?
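
To make the desired output concrete, here is a minimal Python sketch of how the second file could be produced, assuming the OCR step has already returned the page as plain text. The label patterns ("Name:", "Invoice No:") and the function names are hypothetical, and this is not how Adobe Capture or any of the listed products export data.

```python
import re

# Hypothetical patterns: real invoices need patterns tuned to their own layout.
FIELD_PATTERNS = {
    "name": re.compile(r"Name\s*:\s*(.+)", re.IGNORECASE),
    "numberInvoice": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*:\s*(\d+)", re.IGNORECASE
    ),
}


def extract_fields(ocr_text):
    """Return a dict with every field whose pattern matches the OCR text."""
    fields = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            fields[key] = match.group(1).strip()
    return fields


def to_properties(fields):
    """Serialize the fields as key=value lines, one field per line."""
    return "\n".join(f"{key}={value}" for key, value in fields.items())


if __name__ == "__main__":
    sample = "Name: bob\nInvoice No: 23423\nTotal: 100.00"
    print(to_properties(extract_fields(sample)))
```

Fields that are not found are simply left out of the file, so a downstream import can tell "missing" apart from "empty".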
12 REPLIES 12

jlabuelo
Champ on-the-rise
Hi there

We are investigating how to integrate OCR software with our Alfresco application. We have built a wizard that produces a JPG file, which we would now like to convert to a PDF with OCR and store in our Alfresco repository.

I just saw your post about OCR and Alfresco. Could you please let us know whether you got any of the products you mention to work?

I have also read this article about an OCR product that can be integrated with Alfresco, but it has to be installed on Windows and we are using a Linux Ubuntu server.
http://wiki.alfresco.com/wiki/Tiger_OCR_integration
If you can share your thoughts with us, we would really appreciate it.

Thanks a lot in advance

dranakan
Champ on-the-rise
Hello,

I think we can use OCR with Alfresco in three different ways:

    1) Use OCR software (like Kofax, eCopy, …) on a workstation, which creates a PDF (from a TIFF) and puts it in an Alfresco directory
       -> My problem: it needs a workstation
    2) Use an OCR engine (like intelliant) inside the Alfresco code to convert TIFF to PDF (to get the fields, it is possible to extract a region of the TIFF and send only that region to the OCR engine)
       -> My problem: there is no cheap engine (< 600 $) for Linux
    3) Use an OCR engine (like TOCR from Transym) together with other tools to build the PDF ourselves (after the document has been added to Alfresco)
       -> My problem: I don't know how long it would take to develop…

I would like to find a cheap OCR engine (option two), but for the moment I have found nothing.

I will keep searching…

zaizi
Champ in-the-making
Have a look at this. http://code.google.com/p/tesseract-ocr/. I believe this integration has already been done. It can be configured in Alfresco through XML configuration with little or no Java code if you follow these steps http://www.howtoforge.com/ocr_with_tesseract_on_ubuntu704 using Alfresco's http://wiki.alfresco.com/wiki/Content_Transformations#ComplexContentTransformer.
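
As a rough illustration of what such a transformation ends up running, here is a Python sketch that builds and runs a tesseract command line. It assumes the `tesseract` binary is installed and on the PATH; the function names are made up for this example, and the `pdf` output option is only available in newer Tesseract releases. It is not the Alfresco configuration itself.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng", pdf=False):
    """Build the argument list for the tesseract command-line tool."""
    command = ["tesseract", str(image_path), str(output_base), "-l", lang]
    if pdf:
        # The "pdf" config asks newer Tesseract releases for a searchable PDF;
        # without it, tesseract writes plain text.
        command.append("pdf")
    return command


def run_ocr(image_path, output_base, lang="eng", pdf=False):
    """Run tesseract and return the path of the file it is expected to write."""
    command = build_tesseract_command(image_path, output_base, lang, pdf)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    suffix = ".pdf" if pdf else ".txt"
    return Path(str(output_base) + suffix)
```

Tesseract appends the extension to the output base itself, which is why `run_ocr` returns `output_base` plus `.txt` or `.pdf` rather than taking a full output file name.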

dranakan
Champ on-the-rise
Hello Zaizi,

I came across Tesseract (and OCRopus) during my research, but I have not tried them, because the tests one can find on the web say that their recognition is not good, and also because they cannot generate a PDF that keeps the layout and the pictures.
(I want to use an OCR engine to capture invoices and also for records management, so I need an OCR that reproduces the original exactly.)

Have you tried Tesseract? Am I wrong?

Thank you.

dranakan
Champ on-the-rise
Hi all,

We have used Readiris Pro for the OCR conversion. It gives good results, but running OCR software after scanning is a complicated step for users…

[size=150]Fully automatic[/size]
Now we want everything to be automatic: the user scans, and can then read the PDF in Alfresco. I am looking at ABBYY and Kofax (waiting for the vendors' replies). I would like to install this software on the Alfresco server (Linux).

I am also looking for a scanner that performs the OCR conversion itself. Does anybody know of such a product?

[size=150]What the OCR solution must do[/size]
It must be able to remove empty pages, handle duplex scans, and interpret separator pages (when a document has more than one page). Perhaps this could be done by the scanner, but it would be better if a process on the server did this work (then we could use different scanners).
Budget: < 3'500 $ for 5'000 documents/month
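
For the server-side part, the two page-handling requirements can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a page is a grayscale image given as rows of 0-255 values, "blank" means almost no dark pixels, and separator pages are recognised by a predicate supplied by the caller (for example a barcode check). The thresholds and function names are hypothetical, and no commercial product is claimed to work this way.

```python
def is_blank(page, dark_threshold=128, max_dark_ratio=0.005):
    """Treat a page as blank when almost none of its pixels are dark.

    `page` is a grayscale image given as rows of 0-255 values.
    """
    total = sum(len(row) for row in page)
    if total == 0:
        return True
    dark = sum(1 for row in page for value in row if value < dark_threshold)
    return dark / total <= max_dark_ratio


def split_documents(pages, is_separator):
    """Group pages into documents, cutting at (and dropping) separator pages."""
    documents, current = [], []
    for page in pages:
        if is_separator(page):
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

Empty back sides from duplex scanning can be filtered first with `[p for p in pages if not is_blank(p)]`. If blank sheets are themselves used as separators, passing `is_blank` as `is_separator` does both jobs in one pass.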

dgenard
Champ on-the-rise
Hi,
In the IRIS product range, IRISDocument Server does exactly this … and more.

It's a server-side solution for
- Image pre-processing (Deskewing, Orientation detection, Despeckling, Smoothing)
- Document sorting and separation using bar codes or white pages
- OCR
- Output to many formats (PDF, PDF/A, DOC, OpenDocument, …)
- Hyper-compression of PDF and XPS documents
Just configure your scanner to output scanned images/documents to a shared folder, and IRISDocument does the rest.
Price should be within your budget.
Details on http://www.irislink.com/c2-1600-189/IRISDocument-9---OCR-Server.aspx
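
The shared-folder hand-off described above is a generic "hot folder" pattern. A minimal Python sketch of the polling side is shown here, purely as an illustration: the function name and the list of extensions are assumptions, and this is not IRISDocument's implementation.

```python
import os


def find_new_scans(folder, seen, extensions=(".tif", ".tiff", ".pdf", ".jpg")):
    """Return scan files in `folder` not returned before, recording them in `seen`."""
    new_files = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if (
            name.lower().endswith(extensions)
            and os.path.isfile(path)
            and path not in seen
        ):
            seen.add(path)
            new_files.append(path)
    return new_files
```

A service would call this in a loop with a short sleep and hand each new path to the OCR step. A real deployment also has to make sure the scanner has finished writing a file before it is picked up, which this sketch does not attempt.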

ALFEA Consulting provides the connector to automatically send scanned documents with their indexes to Alfresco.
See http://forums.alfresco.com/en/viewtopic.php?f=33&t=20486 for details.

If you need full functionality for data extraction (i.e. invoice or forms processing), you may consider adding the IRIS Capture Pro product.
See http://www.irislink.com/c2-778-189/IRISCapture-Pro-8-5-for-invoices---Overview.aspx

Feel free to ask for a quote by sending email to info-be(at)alfea-consulting.com, or directly to IRIS.
Regards, Denis

dranakan
Champ on-the-rise
Thank you Dgenard,
It seems to be a good product!

I have also found ABBYY FineReader Engine http://www.abbyy.com/ocr_sdk_linux/, which can be used on Linux. Does anyone use this product?

dranakan
Champ on-the-rise
Hello,

I have found another interesting OCR engine: Snowbound http://www.snowbound.com/rastermaster_java/java_overview.html
It can be used from Java (and on Linux).

Does anyone use this product?

dranakan
Champ on-the-rise
Hello,

Snowbound has no OCR in Java…