Hyland Connect

dranakan · ‎02-17-2009

Hello,

I am evaluating different OCR products to integrate with Alfresco. The goal is to extract some fields from a paper document (an invoice, for example). The OCR should generate a PDF plus a second file containing the extracted values (name=bob, numberInvoice=23423, …). The products I am testing are:
- Kofax
- eCopy
- Iris capture
- Adobe Capture

I am looking for the cheapest option. Do you use any other OCR product in the same context?

For now I am working with Adobe Capture, but I am not able to extract the data into a separate file with the values (name=bob, numberInvoice=23423, …). Can someone explain how to do this?
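To make the target output concrete, here is a minimal sketch of the last step only: turning text that an OCR engine has already recognised into a name=value file. It is not how Adobe Capture or any of the listed products does it. The field labels ("Name:", "Invoice No:") and the patterns are assumptions for illustration and would have to be tuned to the real invoice layout.

```python
import re

# Hypothetical field patterns; real invoices need patterns tuned to their layout.
FIELD_PATTERNS = {
    "name": re.compile(r"Name\s*:\s*(.+)", re.IGNORECASE),
    "numberInvoice": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*:?\s*(\d+)", re.IGNORECASE
    ),
}


def extract_fields(ocr_text):
    """Return a dict of the fields found in the recognised text."""
    fields = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            fields[key] = match.group(1).strip()
    return fields


def to_properties(fields):
    """Serialise the fields as name=value lines, one per field."""
    return "\n".join(f"{key}={value}" for key, value in fields.items())


# Made-up sample of recognised text, not real OCR output.
sample = "Name: bob\nInvoice No: 23423\nTotal: 100.00"
print(to_properties(extract_fields(sample)))
```

The resulting lines can be written next to the PDF and read back when the document is imported.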

jlabuelo · ‎02-28-2009

Hi there

We are investigating how to integrate OCR software with our Alfresco application. We have built a wizard that produces a JPG file, which we would now like to convert to a PDF with OCR and store in our Alfresco repository.

I just saw your post about OCR and Alfresco. Could you please let us know whether you got any of the products you mention to work?

I have also read this article about OCR software that can be installed with Alfresco, but it has to be done on Windows and we are using an Ubuntu Linux server.

http://wiki.alfresco.com/wiki/Tiger_OCR_integration

If you can share your thoughts with us, we would really appreciate it.

Thanks a lot in advance

dranakan · ‎03-04-2009

Hello,

I think we can use OCR in three different ways in Alfresco:

I would like to find a cheap OCR engine (option number two), but so far I have found nothing.

I will keep searching…

zaizi · ‎03-04-2009

Have a look at this. http://code.google.com/p/tesseract-ocr/. I believe this integration has already been done. It can be configured in Alfresco through XML configuration with little or no Java code if you follow these steps http://www.howtoforge.com/ocr_with_tesseract_on_ubuntu704 using Alfresco's http://wiki.alfresco.com/wiki/Content_Transformations#ComplexContentTransformer.
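As a rough illustration of what such a transformation boils down to, here is a sketch that calls the Tesseract command line from Python. It is not the Alfresco XML configuration itself. The function names are hypothetical, and it assumes a `tesseract` binary on the PATH with the usual `tesseract <image> <outputbase> -l <lang>` form; older releases may only accept TIFF input.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run Tesseract and return the recognised text.

    Tesseract appends ".txt" to the output base name itself, so the
    result is read back from "<output_base>.txt".
    """
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)  # raises if Tesseract fails
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `run_ocr("scan.tif", "/tmp/scan")` would leave the text in `/tmp/scan.txt` and return it. Note that this yields plain text only, not a searchable PDF.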

dranakan · ‎03-04-2009

Hello Zaizi,

I came across Tesseract (and OCRopus) during my research, but I have not tried them, because the tests one can find on the web say that the recognition is not good, and also because this OCR cannot generate a PDF (keeping the layout and pictures).
(I would use an OCR engine to process invoices and also to store documents for records management, so I need an OCR that can reproduce the original exactly.)

Have you tried Tesseract? Am I wrong?

Thank you.

dranakan · ‎09-03-2009

Hi all,

We have used ReadIris Pro for the OCR transformation. It gives good results, but running OCR software after scanning is a complicated step for users…

Fully automatic
Now we want to make everything automatic: the user scans, and can then read the PDF in Alfresco. I am looking at Abbyy and Kofax (waiting for the vendors' responses). I would like to install this software on the Alfresco server (Linux).

I am also looking for a scanner that does the OCR transformation itself. Does anybody know of such a product?

What the OCR solution must do
It must be able to remove empty pages, handle duplex scans, and interpret separator pages (when a document has more than one page). Perhaps this could be done by the scanner, but it would be better if a process on the server did this work (then we could use different scanners).
Budget: < 3'500 $ for 5'000 documents/month
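To illustrate the server-side processing described above, here is a minimal sketch that drops empty pages and starts a new document at each separator page. It assumes pages are already available as grayscale pixel grids (lists of rows, 0 = black, 255 = white). The thresholds and the `is_separator` predicate are hypothetical; real products typically recognise separators with bar codes or similar markers.

```python
def ink_ratio(page, dark_threshold=128):
    """Fraction of pixels darker than the threshold in a grayscale page."""
    total = sum(len(row) for row in page)
    if total == 0:
        return 0.0
    dark = sum(1 for row in page for pixel in row if pixel < dark_threshold)
    return dark / total


def is_blank(page, max_ink=0.01):
    """Treat a page as empty when almost no pixel is dark (assumed threshold)."""
    return ink_ratio(page) <= max_ink


def split_documents(pages, is_separator):
    """Drop blank pages and start a new document at every separator page.

    Separator pages themselves are not kept in the output.
    """
    documents, current = [], []
    for page in pages:
        if is_separator(page):
            if current:
                documents.append(current)
            current = []
        elif not is_blank(page):
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

For example, a scan batch of "page, empty back side, separator, page, page" would come out as two documents, one with one page and one with two.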

dgenard · ‎09-04-2009

Hi,
In the IRIS product range, IRISDocument Server does exactly this … and more.

It's a server-side solution for:
- Image pre-processing (Deskewing, Orientation detection, Despeckling, Smoothening)
- Document sorting and separation using bar codes or white pages
- OCR
- Output to many formats (PDF, PDF/A, DOC, OpenDocument, …)
- Hyper-compression of PDF and XPS documents
Just configure your scanner to output scanned images/documents to a shared folder, and IRISDocument does the rest.
Price should be within your budget.
Details on http://www.irislink.com/c2-1600-189/IRISDocument-9---OCR-Server.aspx

ALFEA Consulting provides the connector to automatically send scanned documents with their indexes to Alfresco.
See http://forums.alfresco.com/en/viewtopic.php?f=33&t=20486 for details.

If you need full functionality for data extraction (i.e. invoice or form processing), you may consider adding the IRIS Capture Pro product.
See http://www.irislink.com/c2-778-189/IRISCapture-Pro-8-5-for-invoices---Overview.aspx

Feel free to ask for a quote by sending email to info-be(at)alfea-consulting.com, or directly to IRIS.
Regards, Denis

dranakan · ‎09-10-2009

Thank you Dgenard,
It seems to be a good product!

I have also found Abbyy FineReader Engine http://www.abbyy.com/ocr_sdk_linux/, which can be used on Linux. Does anyone use this product?

dranakan · ‎10-05-2009

Hello,

I have found another interesting OCR engine: Snowbound http://www.snowbound.com/rastermaster_java/java_overview.html
It can be used from Java (and on Linux).

Does anyone use this product?

dranakan · ‎10-07-2009

Hello,

Snowbound has no OCR in Java…

Choice of OCR