Hyland Connect

boneill · ‎01-31-2014

Hi Guys,

Does anyone have any advice on how to integrate an OCR service into Alfresco? I understand that OCR is normally done by apps like Kofax, but our client would like to be able to upload an image or scanned PDF and let Alfresco handle the OCR step so that the docs can be found during search.

Any advice, suggestions or experience in this would be greatly appreciated.

Regards

Brian

jpotts · ‎01-31-2014

Alfresco doesn't provide OCR capabilities out-of-the-box. You might take a look at http://www.ephesoft.com/ and see if that can be of assistance.

The Add-Ons directory also has a number of OCR solutions: http://addons.alfresco.com/search/node/ocr

If you want to roll up your sleeves and do your own integration without relying on an integration that's already been built, you can find various OCR libraries out there. Here's one: http://code.google.com/p/tesseract-ocr/.
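As a rough illustration of what a do-it-yourself integration might start from, here is a minimal sketch that shells out to the tesseract command-line tool. It assumes tesseract is installed and on the PATH; the function names (`build_command`, `ocr_image`) and the temporary output name are made up for this example and are not part of Alfresco or any add-on.

```python
import subprocess
import tempfile
from pathlib import Path


def build_command(image_path, output_base, lang="eng"):
    """Build the argument list for the tesseract command-line tool.

    The CLI form is: tesseract <image> <outputbase> -l <lang>
    Tesseract appends ".txt" to the output base on its own.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on one image and return the recognized text.

    Assumes the tesseract binary is installed; raises
    subprocess.CalledProcessError if the tool exits with an error.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical output name; any base name would do.
        output_base = Path(tmp) / "ocr_result"
        subprocess.run(
            build_command(image_path, output_base, lang),
            check=True,
            capture_output=True,
        )
        # Read the text file that tesseract wrote next to the output base.
        return output_base.with_suffix(".txt").read_text(encoding="utf-8")
```

Calling `ocr_image("scan.png")` on a scanned page would return the extracted text as a string. Getting that text into Alfresco so that search can see it is a separate step that this sketch does not cover.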

Jeff

Jeff Potts
https://www.metaversant.com | https://ecmarchitect.com

boneill · ‎02-02-2014

Hi Jeff,

Thanks for the response. This is exactly the information I needed.

Brian

djnemo2 · ‎02-05-2014

Hi,

Have you tried any of those solutions?

What is the best (even non-free) solution for scanning documents and saving them in Alfresco?
(I think it must be compatible with Alfresco, so that it can add some metadata/tags to every document added to Alfresco, for search and …)

Thanks

jpotts · ‎02-05-2014

Metadata extraction is available out-of-the-box. But if you are uploading an image of the document, there is no metadata to extract. You need something to convert the image to machine-readable text. That's OCR, and it is not available out-of-the-box.

Jeff

Jeff Potts
https://www.metaversant.com | https://ecmarchitect.com

djnemo2 · ‎02-06-2014

Is there any third-party software that someone has already used for this?
Something that scans the document, saves it in the right directory on the server based on its contents, and reports which document is where?

Thank you

susannamoore · ‎02-18-2014

There are many OCR software packages (http://www.rasteredge.com/dotnet-imaging/addon-ocr-sdk/) that can do the work.

I generally use RE.OCR.SDK.

scouil · ‎02-18-2014

The page you linked is for .net.
Is there a Java integration as well or was this just a spambot?

susannamoore · ‎02-18-2014

Hi scouil,

I just tried this, but I'm not sure whether this site provides a version for Java integration.
It may just be an image processing library for Java.

jpotts · ‎02-19-2014

There was recently some discussion on IRC about this project:
https://code.google.com/p/alfresco-tesseract-search/

Out-of-the-box it was not working with 4.2 but one of our community members did some quick repackaging and got it working on 4.2 in about 30 minutes.

After doing that, he was able to take scanned images, check them in to Alfresco, and then do a full-text search against them. The tesseract OCR piece was responsible for extracting the text from the scanned images and making it available to the indexer.
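To make the idea concrete, here is a toy sketch of why extracted text makes a scanned image findable. It is not how Alfresco's indexer works internally; `ToyIndex` is a hypothetical in-memory stand-in, and the sample strings play the role of text that an OCR step has already produced.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """A tiny inverted index: term -> set of document names."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_name, text):
        """Index the (OCR-extracted) text under the given document name."""
        for term in tokenize(text):
            self._postings[term].add(doc_name)

    def search(self, query):
        """Return the documents that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = ToyIndex()
# These strings stand in for text produced by OCR on two scanned images.
index.add("invoice.png", "Invoice 1042 for Acme Ltd")
index.add("letter.png", "Dear Acme team")
matches = index.search("invoice acme")
```

Without the OCR step there would be nothing to pass to `add`, which is why a scanned image on its own never shows up in a full-text search.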

Jeff

Jeff Potts
https://www.metaversant.com | https://ecmarchitect.com

OCR for images, pdfs etc