OCR in Alfresco 7.2

anoop
Champ in-the-making

Hi all,

We installed version 7.2 (not Docker) and it runs very well, but we can't integrate the OCR action into it. We followed the guide at https://github.com/keensoft/alfresco-simple-ocr with no luck. Is there a way? Kindly guide.

Thanks in anticipation

regards

ANOOP

10 REPLIES

fedorow
Elite Collaborator

That module uses the old transformer.

Try this project https://github.com/aborroy/alf-tengine-ocr

anoop
Champ in-the-making

Hi,

I am not quite able to follow the instructions. Do they deal with Docker? Can you elaborate a little more? We don't want it in Docker. By the way, we succeeded in running OCR in 6.

Thanks and regards

anoop

atultalhar
Champ in-the-making

Hi @anoop, I am facing the same problem. Did you find a solution for this one? If yes, could you please provide the steps to configure this action in the dropdown list?

Hi,

I know it is late, but would like to know if yours is working.

If not, please refer to https://connect.hyland.com/t5/alfresco-forum/ocr-implementation-in-alfresco-23-x/m-p/484274/thread-i...

and the post Mr. Fedorow posted just below.

Regards

fedorow
Elite Collaborator

@anoop in the rule dropdown list, look for something like 'embed-metadata' or an entry containing the word 'embedded'. Sorry, I do not remember exactly.

You can create a working example of OCR with https://github.com/Alfresco/alfresco-docker-installer. Look inside and implement it without Docker, if you like.

anoop
Champ in-the-making

Hi, 

Since we have a working 6.2 install, we were kind of disappointed in 7.1. We have now installed 23.1 and tried to follow your directions. We managed to integrate the .jar file into Alfresco, which allows us to create the rule, but apart from that nothing happens when we upload a .pdf. Can you be a little more specific? We don't have much experience with Docker either.

Any help is very much appreciated.

Regards.

fedorow
Elite Collaborator

The OCR for version 7.x works fine for 23.1.

I can give you just a baseline for where to go. First, build the ats-transformer-ocr-1.0.0.jar file from this repository:

https://github.com/aborroy/alf-tengine-ocr/blob/master/ats-transformer-ocr/README.md

Next, prepare your host. The Docker declarative approach uses a Dockerfile to prepare the working environment and install the tools and the application. Here is the Dockerfile for the OCR container:

https://github.com/aborroy/alf-tengine-ocr/blob/master/ats-transformer-ocr/Dockerfile

It is a list of instructions for Ubuntu. Read it and make the proper configurations and installations on your host. Of course, you can't just run the commands from the Dockerfile as they are. Make sure each one is necessary for your setup.

The logic of the OCR process is as follows:

  • you run the Java application ats-transformer-ocr-1.0.0.jar, which listens on port 8090
  • the Alfresco OCR module calls localTransform.ocr.url=http://localhost:8090/ (add this property to the Alfresco repository)
  • ats-transformer-ocr-1.0.0.jar gets the file from the module and runs tesseract
  • the OCR-ed file returns to Alfresco as a new version of the file.
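
If the rule can be created but nothing happens on upload, one thing worth checking is whether anything is actually accepting connections on the port from the first step. Below is a minimal sketch for that check, assuming Python 3 is available on the host. The helper name port_is_open is hypothetical, and the host and port are simply taken from the localTransform.ocr.url value above. It only confirms that a TCP connection succeeds, not that the transformer itself behaves correctly.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is listening on the target address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: the transformer runs on the same machine as the repository
    # and uses port 8090, as in localTransform.ocr.url=http://localhost:8090/
    host, port = "localhost", 8090
    if port_is_open(host, port):
        print(f"Something is listening on {host}:{port}")
    else:
        print(f"Nothing is listening on {host}:{port}; is the transformer jar running?")
```

If the check fails, the transformer jar is probably not running or is bound to a different port than the one in the repository property.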

Good luck,

Serge

Hi,

I know it is too late, yet I would like to let you know that your tip helped me implement the feature. We had abandoned it at that time but restarted working on it quite recently, and we got it working today. We express our sincere gratitude to you. Stay well.

Thanks and regards

mitpatoliya
Star Collaborator

It's probably because you have not set up the supporting OCR software. That module just sets up connectivity between the repo and the OCR tool. You have to ensure that the OCR software is set up properly and that the related module configurations are correct.
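
A quick way to see whether the supporting OCR software is visible to the account that runs the transformer is to look it up on the PATH. This is only a sketch under assumptions: the helper name missing_tools is hypothetical, and the tool list contains just tesseract, the one tool named in this thread. The Dockerfile linked earlier may install others, which you would add to the list yourself.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the subset of command names that cannot be found on PATH."""
    # shutil.which returns None when no executable with that name is found.
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumption: tesseract is the only external tool checked here;
    # extend the list after reading the Dockerfile for your setup.
    required = ["tesseract"]
    missing = missing_tools(required)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH")
```

Run it as the same user that starts the transformer jar, since PATH can differ between accounts.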