OCRmyPDF (alfresco-simple-ocr) integration with Alfresco 6.1 using Kubernetes & Helm charts

SriramG
Champ on-the-rise

With the approach suggested by Fedorow in https://hub.alfresco.com/t5/alfresco-content-services-forum/quot-ocr-extract-quot-action-doesn-t-wor..., I was able to make OCR work with Alfresco 6.1.0 and Docker.

I changed /ocr_input and /ocr_output to /usr/local/tomcat/ocr_input and /usr/local/tomcat/ocr_output so that the alfresco container can access these folders without any permission issues.

Thanks Fedorow

Below are the changes I made to docker-compose.yml and ocrmypdf.sh.

docker-compose.yml

...
services:
  alfresco:
    ...
    volumes:
      - ocr-input:/usr/local/tomcat/ocr_input
      - ocr-output:/usr/local/tomcat/ocr_output
    ...

  ocrmypdf:
    ...
    volumes:
      - ocr-input:/usr/local/tomcat/ocr_input
      - ocr-output:/usr/local/tomcat/ocr_output
  ...
volumes:
  ...
  ocr-input:
    external: true
  ocr-output:
    external: true
...

bin/ocrmypdf.sh

#!/bin/bash

INPUT_DIR=/usr/local/tomcat/ocr_input
OUTPUT_DIR=/usr/local/tomcat/ocr_output

# ocrmypdf hostname
OCRMYPDF_SERVER="ocrmypdf"

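# identify parameters, input and output file:
# the last two arguments are the input and output paths, everything before them is an option
array=( "$@" )
len=${#array[@]}
ARGS="${array[*]:0:len-2}"

# take the paths straight from the array so that file names containing spaces survive
INPUT_FILE_PARAM="${array[len-2]}"
OUTPUT_FILE_PARAM="${array[len-1]}"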

# extract filenames
INPUT_FILE=$(basename "$INPUT_FILE_PARAM")
OUTPUT_FILE=$(basename "$OUTPUT_FILE_PARAM")

# SSH parameters (cp is used instead of scp because both containers mount the same volumes)
SCP=cp
SSH=ssh
USER=root

# copy original pdf to the shared input folder
$SCP "$INPUT_FILE_PARAM" "$INPUT_DIR"

# execute ocrmypdf program
$SSH "$USER@$OCRMYPDF_SERVER" "/usr/bin/ocr.sh $ARGS $INPUT_DIR/$INPUT_FILE $OUTPUT_DIR/$OUTPUT_FILE"

# copy transformed pdf back to alfresco path
$SCP "$OUTPUT_DIR/$OUTPUT_FILE" "$OUTPUT_FILE_PARAM"

# remove temporary files
rm -f "$INPUT_DIR/$INPUT_FILE" "$OUTPUT_DIR/$OUTPUT_FILE"
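
For anyone who wants to try the argument handling of ocrmypdf.sh on its own, here is a minimal sketch. It is not part of alfresco-simple-ocr: the function name split_ocr_args, the OCR_* variable names and the sample arguments are made up for illustration. It treats the last two arguments as the input and output paths and everything before them as options.

```shell
#!/bin/bash

# Hypothetical helper (not part of alfresco-simple-ocr): splits an argument
# list into options, input path and output path, as ocrmypdf.sh needs to.
split_ocr_args() {
    local args=( "$@" )
    local len=${#args[@]}

    # The wrapper needs at least an input and an output path.
    if (( len < 2 )); then
        echo "usage: split_ocr_args [options...] INPUT OUTPUT" >&2
        return 1
    fi

    # Everything except the last two arguments is passed on as options.
    OCR_OPTIONS="${args[*]:0:len-2}"
    # The last two arguments are the input and output files.
    OCR_INPUT="${args[len-2]}"
    OCR_OUTPUT="${args[len-1]}"
}

# Sample call; the options and paths are illustrative only.
split_ocr_args --skip-text -l eng "/tmp/in dir/source.pdf" /tmp/out/target.pdf
echo "options: $OCR_OPTIONS"
echo "input:   $(basename "$OCR_INPUT")"
echo "output:  $(basename "$OCR_OUTPUT")"
```

Indexing the array directly, rather than cutting a joined string on spaces, keeps a path such as "/tmp/in dir/source.pdf" in one piece.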

After the above changes I was able to successfully run OCR with Alfresco 6.1. 

As we are running our Alfresco instance on Kubernetes and deploying it with Helm, I need to configure these volumes in the values.yaml file, but I am not sure how to do that. Does anyone have an idea how to make a similar configuration in Kubernetes?

Any help appreciated.

- Sriram

4 REPLIES

guilhermekellin
Champ in-the-making

Hello, I'm having trouble configuring ocrmypdf in Alfresco. I am using the "alfresco-content-repository-community:6.2.0-ga" version. After following the setup instructions, the option to configure the OCR action is not displayed in Alfresco. Could you help me? The project I'm running is at the link below.

https://github.com/guilhermekelling/ocr.git

Thank you,

Guilherme Kelling

OCR Extract is an action that is assigned to a folder as a rule. Could you please check whether the "OCR Extract" action is available under the actions of a folder rule?

If you do not see the action in the folder rule, then "simple-ocr-repo-2.3.1.jar" is probably not properly installed in the Alfresco repository. Also look out for any exceptions related to it.

- Sriram
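
One way to check whether the JAR is where the repository can load it is to look for it in the library directory. This is a minimal sketch, not an official procedure: the helper name has_ocr_jar is made up, and the default directory below is an assumption about the image layout, so adjust it to your deployment.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if a simple-ocr-repo JAR exists in the given directory.
has_ocr_jar() {
    local lib_dir="$1"
    # compgen -G returns success only if the glob pattern matches at least one file.
    compgen -G "$lib_dir/simple-ocr-repo-*.jar" > /dev/null
}

# Assumed location of repository JARs inside the Alfresco container; override if yours differs.
LIB_DIR="${LIB_DIR:-/usr/local/tomcat/webapps/alfresco/WEB-INF/lib}"

if has_ocr_jar "$LIB_DIR"; then
    echo "simple-ocr-repo JAR found in $LIB_DIR"
else
    echo "simple-ocr-repo JAR not found in $LIB_DIR"
fi
```

The check has to run inside the alfresco container (or against the directory the image was built from), since that is where the repository looks for its JARs.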

Good morning Sriram,

You were right, the file "simple-ocr-repo-2.3.1.jar" was not in the right location. After adjusting the configuration, the option was displayed in Alfresco.

Thank you for your help.

Hi @guilhermekellin,

Great that @SriramG was able to help you resolve your problem - thanks for reporting back. I've marked this as solved.

Kind regards,  

Digital Community Manager, Alfresco Software.