06-23-2020 06:04 PM
With the approach suggested by Fedorow in https://hub.alfresco.com/t5/alfresco-content-services-forum/quot-ocr-extract-quot-action-doesn-t-wor..., I was able to make OCR work with Alfresco 6.1.0 and Docker.
I updated /ocr_input and /ocr_output to use /usr/local/tomcat/ocr_input and /usr/local/tomcat/ocr_output so that the alfresco container can access these folders without any access issues.
Thanks Fedorow
Below are the changes made to docker-compose.yml and ocrmypdf.sh:
docker-compose.yml
...
services:
  alfresco:
    ...
    volumes:
      - ocr-input:/usr/local/tomcat/ocr_input
      - ocr-output:/usr/local/tomcat/ocr_output
    ...
  ocrmypdf:
    ...
    volumes:
      - ocr-input:/usr/local/tomcat/ocr_input
      - ocr-output:/usr/local/tomcat/ocr_output
    ...
volumes:
  ...
  ocr-input:
    external: true
  ocr-output:
    external: true
...
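Because both volumes are declared with external: true, Compose does not create them itself; they have to exist before the stack is started. A minimal sketch of creating them, assuming the volume names from the compose file above (ensure_volumes is a hypothetical helper name, not something from the original setup):

```shell
#!/bin/bash
# Sketch only: ensure_volumes is a hypothetical helper.
# The volume names are the ones used in the compose file above.
ensure_volumes() {
  local v
  for v in ocr-input ocr-output; do
    # `docker volume inspect` exits non-zero when the volume does not exist.
    if ! docker volume inspect "$v" >/dev/null 2>&1; then
      docker volume create "$v"
    fi
  done
}
```

Calling ensure_volumes once before starting the stack should be enough; existing volumes are left untouched.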
bin/ocrmypdf.sh
#!/bin/bash
# folders shared with the ocrmypdf container (see docker-compose.yml)
INPUT_DIR=/usr/local/tomcat/ocr_input
OUTPUT_DIR=/usr/local/tomcat/ocr_output
# ocrmypdf hostname
OCRMYPDF_SERVER="ocrmypdf"
# identify parameters, input and output file (the last two arguments)
array=( "$@" )
len=${#array[@]}
ARGS="${array[*]:0:len-2}"
LAST_ARGS="${*: -2}"
INPUT_FILE_PARAM=$(echo "$LAST_ARGS" | cut -d ' ' -f 1)
OUTPUT_FILE_PARAM=$(echo "$LAST_ARGS" | cut -d ' ' -f 2)
# extract filenames
INPUT_FILE=$(basename "$INPUT_FILE_PARAM")
OUTPUT_FILE=$(basename "$OUTPUT_FILE_PARAM")
# copy and SSH parameters (the copy is a local cp into the shared folders)
SCP=cp
SSH=ssh
USER=root
# copy original pdf to the shared input folder
$SCP "$INPUT_FILE_PARAM" "$INPUT_DIR"
# execute ocrmypdf program
$SSH "$USER@$OCRMYPDF_SERVER" "/usr/bin/ocr.sh $ARGS $INPUT_DIR/$INPUT_FILE $OUTPUT_DIR/$OUTPUT_FILE"
# copy transformed pdf back to alfresco path
$SCP "$OUTPUT_DIR/$OUTPUT_FILE" "$OUTPUT_FILE_PARAM"
# remove temporary files
rm -f "$INPUT_DIR/$INPUT_FILE" "$OUTPUT_DIR/$OUTPUT_FILE"
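One caveat with the script above: it finds the input and output files by cutting a space-separated string, so a path containing a space would be split in the wrong place. A possible alternative, sketched here under the assumption that the last two arguments are always the input and output files, is to keep everything in bash arrays (split_args and OPTS are hypothetical names introduced for illustration):

```shell
#!/bin/bash
# Sketch only: split_args is a hypothetical helper, not part of the original script.
# It separates "options... input output" without going through a space-delimited string.
split_args() {
  if (( $# < 2 )); then
    echo "usage: split_args [options...] input.pdf output.pdf" >&2
    return 1
  fi
  local all=( "$@" )
  local n=${#all[@]}
  # Everything except the last two arguments is kept as options.
  OPTS=( "${all[@]:0:n-2}" )
  # The last two arguments are the input and output files.
  INPUT_FILE_PARAM=${all[n-2]}
  OUTPUT_FILE_PARAM=${all[n-1]}
}
```

After split_args "$@", the local copy steps can use "$INPUT_FILE_PARAM" and "$OUTPUT_FILE_PARAM" as they are. The command string passed to ssh is parsed again by the remote shell, so paths with spaces would still need extra quoting on that line.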
After the above changes I was able to successfully run OCR with Alfresco 6.1.
As we are running our Alfresco instance on Kubernetes with a Helm deployment, I need to configure the volumes in the values.yaml file, but I am not sure how to do that. Does anyone have an idea how we can make a similar configuration in Kubernetes?
Any help is appreciated.
- Sriram
07-14-2020 10:34 AM
Hello, I'm having trouble configuring ocrmypdf in Alfresco. I am using the "alfresco-content-repository-community: 6.2.0-ga" version. After following the setup instructions, the option to configure the OCR action is not displayed in Alfresco. Could you help me? Below is the link to the project I'm running.
https://github.com/guilhermekelling/ocr.git
Thank you,
Guilherme Kelling
07-15-2020 10:46 AM
OCR Extract is an action which we assign to a folder as a rule. Could you please check if the "OCR Extract" action is available under the actions in the folder rule?
If you are not seeing the action in the folder rule, then "simple-ocr-repo-2.3.1.jar" is probably not properly installed in the Alfresco repository; also look out for any exceptions related to it.
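A quick way to check whether the jar is in place is to look for it in the repository webapp's lib folder from inside the alfresco container. This is only a sketch: jar_installed is a hypothetical helper, and the default LIB_DIR is an assumption about the image layout that may differ in your deployment.

```shell
#!/bin/bash
# Sketch only: jar_installed is a hypothetical helper.
# The default LIB_DIR is an assumption about the image layout; adjust it to your deployment.
LIB_DIR=${LIB_DIR:-/usr/local/tomcat/webapps/alfresco/WEB-INF/lib}

jar_installed() {
  local dir=$1 pattern=$2
  # compgen -G succeeds only if the glob matches at least one existing file.
  compgen -G "$dir/$pattern" >/dev/null
}
```

For example, jar_installed "$LIB_DIR" 'simple-ocr-repo-*.jar' && echo found || echo missing reports whether any version of the jar is present.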
- Sriram
07-16-2020 07:56 AM
Good morning Sriram,
You were right, the file "simple-ocr-repo-2.3.1.jar" was not in the right location. After adjusting the configuration, the option was displayed in Alfresco.
Thank you for your help.
07-16-2020 09:33 AM
Hi @guilhermekellin,
Great that @SriramG was able to help you resolve your problem - thanks for reporting back. I've marked this as solved.
Kind regards,