
‎06-23-2020 06:04 PM
With the approach suggested by Fedorow in https://hub.alfresco.com/t5/alfresco-content-services-forum/quot-ocr-extract-quot-action-doesn-t-wor..., I was able to make OCR work with Alfresco 6.1.0 and Docker.
I changed /ocr_input and /ocr_output to /usr/local/tomcat/ocr_input and /usr/local/tomcat/ocr_output so that the alfresco container can access these folders without any permission issues.
Thanks Fedorow
Below are the changes made to docker-compose.yml and ocrmypdf.sh.
docker-compose.yml
...
services:
  alfresco:
    ...
    volumes:
      - ocr-input:/usr/local/tomcat/ocr_input
      - ocr-output:/usr/local/tomcat/ocr_output
  ...
  ocrmypdf:
    ...
    volumes:
      - ocr-input:/usr/local/tomcat/ocr_input
      - ocr-output:/usr/local/tomcat/ocr_output
...
volumes:
  ...
  ocr-input:
    external: true
  ocr-output:
    external: true
...
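Because both volumes are declared with external: true, Docker Compose will not create them; they have to exist before the stack is started. A minimal sketch of a helper that creates them when they are missing is shown here. The function name ensure_volume and the overridable DOCKER variable are assumptions of this sketch and are not part of the original setup.

```shell
# Sketch: make sure an external named volume exists before `docker-compose up`.
# DOCKER can be overridden (assumption of this sketch) to dry-run the helper.
DOCKER="${DOCKER:-docker}"

ensure_volume() {
  name="$1"
  # `docker volume inspect` exits non-zero when the volume does not exist
  if "$DOCKER" volume inspect "$name" >/dev/null 2>&1; then
    echo "exists $name"
  else
    "$DOCKER" volume create "$name" >/dev/null
    echo "created $name"
  fi
}

# Usage, with the volume names from the compose file:
#   ensure_volume ocr-input
#   ensure_volume ocr-output
```

Running the two usage lines once before starting the stack should be enough; calling the helper again leaves existing volumes untouched.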
bin/ocrmypdf.sh
#!/bin/bash
INPUT_DIR=/usr/local/tomcat/ocr_input
OUTPUT_DIR=/usr/local/tomcat/ocr_output
# ocrmypdf hostname
OCRMYPDF_SERVER="ocrmypdf"
# identify parameters, input and output file:
# the last two arguments are the input and output paths,
# everything before them is passed through to ocrmypdf
array=( "$@" )
len=${#array[@]}
ARGS="${array[*]:0:len-2}"
INPUT_FILE_PARAM="${array[len-2]}"
OUTPUT_FILE_PARAM="${array[len-1]}"
# extract filenames
INPUT_FILE=$(basename "$INPUT_FILE_PARAM")
OUTPUT_FILE=$(basename "$OUTPUT_FILE_PARAM")
# SSH parameters (cp stands in for scp because both containers mount the same volumes)
SCP=cp
SSH=ssh
USER=root
# copy original pdf to the folder shared with the ocrmypdf server
$SCP "$INPUT_FILE_PARAM" "$INPUT_DIR"
# execute ocrmypdf program
$SSH "$USER@$OCRMYPDF_SERVER" "/usr/bin/ocr.sh $ARGS $INPUT_DIR/$INPUT_FILE $OUTPUT_DIR/$OUTPUT_FILE"
# copy transformed pdf back to alfresco path
$SCP "$OUTPUT_DIR/$OUTPUT_FILE" "$OUTPUT_FILE_PARAM"
# remove temporary files
rm -f "$INPUT_DIR/$INPUT_FILE" "$OUTPUT_DIR/$OUTPUT_FILE"
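The argument handling at the top of the script (OCR options first, then the input path and the output path as the last two arguments) can be tried out in isolation. The following is only a sketch: the function name split_args and the variable EXTRA_ARGS are made up for illustration, and the loop is one way to do the split that also keeps paths containing spaces intact.

```shell
# Sketch: split "<ocr options...> <input> <output>" into three variables.
# split_args and EXTRA_ARGS are hypothetical names, not part of the add-on.
split_args() {
  # at least the input and output paths are required
  [ "$#" -ge 2 ] || return 1
  EXTRA_ARGS=""
  # consume everything except the last two arguments
  while [ "$#" -gt 2 ]; do
    EXTRA_ARGS="${EXTRA_ARGS:+$EXTRA_ARGS }$1"
    shift
  done
  INPUT_FILE_PARAM=$1
  OUTPUT_FILE_PARAM=$2
}

# Usage:
#   split_args --deskew -l eng "/tmp/in file.pdf" "/tmp/out file.pdf"
#   echo "$EXTRA_ARGS"         # options to pass to ocrmypdf
#   echo "$INPUT_FILE_PARAM"   # source path
#   echo "$OUTPUT_FILE_PARAM"  # target path
```

The function returns a non-zero status when fewer than two arguments are given, so a caller can stop before copying anything.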
After the above changes I was able to run OCR successfully with Alfresco 6.1.
We run our Alfresco instance on Kubernetes with a Helm deployment, so I need to configure these volumes in the values.yaml file, but I am not sure how to do that. Does anyone have an idea how to make a similar configuration in Kubernetes?
Any help is appreciated.
- Sriram

‎07-14-2020 10:34 AM
Hello, I'm having trouble configuring ocrmypdf in Alfresco. I am using the "alfresco-content-repository-community: 6.2.0-ga" version. After following the setup instructions, the option to configure the OCR action is not displayed in Alfresco. Could you help me? Here is the link to the project I'm running:
https://github.com/guilhermekelling/ocr.git
Thank you,
Guilherme Kelling

‎07-15-2020 10:46 AM
OCR Extract is an action that is assigned to a folder as a rule. Could you please check whether the "OCR Extract" action is available under the actions of a folder rule?
If you do not see the action in the folder rule, then "simple-ocr-repo-2.3.1.jar" is probably not installed properly in the Alfresco repository; also check the logs for any exceptions related to it.
- Sriram
‎07-16-2020 07:56 AM
Good morning Sriram,
You were right, the file "simple-ocr-repo-2.3.1.jar" was not in the right location. After adjusting the configuration, the option was displayed in Alfresco.
Thank you for your help.
‎07-16-2020 09:33 AM
Hi @guilhermekellin,
Great that @SriramG was able to help you resolve your problem - thanks for reporting back. I've marked this as solved.
Kind regards,
3 weeks ago
Dear Sir,
I am currently working on integrating OCR functionality into Alfresco 6.2, running on a Windows Server. I have successfully installed the following dependencies:
Tesseract
Ghostscript
OCRmyPDF
I have placed the required JAR files:
simple-ocr-repo-2.3.1.jar
simple-ocr-share-2.3.1.jar
into the appropriate platform and share directories of the Alfresco installation.
The following properties have been added to the alfresco-global.properties file:
ocr.command=C:/Users/admin/AppData/Roaming/Python/Python313/Scripts/ocrmypdf.exe
ocr.output.verbose=true
ocr.output.file.prefix.command=
ocr.extra.commands=--verbose 1 --force-ocr --deskew -l eng+spa+fra
ocr.server.os=windows
However, when I attempt to use the OCR feature from the document details section, I encounter the following error:
Exception in thread "defaultAsyncAction1" java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Invalid uri '${ocr.url}language=--verbose 1 --force-ocr --deskew -l eng+spa+fra&source=H%3A%5CDMS62%5Ctomcat%5Ctemp%5CAlfresco%5COCRTransformWorker_source_8194440309054693312.pdf&target=H%3A%5CDMS62%5Ctomcat%5Ctemp%5CAlfresco%5COCRTransformWorker_source_8194440309054693312_ocr.pdf': incorrect path
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:183)
    ...
Caused by: java.lang.IllegalArgumentException: Invalid uri '${ocr.url}language=--verbose 1 --force-ocr --deskew -l eng+spa+fra&source=H%3A%5CDMS62%5Ctomcat%5Ctemp%5CAlfresco%5COCRTransformWorker_source_8194440309054693312.pdf&target=H%3A%5CDMS62%5Ctomcat%5Ctemp%5CAlfresco%5COCRTransformWorker_source_8194440309054693312_ocr.pdf': incorrect path
I would appreciate your assistance in identifying the cause and guiding me toward a resolution. Please let me know if you require any further logs, configuration files, or additional details.
Thank you in advance for your support.
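The error message contains the literal text ${ocr.url}, which suggests that a property named ocr.url was never substituted, possibly because it is not defined in alfresco-global.properties. That reading is an assumption; whether the add-on needs ocr.url in this setup, and which value it expects, should be checked against the add-on's own documentation. A small sketch for listing which keys are absent from a properties file (the function name missing_props is made up for illustration; on Windows it would need Git Bash, WSL or similar):

```shell
# Sketch: print every requested key that has no "key=" or "key:" line
# in the given properties file. Commented-out lines do not count.
missing_props() {
  file="$1"
  shift
  for key in "$@"; do
    # escape the dots so they match literally in the regular expression
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if ! grep -q "^[[:space:]]*${pattern}[[:space:]]*[=:]" "$file"; then
      echo "$key"
    fi
  done
}

# Usage (the path is a placeholder for the real file location):
#   missing_props /path/to/alfresco-global.properties ocr.command ocr.url ocr.server.os
```

Any key printed by the helper is one that the repository cannot resolve from that file.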
