02-21-2020 10:41 AM
Hello,
I have installed the addon alfresco-simple-ocr by keensoft and all its dependencies (ocrmypdf, tesseract, imagemagick). When I try to OCR a PDF, it throws this error:
Exception in thread "defaultAsyncAction1" java.lang.RuntimeException: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 01190024 Failed to perform OCR transformation: Execution result:
os: Mac OS X
command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra --output-type pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757.pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757_ocr.pdf
succeeded: false
exit code: 1
out:
err: Traceback (most recent call last):
  File "/usr/local/bin/ocrmypdf", line 5, in <module>
    from ocrmypdf.__main__ import run
  File "/usr/local/Cellar/ocrmypdf/9.5.0/libexec/lib/python3.7/site-packages/ocrmypdf/__init__.py", line 18, in <module>
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:196)
    at es.keensoft.alfresco.ocr.OCRExtractAction.access$400(OCRExtractAction.java:39)
    at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:177)
    at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:174)
    at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeInNewTransaction(OCRExtractAction.java:182)
    at es.keensoft.alfresco.ocr.OCRExtractAction.access$300(OCRExtractAction.java:39)
    at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask$1$1.doWork(OCRExtractAction.java:159)
    at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask$1$1.doWork(OCRExtractAction.java:156)
    at org.alfresco.repo.tenant.TenantUtil.runAsWork(TenantUtil.java:126)
    at org.alfresco.repo.tenant.TenantUtil.runAsTenant(TenantUtil.java:95)
    at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask$1.doWork(OCRExtractAction.java:155)
    at org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(AuthenticationUtil.java:588)
    at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask.run(OCRExtractAction.java:152)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 01190024 Failed to perform OCR transformation: Execution result:
os: Mac OS X
command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra --output-type pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757.pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757_ocr.pdf
succeeded: false
exit code: 1
out:
err: Traceback (most recent call last):
  File "/usr/local/bin/ocrmypdf", line 5, in <module>
    from ocrmypdf.__main__ import run
  File "/usr/local/Cellar/ocrmypdf/9.5.0/libexec/lib/python3.7/site-packages/ocrmypdf/__init__.py", line 18, in <module>
    at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:86)
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:194)
    ... 16 more
Caused by: org.alfresco.service.cmr.repository.ContentIOException: 01190024 Failed to perform OCR transformation: Execution result:
os: Mac OS X
command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra --output-type pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757.pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757_ocr.pdf
succeeded: false
exit code: 1
out:
err: Traceback (most recent call last):
  File "/usr/local/bin/ocrmypdf", line 5, in <module>
    from ocrmypdf.__main__ import run
  File "/usr/local/Cellar/ocrmypdf/9.5.0/libexec/lib/python3.7/site-packages/ocrmypdf/__init__.py", line 18, in <module>
    at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:79)
    ... 17 more
If I run the following command manually in my terminal, it works without any issue:
/usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra --output-type pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757.pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757_ocr.pdf
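When a command succeeds in a terminal but fails when a server process launches it, one thing that may differ is the process environment. The sketch below is only an illustration of how to test that idea: it runs a command under an explicitly chosen environment and captures the exit code and stderr. The helper names `env_with` and `run_with_env` are hypothetical, and nothing here is taken from the addon itself.

```python
import os
import subprocess
import sys


def env_with(overrides):
    """Return a copy of the current environment with the given variables overridden."""
    env = dict(os.environ)
    env.update(overrides)
    return env


def run_with_env(command, env):
    """Run command (a list of arguments) with exactly the given environment.

    Returns a tuple (exit_code, stdout, stderr) so the results of two runs
    under different environments can be compared.
    """
    result = subprocess.run(command, env=env, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr
```

For example, calling `run_with_env(["/usr/local/bin/ocrmypdf", "--version"], env_with({}))` and then repeating the call with a library-path variable overridden would show whether the import error depends on the environment. Which variables Alfresco actually sets for the child process is an assumption to verify against your installation.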
Does this addon work on macOS?
The log above points to this file:
/usr/local/Cellar/ocrmypdf/9.5.0/libexec/lib/python3.7/site-packages/ocrmypdf/__init__.py
This is the file's content:
# © 2017 James R. Barlow: github.com/jbarlow83
#
# This file is part of OCRmyPDF.
#
# OCRmyPDF is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# OCRmyPDF is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with OCRmyPDF. If not, see <http://www.gnu.org/licenses/>.

from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
from ._version import PROGRAM_NAME, __version__
from .api import Verbosity, configure_logging, ocr
from .exceptions import (
    BadArgsError,
    DpiError,
    EncryptedPdfError,
    ExitCode,
    ExitCodeException,
    InputFileError,
    MissingDependencyError,
    OutputFileAccessError,
    PdfMergeFailedError,
    PriorOcrFoundError,
    SubprocessOutputError,
    TesseractConfigError,
    UnsupportedImageFormatError,
)
Thanks
02-24-2020 09:53 AM
I have fixed this issue. I had to set the correct ImageMagick properties in alfresco-global.properties:
### ImageMagick Config ###
img.root=/usr/local/Cellar/imagemagick/7.0.9-23
# ----> I had this property wrong, 'img.dyn' <----
img.dyn=${img.root}/lib
img.exe=${img.root}/bin/convert
img.gslib=/usr/local/Cellar/ghostscript/9.50/lib
#img.coders=${img.root}/modules/coders
#img.config=${img.root}/config
#GS executable
ghostscript.exe=gs
#Tesseract executable
tesseract.exe=tesseract
# OCRmyPDF
# running 'which ocrmypdf' returns '/usr/local/bin/ocrmypdf', I think this value could be used as well
ocr.command=/usr/local/Cellar/ocrmypdf/9.5.0/bin/ocrmypdf
ocr.output.verbose=true
ocr.output.file.prefix.command=
ocr.extra.commands=--verbose 1 --force-ocr -l spa+eng+fra --output-type pdf
ocr.server.os=linux
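Since the fix came down to one wrong path property, a quick check that every path-valued property resolves to something that exists on disk can catch this kind of mistake early. The following is a minimal sketch, not part of Alfresco or the addon: the function names are hypothetical, and it assumes simple `key=value` lines with `${name}` placeholders as used above.

```python
import os
import re


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve(value, props, depth=10):
    """Expand ${name} placeholders using other properties (unknown names are left as-is)."""
    pattern = re.compile(r"\$\{([^}]+)\}")
    for _ in range(depth):
        expanded = pattern.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


def missing_paths(props, keys):
    """Return {key: resolved_path} for the listed keys whose path does not exist."""
    missing = {}
    for key in keys:
        if key in props:
            path = resolve(props[key], props)
            if not os.path.exists(path):
                missing[key] = path
    return missing


if __name__ == "__main__":
    sample = """
img.root=/usr/local/Cellar/imagemagick/7.0.9-23
img.dyn=${img.root}/lib
img.exe=${img.root}/bin/convert
ocr.command=/usr/local/Cellar/ocrmypdf/9.5.0/bin/ocrmypdf
"""
    props = parse_properties(sample)
    keys = ["img.root", "img.dyn", "img.exe", "ocr.command"]
    for key, path in missing_paths(props, keys).items():
        print(f"{key} -> {path} does not exist")
```

In practice you would read the text from your own alfresco-global.properties and list only the keys that are expected to be filesystem paths; values such as `ghostscript.exe=gs` are looked up on the PATH instead and are deliberately left out here.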
02-21-2020 10:44 AM
I'd suggest using the dockerized version, which can be produced with:
https://github.com/Alfresco/alfresco-docker-installer
02-21-2020 11:15 AM
Hello Angel Borroy,
I'm using ACS 5.2.6 Enterprise.