cancel
Showing results for 
Search instead for 
Did you mean: 

simple-ocr integration with alfresco community 5.2

anuradha1
Star Contributor
Star Contributor

Hi,

I am woking with alfresco community 5.2 and now my client need to apply ocr functionality into alfresco. So, i tried to do that using simple ocr with pdfsandwich. This now working fine. But i need acurate the quality using tesseract attributes such as resolution and rgb. 

I changed alfresco-global.properties file like below.

ocr.command=/usr/bin/pdfsandwich
ocr.output.verbose=true
ocr.output.file.prefix.command=-o

ocr.extra.commands=-resolution 600 -rgb -verbose -lang spa+eng+fra

Then i tried to ocr through alfresco then it gives an error like below.

java.lang.RuntimeException: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 09220022 Failed to perform OCR transformation:
Execution result:
   os:         Linux
   command:    /usr/bin/pdfsandwich -resolution 600 -rgb -verbose -lang spa+eng+fra /home/administrator/alfresco-dms/tomcat/temp/Alfresco/OCRTransformWorker_source_8261423176645436134.pdf -o /home/administrator/alfresco-dms/tomcat/temp/Alfresco/OCRTransformWorker_source_8261423176645436134_ocr.pdf
   succeeded:  false
   exit code:  2
   out:        pdfsandwich version 0.1.7
Checking for convert:
convert -version
Version: ImageMagick 7.0.5-2 Q16 x86_64 2017-04-04 http://www.imagemagick.org
Copyright: © 1999-2017 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Featur
   err:        tesseract: /home/administrator/alfresco-dms/common/lib/libtiff.so.5: no version information available (required by /usr/lib/liblept.so.5)
tesseract 3.04.01
 leptonica-1.73
  libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.56 : libtiff 4.
at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:183)
at es.keensoft.alfresco.ocr.OCRExtractAction.executeImpl(OCRExtractAction.java:119)
at org.alfresco.repo.action.executer.ActionExecuterAbstractBase.execute(ActionExecuterAbstractBase.java:273)
at org.alfresco.repo.action.ActionServiceImpl.directActionExecution(ActionServiceImpl.java:856)
at org.alfresco.repo.action.executer.CompositeActionExecuter.executeImpl(CompositeActionExecuter.java:73)
at org.alfresco.repo.action.executer.ActionExecuterAbstractBase.execute(ActionExecuterAbstractBase.java:273)
at org.alfresco.repo.action.ActionServiceImpl.directActionExecution(ActionServiceImpl.java:856)
at org.alfresco.repo.action.ActionServiceImpl.executeActionImpl(ActionServiceImpl.java:757)
at org.alfresco.repo.action.ActionServiceImpl.executeAction(ActionServiceImpl.java:581)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.alfresco.repo.security.permissions.impl.AlwaysProceedMethodInterceptor.invoke(AlwaysProceedMethodInterceptor.java:41)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.alfresco.repo.security.permissions.impl.ExceptionTranslatorMethodInterceptor.invoke(ExceptionTranslatorMethodInterceptor.java:53)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.alfresco.repo.audit.AuditMethodInterceptor.invoke(AuditMethodInterceptor.java:166)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:96)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:260)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:94)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at com.sun.proxy.$Proxy44.executeAction(Unknown Source)
at org.alfresco.repo.rule.RuleServiceImpl.executeAction(RuleServiceImpl.java:1278)
at org.alfresco.repo.rule.RuleServiceImpl.executeRule(RuleServiceImpl.java:1272)
at org.alfresco.repo.rule.RuleServiceImpl.executePendingRuleImpl(RuleServiceImpl.java:1218)
at org.alfresco.repo.rule.RuleServiceImpl.executePendingRule(RuleServiceImpl.java:1161)
at org.alfresco.repo.rule.RuleServiceImpl.executePendingRulesImpl(RuleServiceImpl.java:1127)
at org.alfresco.repo.rule.RuleServiceImpl.executePendingRules(RuleServiceImpl.java:1100)
at org.alfresco.repo.rule.RuleTransactionListener.beforeCommit(RuleTransactionListener.java:64)
at org.alfresco.util.transaction.TransactionSupportUtil$TransactionSynchronizationImpl.doBeforeCommit(TransactionSupportUtil.java:535)
at org.alfresco.util.transaction.TransactionSupportUtil$TransactionSynchronizationImpl.doBeforeCommit(TransactionSupportUtil.java:514)
at org.alfresco.util.transaction.TransactionSupportUtil$TransactionSynchronizationImpl.beforeCommit(TransactionSupportUtil.java:479)
at org.springframework.transaction.support.TransactionSynchronizationUtils.triggerBeforeCommit(TransactionSynchronizationUtils.java:95)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.triggerBeforeCommit(AbstractPlatformTransactionManager.java:925)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:738)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:724)
at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSupport.java:475)
at org.alfresco.util.transaction.SpringAwareUserTransaction.commit(SpringAwareUserTransaction.java:482)
at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:486)
at org.alfresco.repo.web.scripts.RepositoryContainer.transactionedExecute(RepositoryContainer.java:587)
at org.alfresco.repo.web.scripts.RepositoryContainer.transactionedExecuteAs(RepositoryContainer.java:656)
at org.alfresco.repo.web.scripts.RepositoryContainer.executeScriptInternal(RepositoryContainer.java:428)
at org.alfresco.repo.web.scripts.RepositoryContainer.executeScript(RepositoryContainer.java:308)
at org.springframework.extensions.webscripts.AbstractRuntime.executeScript(AbstractRuntime.java:399)
at org.springframework.extensions.webscripts.AbstractRuntime.executeScript(AbstractRuntime.java:210)
at org.springframework.extensions.webscripts.servlet.WebScriptServlet.service(WebScriptServlet.java:132)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.alfresco.module.aosmodule.service.ContextRootFilter.doFilter(ContextRootFilter.java:93)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.alfresco.web.app.servlet.GlobalLocalizationFilter.doFilter(GlobalLocalizationFilter.java:68)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:218)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:506)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:962)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:445)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1115)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:637)
at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2486)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)

But when i try the below failed command in terminal, it works fine & generated quality pdf as expected.

/usr/bin/pdfsandwich -resolution 600 -rgb -verbose -lang spa+eng+fra /home/administrator/alfresco-dms/tomcat/temp/Alfresco/OCRTransformWorker_source_8261423176645436134.pdf -o /home/administrator/alfresco-dms/tomcat/temp/Alfresco/OCRTransformWorker_source_8261423176645436134_ocr.pdf

But if i removed this -rgb attribute from alfresco-global.properties, it works but -resolution also not affected. I am wondering why is this happen. I need to know why these extra commands are not working which i added into alfresco-global.properties file.

If any other way to achieve also accepted. Plz help me

Thanks.

2 REPLIES 2

mitpatoliya
Star Collaborator
Star Collaborator

As per the documentation of PDFSandwich they have highlighted that -rgb needs to be used with caution.

http://www.tobias-elze.de/pdfsandwich/

 -rgb 		 use RGB color space for images (default: black and white);
		  use with care: causes problems with some color spaces

Now as you have already mentioned that it is working fine if you are running from the command line then it must be something wrong with alfresco configuration. I have found a similar issue on alfresco gitHub maybe you find some help there.

https://github.com/keensoft/alfresco-simple-ocr/issues/47

 

@mitpatoliya wrote:

As per the documentation of PDFSandwich they have highlighted that -rgb needs to be used with caution.

http://www.tobias-elze.de/pdfsandwich/

 -rgb 		 use RGB color space for images (default: black and white);
		  use with care: causes problems with some color spaces

Now as you have already mentioned that it is working fine if you are running from the command line then it must be something wrong with alfresco configuration. I have found a similar issue on alfresco gitHub maybe you find some help there.

https://github.com/keensoft/alfresco-simple-ocr/issues/47




Thanks for the reply. I added below properties to my alfresco-global.properties file. 

img.root=/usr
img.dyn=${img.root}/lib
img.exe=${img.root}/bin/convert
img.gslib=/usr/share/ghostscript/9.26/lib

 But same thing happened. OCR is happening trough alfresco but -resolution 600 command is not working. No errors also. Can u plz help me further to solve this problem?