Integrating Alfresco with OCR

andrycolt2181
Champ in-the-making
Hello all,
Could somebody help me solve the following issue?
I have installed Alfresco Community Edition 3.4.d and want to integrate it with an OCR system, either Tesseract or Cuneiform.
Alfresco is installed on Ubuntu 10.04 Server.

I created the file ocrtiff-transform-context.xml in /opt/alfresco-3.4.d/tomcat/shared/classes/alfresco/extension:


<?xml version='1.0' encoding='UTF-8'?>
        <!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

        <beans>
            <bean id="transformer.worker.ocr.tiff" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">

                <property name="mimetypeService">
                    <ref bean="mimetypeService" />
                </property>

                  <property name="checkCommand">
                     <bean class="org.alfresco.util.exec.RuntimeExec">
                        <property name="commandsAndArguments">
                            <map>
                                <entry key=".*">
                                    <list>
        <!--                            <value>tesseract</value> -->
                                        <value>/opt/alfresco-3.4.d/ocr</value>
                                    </list>
                                </entry>
                            </map>
                        </property>
                        <property name="errorCodes">
                           <value>2</value>
                        </property>
                     </bean>
                  </property>

                  <property name="transformCommand">
                     <bean class="org.alfresco.util.exec.RuntimeExec">
                        <property name="commandsAndArguments">
                            <map>
                                <entry key=".*">
                                    <list>
        <!--                            <value>tesseract</value>
                                        <value>${source}</value>
                                        <value>${target}</value>
                                        <value>-l</value>
                                        <value>eng</value> -->
                                        <value>/opt/alfresco-3.4.d/ocr</value>
                                        <value>${source}</value>
                                        <value>${target}</value>
                                    </list>
                                </entry>
                            </map>
                        </property>
                        <property name="errorCodes">
                           <value>1,2</value>
                        </property>
                     </bean>
                  </property>

                  <property name="explicitTransformations">
                     <list>
                        <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
                            <property name="sourceMimetype"><value>image/tiff</value></property>
                            <property name="targetMimetype"><value>text/plain</value></property>
                        </bean>
                     </list>
                  </property>
            </bean>

            <bean id="transformer.ocr.tiff" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
                <property name="worker">
                    <ref bean="transformer.worker.ocr.tiff" />
                </property>
            </bean>
        </beans>
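
Both `checkCommand` and `transformCommand` in the configuration above point at `/opt/alfresco-3.4.d/ocr`, so one thing worth ruling out is that the wrapper script is missing or not executable for the account that runs Tomcat. The following is only a minimal sketch of such a check; the function name `check_wrapper` is hypothetical and not part of Alfresco.

```shell
#!/bin/bash
# Hypothetical helper (not part of Alfresco): report whether a wrapper
# script exists and is executable by the user running this check.
check_wrapper() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "missing"
    elif [ ! -x "$script" ]; then
        echo "not executable"
    else
        echo "ok"
    fi
}

# Path taken from the bean definition above.
check_wrapper /opt/alfresco-3.4.d/ocr
```

For the result to be meaningful, the check would need to be run as the same account that owns the Tomcat process, since file permissions are evaluated per user.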


and the script:

#!/bin/bash
# Save the arguments passed by Alfresco to variables
SOURCE="$1"
TARGET="$2"
TMPDIR=/tmp
FILENAME=$(basename "$SOURCE")
OCRFILE="$FILENAME.tif"

# Log each invocation to see what happens
echo "$(date) from $SOURCE to $TARGET" >> /tmp/ocrtransform.log

# Copy the source to a file with a .tif extension for tesseract
cp -f "$SOURCE" "$TMPDIR/$OCRFILE"

# Call tesseract; it appends .txt to the output base name,
# so the extension is stripped from $TARGET first
tesseract "$TMPDIR/$OCRFILE" "${TARGET%.*}" -l eng 2>> /tmp/ocrtransform.err
rm -f "$TMPDIR/$OCRFILE"
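
The script strips the last extension from `$TARGET` before handing it to tesseract, because tesseract takes an output base name and appends `.txt` itself. The sketch below shows what that expansion produces; the target path is made up for illustration, since the real one is generated by Alfresco at run time.

```shell
#!/bin/bash
# Hypothetical target path, for illustration only.
TARGET=/tmp/example_target.txt

# Remove the shortest trailing ".something" to get the output base name.
OUTPUT_BASE="${TARGET%.*}"

echo "output base: $OUTPUT_BASE"
echo "tesseract would write: $OUTPUT_BASE.txt"
```

Note that this only lines up with `$TARGET` when the target path ends in `.txt`; with any other extension, the file tesseract writes would not be the one Alfresco expects.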


The error that appears is:


Jan 14, 2012 4:37:01 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive wcmqs.war
Jan 14, 2012 4:37:01 AM org.apache.catalina.loader.WebappClassLoader findResourceInternal
INFO: Illegal access: this web application instance has been stopped already.  Could not load ehcache-version.properties.  The eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access, and has no functional impact.
Exception in thread "net.sf.ehcache.CacheManager@6f2a75" java.lang.NullPointerException
        at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:204)
        at net.sf.ehcache.util.UpdateChecker.checkForUpdate(UpdateChecker.java:62)
        at net.sf.ehcache.util.UpdateChecker.run(UpdateChecker.java:50)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
WARN : org.apache.myfaces.shared_impl.util.LocaleUtils - Locale name in faces-config.xml null or empty, setting locale to default locale : en_US
WARN : org.springframework.beans.GenericTypeAwarePropertyDescriptor - Invalid JavaBean property 'baseUrl' being accessed! Ambiguous write methods found next to actually used [public void org.alfresco.wcm.client.impl.WebScriptCallerImpl.setBaseUrl(java.lang.String) throws java.net.URISyntaxException]: [public void org.alfresco.wcm.client.impl.WebScriptCallerImpl.setBaseUrl(java.net.URI)]
Jan 14, 2012 4:37:07 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive share.war
04:37:14,262  INFO  [extensions.webscripts.DeclarativeRegistry] Registered 309 Web Scripts (+0 failed), 319 URLs
04:37:14,263  INFO  [extensions.webscripts.DeclarativeRegistry] Registered 8 Package Description Documents (+0 failed)
04:37:14,263  INFO  [extensions.webscripts.DeclarativeRegistry] Registered 0 Schema Description Documents (+0 failed)
04:37:14,400  INFO  [extensions.webscripts.AbstractRuntimeContainer] Initialised Spring Surf Container Web Script Container (in 2571.7957ms)
04:37:14,443  INFO  [extensions.webscripts.TemplateProcessorRegistry] Registered template processor freemarker for extension ftl
04:37:14,531  INFO  [extensions.webscripts.ScriptProcessorRegistry] Registered script processor javascript for extension js
04:37:14,777  INFO  [extensions.webscripts.TemplateProcessorRegistry] Registered template processor freemarker for extension ftl
04:37:14,787  INFO  [extensions.webscripts.ScriptProcessorRegistry] Registered script processor javascript for extension js
04:37:15,098  INFO  [extensions.webscripts.TemplateProcessorRegistry] Registered template processor freemarker for extension ftl
04:37:15,103  INFO  [extensions.webscripts.ScriptProcessorRegistry] Registered script processor javascript for extension js
Jan 14, 2012 4:37:15 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive awe.war
Jan 14, 2012 4:37:19 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory ROOT
Jan 14, 2012 4:37:19 AM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
Jan 14, 2012 4:37:20 AM org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
Jan 14, 2012 4:37:20 AM org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/71  config=null
Jan 14, 2012 4:37:20 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 26222 ms
WARN : org.alfresco.wcm.client.util.impl.GuestSessionFactoryImpl - WQS unable to connect to repository: Not Found
WARN : org.alfresco.wcm.client.util.impl.GuestSessionFactoryImpl - WQS unable to connect to repository: Not Found
WARN : org.alfresco.wcm.client.util.impl.GuestSessionFactoryImpl - WQS unable to connect to repository: Not Found
WARN : org.alfresco.wcm.client.util.impl.GuestSessionFactoryImpl - WQS unable to connect to repository: Not Found
WARN : org.alfresco.wcm.client.util.impl.GuestSessionFactoryImpl - WQS unable to connect to repository: Not Found
WARN : org.alfresco.wcm.client.util.impl.GuestSessionFactoryImpl - WQS unable to connect to repository: Not Found
WARN : org.alfresco.wcm.client.util.impl.GuestSessionFactoryImpl - WQS unable to connect to repository: Not Found
WARN : org.alfresco.wcm.client.util.impl.GuestSessionFactoryImpl - WQS unable to connect to repository: Not Found



Could somebody help me solve this issue?
Thank you in advance.
1 REPLY

wmay
Champ in-the-making
Hi,

We have implemented an OCR server integrated with Alfresco, which can be used as a transformer or via JavaScript and Java. It runs on a separate OCR server and supports Abbyy and Google OCR. For more information, see https://forums.alfresco.com/en/viewtopic.php?f=33&t=44739