
eCopy and Kofax?

why1525
Champ in-the-making
Dear all,

I am researching a scanning and OCR solution for a company that currently uses Alfresco.

I have one question: can any scanning and OCR software be integrated with Alfresco easily, or are eCopy and Kofax the only solutions able to do this?

I need an answer urgently because the deadline is near.

Thank you.
25 REPLIES 25

krisapong
Champ in-the-making
Hello muhammad,

#1 For full-page OCR you could use ABBYY, Intelliant OCR, or another engine to produce a searchable PDF (PDF/A). That allows Alfresco to do full-text search on the content.

#2 Create the aspect in XML. The relevant files are:

AspectPath = C:\Alfresco\tomcat\shared\classes\alfresco\extension\
FileWebConfig = web-client-config-custom.xml
FileCustomModelContext = custom-model-context.xml
AspectFile = <aspectname>Model.xml

Please see http://wiki.alfresco.com/wiki/Displaying_Custom_Metadata

#3 Create a rule that uses the aspect from step 2.

#4 Use the Alfresco Web Service API to send the text to each aspect.
http://wiki.alfresco.com/wiki/Alfresco_Content_Management_Web_Services

#5 Configure Alfresco search to search through these aspects (using the XML configuration).

I think you will need a lot of time to study and do all of this, which is why a plug-in module will help you. In my opinion, for enterprise purposes, nothing is free.


Thank you

dranakan
Champ on-the-rise
Hello,

I am looking for software that can convert TIFF to PDF (text) and TIFF to TXT from the command line on both Linux and Windows.
I can do that with Intelliant, but it doesn't work on Linux (I tried with WineHQ).

Does anybody know of a command-line OCR tool that works on Linux (running under Wine is fine) and Windows, converts TIFF->PDF and TIFF->TXT, and costs between $50 and $200?

Thanks.

jlabuelo
Champ on-the-rise
Hello Dranakan

I read your post about running OCR on a TIFF document to convert it to PDF and saw that you got it working with Intelliant. I am trying to install Intelliant on my Alfresco system (Alfresco 3.0).

Could you please explain how you got it to work? I followed the instructions in the user guide on the Alfresco Wiki and the Intelliant tutorial, but I can't get it to work.

I downloaded the Intelliant-Alfresco bundle and installed it on Windows XP to check that it works, but I get two errors:
a) I configured Alfresco as described in the user guide, but when I start the Alfresco 3 Tomcat server and go to the login page, I get an HTTP 404 "Resource not available" error.

b) When I run the Intelliant OCR software from the command line with "cmd.exe /k ocr.exe", I get this error window: "Application does not find MSVCP71.dll. Please try to reinstall again". I reinstalled the software, but it still does not work.

How can I install Intelliant so that I can run a couple of tests before purchasing the license?

Thanks a lot

dranakan
Champ on-the-rise
Hello jlabuelo,

All I did was write a Java method that calls Intelliant (my goal was then to add it to a custom action). I have not used the Intelliant-Alfresco connector.

Use intelliant-ocr-1.1.exe, which you can download from http://intelliant.fr/downloads/intelliant-ocr-1.1.exe.

Try it out (put a TIFF in the directory and run it from cmd.exe):
ocr.exe file.tif
As for MSVCP71.dll, I hope the error was caused by an old version of Intelliant. If not, try copying MSVCP71.dll into C:\WINDOWS\system32 (it is just an idea).

To use it from Java code in Alfresco, first create a custom action with a class that extends ActionExecuterAbstractBase (see http://wiki.alfresco.com/wiki/Custom_Actions):

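// Pass the file to ocr.exe and wait for it to finish.
// The array form keeps a path containing spaces as a single argument.
Process process = Runtime.getRuntime().exec(new String[] {"ocr", file});
process.waitFor();
….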
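
For what it's worth, here is a slightly fuller sketch of the same idea using ProcessBuilder. It assumes ocr.exe is on the PATH and takes the TIFF path as its only argument, as in the command above. The class and method names (OcrRunner, buildCommand, run) are placeholders made up for illustration, not part of Intelliant or Alfresco.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class OcrRunner {

    // Builds the argument list. Each argument is passed separately,
    // so a path containing spaces is not split.
    static List<String> buildCommand(String executable, Path tiff) {
        return Arrays.asList(executable, tiff.toString());
    }

    // Starts the command, discards its console output, and returns the exit code.
    // start() throws IOException if the executable cannot be found.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java OcrRunner <file.tif>");
            return;
        }
        // Assumption: "ocr.exe" is reachable through the PATH.
        int exitCode = run(buildCommand("ocr.exe", Paths.get(args[0])));
        System.out.println("ocr.exe exited with code " + exitCode);
    }
}
```

Discarding the output keeps the child process from blocking on a full pipe. It is probably worth checking the exit code before reading the result file.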

I hope that it can help you…

jlabuelo
Champ on-the-rise
Thanks a lot for the answer. Yes, I got the Intelliant OCR command working now, and it produces the expected PDF file.

Next I will look at the wiki link about Custom Actions you sent to see how I can reproduce this in the Java code of our customization. I don't know yet how to pass the TIFF node we generate in Alfresco to ocr.exe, since we don't save it anywhere on the C:\ drive, only in the Alfresco repository, or how to take the result and create a new PDF node from it.

Do you have any example code you could share that I could use as a guide?

Thanks a lot again, this has been very useful.

Regards
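
One possible approach to the repository question above, as a minimal sketch only: copy the content to a temporary file, let the OCR tool convert it to a second temporary file, and read the result back. The names TempFileOcr, Converter, and ocr are hypothetical. In Alfresco the input stream would presumably come from a ContentReader and the result would be written back through a ContentWriter on the new PDF node, but that wiring is left out here and the converter in main is only a stand-in that copies the file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempFileOcr {

    // Hypothetical hook: turns the source file into the target file,
    // for example by launching ocr.exe with the two paths.
    public interface Converter {
        void convert(Path source, Path target) throws Exception;
    }

    // Writes the content to a temp file, runs the converter,
    // and returns the bytes of the converted file.
    static byte[] ocr(InputStream tiffContent, Converter converter) throws Exception {
        Path source = Files.createTempFile("ocr-source-", ".tif");
        Path target = Files.createTempFile("ocr-target-", ".pdf");
        try {
            Files.copy(tiffContent, source, StandardCopyOption.REPLACE_EXISTING);
            converter.convert(source, target);
            return Files.readAllBytes(target);
        } finally {
            // Clean up both temp files whether or not the conversion succeeded.
            Files.deleteIfExists(source);
            Files.deleteIfExists(target);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] fakeTiff = "not a real TIFF".getBytes(StandardCharsets.UTF_8);
        // Stand-in converter: copies the file instead of running OCR.
        Converter copyOnly = (source, target) ->
                Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        byte[] result = ocr(new ByteArrayInputStream(fakeTiff), copyOnly);
        System.out.println("converted bytes: " + result.length);
    }
}
```

In a custom action, the converter would launch ocr.exe with the source and target paths, and the returned bytes (or the target file itself) would become the content of the new PDF node.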

sumitweirminera
Champ in-the-making
Hi,

I am using Alfresco Labs Stable on Windows. I downloaded and installed intelliant-ocr-1.1.exe. Following the manual, I copied ocr-transformers-context to the extension folder. When I restart Alfresco I get the following error: Error creating bean with name 'transformer.Ocr.Tiff2Pdf' defined in file [C:\Alfresco\tomcat\shared\classes\alfresco\extension\ocr-transformers-context.xml].

Can someone please assist?

agey
Champ in-the-making
Hi sumitweirminerals,

I had the same problem and solved it. I modified the file ocr-transformers-context.xml as follows:


<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>
   <bean id="transformer.Ocr.Tiff2Pdf" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformer" parent="baseContentTransformer">
      <property name="checkCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
               <map>
                  <entry key=".*">
                     <value>ocr.exe --about</value>
                  </entry>
               </map>
            </property>
            <property name="errorCodes">
               <value>1,2</value>
            </property>
         </bean>
      </property>
      <property name="transformCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
               <map>
                  <entry key="Windows.*">
                     <value>
                        ocr.exe --replace --language es --pdf --output-file "${target}" "${source}"
                     </value>
                  </entry>
               </map>
            </property>
            <property name="errorCodes">
               <value>1,2</value>
            </property>
         </bean>
      </property>
      <property name="explicitTransformations">
         <list>
            
            <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails" >
               <property name="sourceMimetype"><value>image/tiff</value></property>
               <property name="targetMimetype"><value>application/pdf</value></property>
            </bean>
            <!--
            <bean class="org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey">
               <constructor-arg>
                  <value>image/tiff</value>
               </constructor-arg>
               <constructor-arg>
                  <value>application/pdf</value>
               </constructor-arg>
            </bean>
            -->
         </list>
      </property>
   </bean>
   <bean id="transformer.Ocr.Tiff2Rtf" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformer" parent="baseContentTransformer">
      <property name="checkCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
               <map>
                  <entry key=".*">
                     <value>ocr.exe --about</value>
                  </entry>
               </map>
            </property>
            <property name="errorCodes">
               <value>1,2</value>
            </property>
         </bean>
      </property>
      <property name="transformCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
               <map>
                  <entry key="Windows.*">
                     <value>
                        ocr.exe --replace --language es --rtf --output-file "${target}" "${source}"
                     </value>
                  </entry>
               </map>
            </property>
            <property name="errorCodes">
               <value>1,2</value>
            </property>
         </bean>
      </property>
      <property name="explicitTransformations">
         <list>
         
            <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails" >
               <property name="sourceMimetype"><value>image/tiff</value></property>
               <property name="targetMimetype"><value>application/rtf</value></property>
            </bean>
            <!--
            <bean class="org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey">
               <constructor-arg>
                  <value>image/tiff</value>
               </constructor-arg>
               <constructor-arg>
                  <value>application/rtf</value>
               </constructor-arg>
            </bean>
            -->
         </list>
      </property>
   </bean>
   <bean id="transformer.Ocr.Tiff2Txt" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformer" parent="baseContentTransformer">
      <property name="checkCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
               <map>
                  <entry key=".*">
                     <value>ocr.exe --about</value>
                  </entry>
               </map>
            </property>
            <property name="errorCodes">
               <value>1,2</value>
            </property>
         </bean>
      </property>
      <property name="transformCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
               <map>
                  <entry key="Windows.*">
                     <value>
                        ocr.exe --replace --language es --ascii --output-file "${target}" "${source}"
                     </value>
                  </entry>
               </map>
            </property>
            <property name="errorCodes">
               <value>1,2</value>
            </property>
         </bean>
      </property>
      <property name="explicitTransformations">
         <list>
            <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails" >
               <property name="sourceMimetype"><value>image/tiff</value></property>
               <property name="targetMimetype"><value>text/plain</value></property>
            </bean>
            <!--
            <bean class="org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey">
               <constructor-arg>
                  <value>image/tiff</value>
               </constructor-arg>
               <constructor-arg>
                  <value>text/plain</value>
               </constructor-arg>
            </bean>
            -->
         </list>
      </property>
   </bean>
   <bean id="transformer.Pdf2Tiff" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformer" parent="baseContentTransformer">
      <property name="checkCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
               <map>
                  <entry key=".*">
                     <value>ocr.exe --about</value>
                  </entry>
               </map>
            </property>
            <property name="errorCodes">
               <value>1,2</value>
            </property>
         </bean>
      </property>
      <property name="transformCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
               <map>
                  <entry key="Windows.*">
                     <value>
                        ocr.exe --replace --tiff --output-file "${target}" "${source}"
                     </value>
                  </entry>
               </map>
            </property>
            <property name="errorCodes">
               <value>1,2</value>
            </property>
         </bean>
      </property>
      <property name="explicitTransformations">
         <list>
         
            <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails" >
               <property name="sourceMimetype"><value>application/pdf</value></property>
               <property name="targetMimetype"><value>image/tiff</value></property>
            </bean>
            <!--
            <bean class="org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey">
               <constructor-arg>
                  <value>application/pdf</value>
               </constructor-arg>
               <constructor-arg>
                  <value>image/tiff</value>
               </constructor-arg>
            </bean>
            -->
         </list>
      </property>
   </bean>
</beans>
   
In the explicitTransformations property, I replaced the ContentTransformerRegistry$TransformationKey beans with ExplictTransformationDetails beans. Alfresco itself uses ExplictTransformationDetails in the configuration of the other transformations included in the installation.

I use this code in my configuration and it works fine.

sumitweirminera
Champ in-the-making
Hi Agey,

Thanks a ton. It's working perfectly!

Regards

pescha
Champ in-the-making
Did anyone find an open source solution for making searchable image PDFs? I'm told that gscan2pdf with the tesseract engine can do this, but how would it be ported into Alfresco?

kvb
Champ in-the-making
Hi there,

Does anyone know if Intelliant OCR 1.1 is free? Or will it stop working after a set period?

If the latter is the case, are there any other working open source or free OCR tools that integrate with Alfresco and are capable of creating searchable PDFs?

Thanks for any help in advance!

Regards,

Kees.