Hyland Connect

dranakan · ‎02-17-2009

Hello,

I am evaluating different OCR products to integrate with Alfresco. The aim is to extract some fields from a paper document (an invoice, for example). The OCR should generate a PDF plus a second file containing the extracted values (name=bob, numberInvoice=23423, …). The products I am testing are:
- Kofax
- eCopy
- Iris capture
- Adobe Capture

I am looking for the cheapest option. Do you use another OCR product in the same context?

For now I am working with Adobe Capture, but I am not able to extract the data into a separate file with the values (name=bob, numberInvoice=23423, …). Can someone explain how to do this?
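
Independent of the OCR product, one way to obtain such a key=value file is to post-process the recognized plain text. The following is only a minimal sketch: the function name `extract_fields` and the labels `Name:` and `Invoice No:` are assumptions for illustration, and a real invoice layout will need its own patterns.

```shell
#!/bin/bash
# Sketch: turn OCR'd invoice text into key=value lines.
# ASSUMPTION: the recognized text contains lines such as
# "Name: bob" and "Invoice No: 23423"; these labels are hypothetical.
extract_fields() {
    # Reads OCR text on stdin, prints the matching fields on stdout.
    sed -n \
        -e 's/^[[:space:]]*Name:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/name=\1/p' \
        -e 's/^[[:space:]]*Invoice No:[[:space:]]*\([0-9][0-9]*\).*$/numberInvoice=\1/p'
}
```

It could be used as `extract_fields < invoice.txt > invoice.properties`, where both file names are placeholders. Lines that match neither pattern are dropped, so the result contains only the recognized fields.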

jhonabraham · ‎10-11-2009

OCR or e-Invoicing—Making the Right Choice for your Organization
By Thayer Stewart, Special Contributor – Supply Chain Management Review, 8/7/2009 8:19:00 AM
The global supply chain demands buyers and suppliers invest in technologies engineered to provide uninterrupted delivery of goods and services. One of the largest impediments to efficient and profitable time to market is the delay associated with invoice receipt and payment. Invoice capture solutions have emerged as vital components of the overall procurement to pay (P2P) process, though not all are created equal.

When selecting an invoice data capture solution with the purpose of streamlining and optimizing the P2P and accounts payable processes, organizations often consider a number of possibilities, including optical character recognition (OCR) and e-Invoicing. Although OCR does offer some benefit to accounts payable departments, e-Invoicing stands as the clear winner when comparing accuracy, cost-effectiveness and overall return on investment.

When making such an important decision, organizations should ask several questions to help determine the solution that best meets their needs. What are the core differences between the technologies? How accurate are the solutions? When will I achieve ROI? Will they decrease paper consumption? How will my suppliers react? Are there any other viable alternatives?

OCR, or optical character recognition, is a technology that’s been around for decades. The basic premise of OCR is that information on paper documents can be extracted and automatically entered into an organization’s A/P workflow or ERP system, eliminating the need for data entry staff. OCR has been successfully applied to many functions that involve standard forms, such as medical claims and mortgage applications; however, it has had limited success with non-standard, variable documents such as invoices. Data errors are common, and exception handling is a significant issue that requires ongoing manual intervention.

E-Invoicing is the electronic transfer of invoice data from the supplier to the buyer usually through a third-party network that facilitates and streamlines the exchange process. Invoice information is taken directly from a supplier’s billing system, validated and enriched via the network platform and then imported directly into their customer’s ERP system. No paper is involved and the manual intervention associated with exception handling in the OCR process is eliminated with e-Invoicing.

thomas_x · ‎06-26-2012

We have just created a transformer from TIFF to searchable PDF and now want to create one from PDF to searchable PDF, but that second transformer does not work!

here is the tiff to pdf transformer (working)

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
<beans>
    <bean id="transformer.worker.ocr.tiff" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">
        <property name="mimetypeService">
            <ref bean="mimetypeService" />
        </property>
        <property name="checkCommand">
            <bean class="org.alfresco.util.exec.RuntimeExec">
                <property name="commandsAndArguments">
                    <map>
                        <entry key=".*">
                            <list>
                                <!-- <value>tesseract</value> -->
                                <value>/opt/alfresco-4.0.d/ocr</value>
                            </list>
                        </entry>
                    </map>
                </property>
                <property name="errorCodes">
                    <value>2</value>
                </property>
            </bean>
        </property>
        <property name="transformCommand">
            <bean class="org.alfresco.util.exec.RuntimeExec">
                <property name="commandsAndArguments">
                    <map>
                        <entry key=".*">
                            <list>
                                <!--
                                <value>tesseract</value>
                                <value>${source}</value>
                                <value>${target}</value>
                                <value>-l</value>
                                <value>deu</value>
                                -->
                                <value>/opt/alfresco-4.0.d/ocr</value>
                                <value>${source}</value>
                                <value>${target}</value>
                            </list>
                        </entry>
                    </map>
                </property>
                <property name="errorCodes">
                    <value>1,2</value>
                </property>
            </bean>
        </property>
        <property name="explicitTransformations">
            <list>
                <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
                    <property name="sourceMimetype"><value>image/tiff</value></property>
                    <property name="targetMimetype"><value>text/plain</value></property>
                </bean>
            </list>
        </property>
    </bean>
    <bean id="transformer.ocr.tiff" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
        <property name="worker">
            <ref bean="transformer.worker.ocr.tiff" />
        </property>
    </bean>
</beans>

here is our pdf to pdf transformer (not working)

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
<beans>
    <bean id="transformer.worker.ocr.pdf" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">
        <property name="mimetypeService">
            <ref bean="mimetypeService" />
        </property>
        <property name="checkCommand">
            <bean class="org.alfresco.util.exec.RuntimeExec">
                <property name="commandsAndArguments">
                    <map>
                        <entry key=".*">
                            <list>
                                <!-- <value>tesseract</value> -->
                                <value>/opt/alfresco-4.0.d/ocrPDF</value>
                            </list>
                        </entry>
                    </map>
                </property>
                <property name="errorCodes">
                    <value>2</value>
                </property>
            </bean>
        </property>
        <property name="transformCommand">
            <bean class="org.alfresco.util.exec.RuntimeExec">
                <property name="commandsAndArguments">
                    <map>
                        <entry key=".*">
                            <list>
                                <!--
                                <value>tesseract</value>
                                <value>${source}</value>
                                <value>${target}</value>
                                <value>-l</value>
                                <value>deu</value>
                                -->
                                <value>/opt/alfresco-4.0.d/ocrPDF</value>
                                <value>${source}</value>
                                <value>${target}</value>
                            </list>
                        </entry>
                    </map>
                </property>
                <property name="errorCodes">
                    <value>1,2</value>
                </property>
            </bean>
        </property>
        <property name="explicitTransformations">
            <list>
                <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
                    <property name="sourceMimetype"><value>application/pdf</value></property>
                    <property name="targetMimetype"><value>text/plain</value></property>
                </bean>
            </list>
        </property>
    </bean>
    <bean id="transformer.ocr.pdf" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
        <property name="worker">
            <ref bean="transformer.worker.ocr.pdf" />
        </property>
    </bean>
</beans>

and here is the ocrPDF script

#!/bin/bash
# Run OCR on a multi-page PDF file and create a new pdf with the
# extracted text in hidden layer. Requires cuneiform, hocr2pdf, gs.
# Usage: ./dwim.sh input.pdf output.pdf
set -e
input="$1"
output="$2"
echo "$(date)" >>/tmp/ocrtransform.log
echo "ocrPDFfrom $input to $output" >>/tmp/ocrtransform.log
tmpdir="$(mktemp -d)"
# extract images of the pages (note: resolution hard-coded)
gs -SDEVICE=tiffg4 -r300x300 -sOutputFile="$tmpdir/page-%04d.tiff" -dNOPAUSE -dBATCH -- "$input"
# OCR each page individually and convert into PDF
for page in "$tmpdir"/page-*.tiff
do
    base="${page%.tiff}"
#    cuneiform -f hocr -o "$base.html" "$page"
    tesseract "$page" "$base" -l deu hocr
    hocr2pdf -i "$page" -o "$base.pdf" < "$base.html"
    echo "hocr2pdf $page to $base" >>/tmp/ocrtransform.log
done
# combine the pages into one PDF
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$output" "$tmpdir"/page-*.pdf
rm -rf -- "$tmpdir"

We have tested the ocrPDF script on its own and it works.
But when we upload a file, the ocrPDF script is not executed!

Does anybody know what is wrong with this transformer definition?

thanks
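
For anyone trying to reproduce this, it may help to call the script exactly as the transformCommand above would: with two positional arguments (`${source}` and `${target}`), and with the exit code compared against the configured errorCodes `1,2`. The following is only a sketch; the helper name `run_like_transformer` is hypothetical and not part of Alfresco.

```shell
#!/bin/bash
# Sketch: call a transformer script with the same two arguments the
# transformCommand passes (${source}, ${target}) and classify the exit code
# against the errorCodes configured above (1,2).
# ASSUMPTION: run_like_transformer is an illustrative helper, not an Alfresco tool.
run_like_transformer() {
    local script="$1" source="$2" target="$3" code=0
    # Capture the exit code without aborting the caller.
    "$script" "$source" "$target" || code=$?
    case "$code" in
        1|2) echo "exit=$code (listed in errorCodes: treated as failure)"; return 1 ;;
        *)   echo "exit=$code (not listed in errorCodes)"; return 0 ;;
    esac
}
```

A call could look like `run_like_transformer /opt/alfresco-4.0.d/ocrPDF /tmp/in.pdf /tmp/out.pdf`, where the two file paths are placeholders. If that call appends to /tmp/ocrtransform.log but an upload does not, the script itself is fine and was simply never invoked.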

wmay · ‎08-01-2012

Hi,

We have implemented an OCR server integration for Alfresco that can be used as a transformer or via JavaScript and Java. It runs on a separate OCR server and supports ABBYY and Google OCR. For more information, see https://forums.alfresco.com/en/viewtopic.php?f=33&t=44739

Choice of OCR