11-15-2010 12:29 PM
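If you call
tesseract input_file.tif output_file.txt
you will get a file output_file.txt.txt, because tesseract appends .txt to the output name itself. Pass the output name without the extension instead:
tesseract input_file.tif output_file -l eng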
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
<beans>
<bean id="transformer.worker.ocr.tiff" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">
<property name="mimetypeService">
<ref bean="mimetypeService" />
</property>
<property name="checkCommand">
<bean class="org.alfresco.util.exec.RuntimeExec">
<property name="commandsAndArguments">
<map>
<entry key=".*">
<list>
<!-- <value>tesseract</value> -->
<value>/opt/alfresco/ocr</value>
</list>
</entry>
</map>
</property>
<property name="errorCodes">
<value>2</value>
</property>
</bean>
</property>
<property name="transformCommand">
<bean class="org.alfresco.util.exec.RuntimeExec">
<property name="commandsAndArguments">
<map>
<entry key=".*">
<list>
<!-- <value>tesseract</value>
<value>${source}</value>
<value>${target}</value>
<value>-l</value>
<value>eng</value> -->
<value>/opt/alfresco/ocr</value>
<value>${source}</value>
<value>${target}</value>
</list>
</entry>
</map>
</property>
<property name="errorCodes">
<value>1,2</value>
</property>
</bean>
</property>
<property name="explicitTransformations">
<list>
<bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
<property name="sourceMimetype"><value>image/tiff</value></property>
<property name="targetMimetype"><value>text/plain</value></property>
</bean>
</list>
</property>
</bean>
<bean id="transformer.ocr.tiff" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
<property name="worker">
<ref bean="transformer.worker.ocr.tiff" />
</property>
</bean>
</beans>
#!/bin/bash
# save arguments to variables
SOURCE="$1"
TARGET="$2"
TMPDIR=/tmp
FILENAME=$(basename "$SOURCE")
OCRFILE="$FILENAME.tif"
# to see what happens
#echo "from $SOURCE to $TARGET" >>/tmp/ocrtransform.log
cp -f "$SOURCE" "$TMPDIR/$OCRFILE"
# call tesseract; it appends .txt itself, so strip the extension from $TARGET
tesseract "$TMPDIR/$OCRFILE" "${TARGET%.*}" -l eng
rm -f "$TMPDIR/$OCRFILE"
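To see what the extension stripping in the tesseract call does, here is a minimal sketch. The function name output_base and the sample path are made up for illustration; only the parameter expansion itself comes from the script above.

```shell
#!/bin/bash
# output_base is a hypothetical helper, not part of the original script.
# It removes the shortest trailing ".something" from its argument, which
# is the base name tesseract expects before it appends ".txt" on its own.
output_base() {
    local target="$1"
    printf '%s\n' "${target%.*}"
}

output_base "/tmp/target_123.txt"   # prints /tmp/target_123
```

Note that the expansion only looks for the last dot in the whole path. This is fine here because the target Alfresco passes ends in .txt, but a target without an extension inside a directory whose name contains a dot would be cut in the wrong place.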
12-05-2011 11:24 AM
Thank you very much plepot.
It works for me. But does tesseract have the ability to convert TIF to PDF?
Thanks in advance.
01-30-2012 10:34 PM
You can define a transformer from TIFF to PDF. The original TIFF will be shown in the PDF with a hidden text layer: searchable, markable, and indexed by Alfresco.
You can do that in a shell script using optimize2bw, tesseract, hocr2pdf and pdftk.
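Before wiring such a script into Alfresco, it may help to check that all of these tools are actually on the PATH of the user running Tomcat. A small sketch follows; missing_tools is a name invented here for illustration.

```shell
#!/bin/bash
# missing_tools is a hypothetical helper: it prints the name of every
# given command that cannot be found on the current PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Lists whichever of the four tools are not installed (prints nothing if all are).
missing_tools optimize2bw tesseract hocr2pdf pdftk
```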
Best Regards
ml
01-31-2012 04:19 AM
#!/bin/bash
#############################################################
# tiff_ocr2pdf.sh
# Convert a TIF file into a searchable PDF
#############################################################
# 31.10.2011 ml - created
#############################################################
SOURCE="$1"
TARGET="$2"
# mktemp creates an empty placeholder file; its name plus "_" is the page prefix
TEMP=$(mktemp -t tiffocrXXXXXXXX)
TEMP="${TEMP}_"
# split the multi-page TIFF into one file per page
tiffsplit "$SOURCE" "${TEMP}"
for TIFF in "${TEMP}"*
do
# segmentation fault with --denoise!
optimize2bw --dpi 300 -i "${TIFF}" -o "${TIFF}opt.tif"
tesseract "${TIFF}opt.tif" "${TIFF}tmp" hocr
hocr2pdf -s -i "${TIFF}" -o "${TIFF}.pdf" < "${TIFF}tmp.html"
done
# merge the per-page PDFs
pdftk "${TEMP}"*.pdf output "$TARGET"
#############################################################
# clean up (page files and the mktemp placeholder)
rm -f "${TEMP%_}" "${TEMP}"*
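One possible variation on the cleanup step, in case one of the tools fails midway: keep all intermediate files in a private temporary directory and remove it with a trap. This is only a sketch under that assumption; run_in_workdir is a hypothetical helper and not part of the script above.

```shell
#!/bin/bash
# run_in_workdir is a hypothetical helper: it creates a private temporary
# directory, runs the given command with that directory appended as the
# last argument, and removes the directory afterwards, even on failure.
run_in_workdir() {
    (
        workdir=$(mktemp -d) || exit 1
        # The EXIT trap fires when this subshell ends, whatever the outcome.
        trap 'rm -rf "$workdir"' EXIT
        "$@" "$workdir"
    )
}
```

A call such as run_in_workdir ocr_pages "$SOURCE" "$TARGET" would then pass the directory to ocr_pages as its third argument, where ocr_pages stands for a function of your own that contains the split, OCR and merge steps.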
08-01-2012 10:48 AM
01-11-2013 10:02 AM
08-14-2015 01:31 AM
# call tesseract and redirect output to $TARGET
/usr/bin/tesseract $TMPDIR/$OCRFILE ${TARGET%\.*} -l eng
/tmp/RuntimeExecutableContentTransformerWorker_source_2064892405511152431.tiff.tif
/opt/alfresco-4.2.f/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_target_3161704112042319622.txt