12-12-2011 12:24 PM
Hi:
I'm trying to install 'Nuxeo-platform-ocr' (https://github.com/nuxeo/nuxeo-platform-ocr), but I don't know where to put the 'content_in_doc' file so that Nuxeo can use it for analysis.
I have followed the instructions at https://github.com/nuxeo/nuxeo-platform-ocr, but they don't make clear where it should go.
I'm using Ubuntu 10.11 + Tesseract 3 + Nuxeo + Olena (Scribo).
Could you tell me where I should put the 'content_in_doc' file?
Thanks, and regards.
12-06-2012 05:05 AM
Olena is now available as deb packages for Debian and Ubuntu: http://www.lrde.epita.fr/cgi-bin/twiki/view/Olena/Download
Once installed, the content_in_doc binary is located in /usr/lib/scribo/
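A minimal Python sketch for checking where the binary actually ended up, assuming the package layout described above (/usr/lib/scribo/content_in_doc) and checking the PATH first. The helper name find_content_in_doc is hypothetical, not part of Olena or Nuxeo.

```python
import os
import shutil
from typing import Optional

# Directory reported above for the Olena deb packages; other installs may differ.
SCRIBO_DIR = "/usr/lib/scribo"


def find_content_in_doc(extra_dirs=(SCRIBO_DIR,)) -> Optional[str]:
    """Return the path of an executable content_in_doc, or None if not found.

    Hypothetical helper: looks on the PATH first, then in each of extra_dirs.
    """
    on_path = shutil.which("content_in_doc")
    if on_path:
        return on_path
    for directory in extra_dirs:
        candidate = os.path.join(directory, "content_in_doc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path = find_content_in_doc()
    print(path if path else "content_in_doc not found on PATH or in " + SCRIBO_DIR)
```

If it is found only in /usr/lib/scribo/ and not on the PATH, it may be worth making it reachable from the PATH of the user running Nuxeo. The process listing later in this thread shows the command started by bare name, but whether Nuxeo relies on the PATH here is an assumption, not something confirmed in this thread.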
12-06-2012 11:14 AM
Thanks for the info. I've installed this package and content_in_doc is now working, but I still can't get the plugin to work under Nuxeo 5.6. I will try again with Nuxeo 5.4.2 to see whether the problem is caused by the changes introduced in 5.6.
12-06-2012 11:44 AM
Is there any sort of configuration for what to extract, where to extract it from, and so on? Or do we just have to accept whatever Olena decides to extract? 🙂
12-07-2012 06:32 AM
Regarding the content_in_doc binary itself, it does not provide many options.
12-07-2012 06:46 AM
I mean configuration from Nuxeo, for example... If there is a component for this under Nuxeo, that would be good, because my customer wants to configure every aspect directly from Nuxeo.
12-09-2012 11:36 PM
Sorry, I was not very clear in my previous post. Last year I was able to get this plugin working under Nuxeo 5.4.2 on an Oracle Linux installation. Then I tried to use the plugin in Nuxeo 5.5 under Ubuntu, which led to the situation described above.
With the release of Nuxeo 5.6 I decided to do a fresh installation under Ubuntu, and I had trouble getting the content_in_doc binary. Following Guillaume's suggestion to use the new Olena package (thanks, Guillaume!), I now have it.
But when I try to use the OCR plugin, I get the same situation as under 5.5: the content_in_doc command itself works fine. I tried converting an image from the command line and it works.
When I upload an image to Nuxeo, I can see a process like this running:
nuxeo 17203 17198 0 00:23 pts/0 00:00:00 content_in_doc /var/lib/nuxeo/server/tmp/cmdLineBasedConverter2216478922130777180.JPG /var/lib/nuxeo/server/tmp/ocr_olena_1355109823244.xml
And the file ocr_olena_xxxxxxx.xml is created under $NUXEO_HOME/tmp
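To separate the OCR step from Nuxeo, the same two-argument call seen in the process listing above (input image, output XML) can be reproduced by hand and the size of the output checked. A minimal Python sketch, assuming content_in_doc accepts exactly those two arguments as in that listing; run_content_in_doc and its parameters are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile


def run_content_in_doc(image_path, binary="content_in_doc", timeout=300):
    """Run <binary> <image> <output.xml>, mirroring the process listing above.

    Returns (exit_code, xml_path, xml_size_in_bytes).
    Raises FileNotFoundError if the binary cannot be found.
    """
    # Create a unique output file; content_in_doc is expected to overwrite it.
    fd, xml_path = tempfile.mkstemp(prefix="ocr_olena_", suffix=".xml")
    os.close(fd)
    result = subprocess.run(
        [binary, image_path, xml_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    size = os.path.getsize(xml_path) if os.path.exists(xml_path) else 0
    return result.returncode, xml_path, size


if __name__ == "__main__":
    code, xml_path, size = run_content_in_doc(sys.argv[1])
    print(f"exit={code} output={xml_path} bytes={size}")
```

A zero exit code with a non-empty XML file, as reported here, points away from the OCR binary and toward the step where the plugin reads that file back into Nuxeo.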
But... no annotations are generated on the document in Nuxeo.
No errors appear in server.log either. This is everything the log records after I upload a file:
2012-12-10 00:34:16,310 INFO [it.tidalwave.image.op.ReadOp] readMetadata(java.io.FileInputStream@705d5338, 0)
2012-12-10 00:34:16,319 INFO [it.tidalwave.image.op.ReadOp] read(java.io.FileInputStream@44f4660b, 0)
2012-12-10 00:34:17,438 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Lock : Nuxeo-Work-default-4
2012-12-10 00:34:17,439 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Nuxeo-Work-default-4 >> enterCS: Thread R/W: 0/0 :: Model R/W: 0/0 (thread: Nuxeo-Work-default-4)
2012-12-10 00:34:17,439 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Nuxeo-Work-default-4 << enterCS: Thread R/W: 1/0 :: Model R/W: 1/0 (thread: Nuxeo-Work-default-4)
2012-12-10 00:34:17,440 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Nuxeo-Work-default-4 >> leaveCS: Thread R/W: 1/0 :: Model R/W: 1/0 (thread: Nuxeo-Work-default-4)
2012-12-10 00:34:17,441 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Nuxeo-Work-default-4 << leaveCS: Thread R/W: 0/0 :: Model R/W: 0/0 (thread: Nuxeo-Work-default-4)
Any suggestions for getting the OCR plugin to work under Nuxeo 5.6?
Ruben
12-14-2012 01:15 PM
Does the XML file produced by content_in_doc contain anything? Sometimes, if the document is of poor quality, content_in_doc may produce an empty file... 🙂
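One quick way to answer that is to list the text attribute of every Line element, which is where the recognized text appears in the sample output posted in the reply below. A minimal Python sketch using the standard library's ElementTree; it matches Line by local name so it works whether or not the file declares an XML namespace. The function names are hypothetical.

```python
import os
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_line_texts(xml_path):
    """Return the text attribute of every Line element, in document order.

    An empty (zero-byte) file yields an empty list instead of a parse error.
    """
    if os.path.getsize(xml_path) == 0:
        return []
    root = ET.parse(xml_path).getroot()
    return [el.get("text", "") for el in root.iter() if local_name(el.tag) == "Line"]


if __name__ == "__main__":
    texts = extract_line_texts(sys.argv[1])
    if not any(t.strip() for t in texts):
        print("No recognized text: the OCR output is effectively empty.")
    for t in texts:
        print(t)
```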
12-15-2012 07:56 PM
Yes. For example, the linked image produces the following XML file in $NUXEO_HOME/server/tmp (I only reproduce some of the lines):
File: ocr_olena_1355615432288.xml
more ocr_olena_1355615432288.xml
<?xml version="1.0" encoding="UTF-8"?>
<PcGts>
<Metadata>
<Creator>LRDE</Creator>
<Created>2012-12-15T20:51:01</Created>
<LastChange>2012-12-15T20:51:01</LastChange>
<Comments>Generated by Scribo from Olena.</Comments>
</Metadata>
<Page imageFilename="noname" imageWidth="1200" imageHeight="880">
<TextRegion id="1" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="turquoise" kerning="4" color="#567BC8" colorReliability="0" baseline="51" meanline="34" xHeight="18" dHeight="-7" aHeight="27" charWidth="14">
<Coords>
<Point x="30" y="41"/>
<Point x="30" y="32"/>
<Point x="42" y="32"/>
<Point x="42" y="33"/>
<Point x="43" y="33"/>
<Point x="43" y="34"/>
<Point x="44" y="34"/>
<Point x="44" y="36"/>
<Point x="45" y="36"/>
<Point x="45" y="37"/>
<Point x="171" y="37"/>
<Point x="171" y="36"/>
<Point x="275" y="36"/>
<Point x="275" y="31"/>
<Point x="276" y="31"/>
<Point x="276" y="30"/>
<Point x="322" y="30"/>
<Point x="322" y="29"/>
<Point x="377" y="29"/>
<Point x="377" y="34"/>
<Point x="378" y="34"/>
<Point x="378" y="35"/>
<Point x="459" y="35"/>
<Point x="459" y="34"/>
<Point x="498" y="34"/>
<Point x="498" y="30"/>
<Point x="499" y="30"/>
<Point x="499" y="28"/>
<Point x="499" y="34"/>
<Point x="568" y="34"/>
<Point x="568" y="29"/>
<Point x="569" y="29"/>
<Point x="569" y="28"/>
<Point x="569" y="33"/>
<Point x="663" y="33"/>
<Point x="663" y="32"/>
<Point x="666" y="32"/>
<Point x="666" y="30"/>
<Point x="667" y="30"/>
<Point x="667" y="27"/>
<Point x="668" y="27"/>
<Point x="668" y="26"/>
<Point x="690" y="26"/>
<Point x="690" y="27"/>
<Point x="699" y="27"/>
<Point x="699" y="32"/>
<Point x="876" y="32"/>
<Point x="876" y="26"/>
<Point x="878" y="26"/>
<Point x="878" y="25"/>
<Point x="903" y="25"/>
<Point x="903" y="31"/>
<Point x="920" y="31"/>
<Point x="920" y="32"/>
<Point x="921" y="32"/>
<Point x="921" y="33"/>
<Point x="923" y="33"/>
<Point x="923" y="35"/>
<Point x="924" y="35"/>
<Point x="924" y="38"/>
<Point x="925" y="38"/>
<Point x="925" y="40"/>
<Point x="924" y="40"/>
<Point x="924" y="44"/>
<Point x="923" y="44"/>
<Point x="923" y="45"/>
<Point x="922" y="45"/>
<Point x="922" y="46"/>
<Point x="921" y="46"/>
<Point x="921" y="47"/>
<Point x="919" y="47"/>
<Point x="919" y="48"/>
<Point x="860" y="48"/>
<Point x="860" y="53"/>
<Point x="859" y="53"/>
<Point x="859" y="49"/>
<Point x="784" y="49"/>
<Point x="784" y="50"/>
<Point x="783" y="50"/>
<Point x="783" y="53"/>
<Point x="782" y="53"/>
<Point x="782" y="54"/>
<Point x="781" y="54"/>
<Point x="781" y="49"/>
<Point x="717" y="49"/>
<Point x="717" y="51"/>
<Point x="716" y="51"/>
<Point x="716" y="54"/>
<Point x="715" y="54"/>
<Point x="715" y="55"/>
<Point x="715" y="49"/>
<Point x="651" y="49"/>
<Point x="651" y="50"/>
<Point x="486" y="50"/>
<Point x="486" y="51"/>
<Point x="401" y="51"/>
<Point x="401" y="55"/>
<Point x="401" y="52"/>
<Point x="226" y="52"/>
<Point x="226" y="57"/>
<Point x="225" y="57"/>
<Point x="225" y="58"/>
<Point x="224" y="58"/>
<Point x="224" y="53"/>
<Point x="54" y="53"/>
<Point x="54" y="54"/>
<Point x="54" y="53"/>
<Point x="30" y="53"/>
<Point x="30" y="42"/>
</Coords>
<Line text="Pese a su compacto diseﬁo, es realmente versétil y muy completo" id="7" boldness="2.78846" boldnessReliability="22.8793" color="#567BC8" colorReliability="4.84687" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="turquoise" kerning="4" baseline="51" meanline="34" xHeight="18" dHeight="-7" aHeight="27" charWidth="14">
<Coords>
<Point x="30" y="25"/>
<Point x="925" y="25"/>
<Point x="925" y="58"/>
<Point x="30" y="58"/>
</Coords>
</Line>
</TextRegion>
<TextRegion id="2" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="black" kerning="5" color="#352F20" colorReliability="0" baseline="127" meanline="85" xHeight="43" dHeight="-2" aHeight="60" charWidth="34">
<Coords>
<Point x="33" y="98"/>
<Point x="33" y="73"/>
<Point x="81" y="73"/>
<Point x="81" y="85"/>
<Point x="216" y="85"/>
<Point x="216" y="75"/>
<Point x="217" y="75"/>
<Point x="217" y="74"/>
<Point x="219" y="74"/>
<Point x="219" y="73"/>
<Point x="220" y="73"/>
<Point x="220" y="72"/>
<Point x="222" y="72"/>
<Point x="222" y="71"/>
<Point x="224" y="71"/>
<Point x="224" y="70"/>
<Point x="471" y="70"/>
<Point x="471" y="69"/>
<Point x="473" y="69"/>
<Point x="473" y="68"/>
<Point x="475" y="68"/>
<Point x="475" y="69"/>
<Point x="568" y="69"/>
<Point x="568" y="70"/>
<Point x="571" y="70"/>
<Point x="571" y="71"/>
<Point x="573" y="71"/>
<Point x="573" y="72"/>
<Point x="575" y="72"/>
<Point x="575" y="73"/>
<Point x="576" y="73"/>
<Point x="576" y="74"/>
<Point x="577" y="74"/>
<Point x="577" y="75"/>
<Point x="578" y="75"/>
<Point x="578" y="77"/>
<Point x="579" y="77"/>
<Point x="579" y="80"/>
<Point x="580" y="80"/>
<Point x="580" y="115"/>
<Point x="581" y="115"/>
<Point x="581" y="125"/>
<Point x="507" y="125"/>
<Point x="507" y="126"/>
<Point x="421" y="126"/>
<Point x="421" y="127"/>
<Point x="256" y="127"/>
<Point x="256" y="128"/>
<Point x="145" y="128"/>
<Point x="145" y="129"/>
<Point x="57" y="129"/>
<Point x="57" y="128"/>
<Point x="33" y="128"/>
<Point x="33" y="99"/>
</Coords>
<Line text="Mountain Serie 2" id="25" boldness="8.28571" boldnessReliability="33.6896" color="#352F20" colorReliability="3.10364" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="black" kerning="5" baseline="127" meanline="85" xHeight="43" dHeight="-2" aHeight="60" charWidth="34">
<Coords>
<Point x="33" y="68"/>
<Point x="581" y="68"/>
<Point x="581" y="129"/>
<Point x="33" y="129"/>
</Coords>
</Line>
</TextRegion>
<TextRegion id="3" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" color="#5D5559" colorReliability="13.0052" baseline="184" meanline="174" xHeight="11" dHeight="-4" aHeight="16" charWidth="8">
<Coords>
<Point x="29" y="513"/>
<Point x="29" y="357"/>
<Point x="28" y="357"/>
<Point x="28" y="281"/>
<Point x="29" y="281"/>
<Point x="29" y="280"/>
<Point x="30" y="280"/>
<Point x="30" y="279"/>
<Point x="34" y="279"/>
<Point x="34" y="276"/>
<Point x="35" y="276"/>
<Point x="35" y="275"/>
<Point x="102" y="275"/>
<Point x="102" y="180"/>
<Point x="103" y="180"/>
<Point x="103" y="175"/>
<Point x="136" y="175"/>
<Point x="136" y="174"/>
<Point x="194" y="174"/>
<Point x="194" y="171"/>
<Point x="302" y="171"/>
<Point x="302" y="170"/>
<Point x="415" y="170"/>
<Point x="415" y="169"/>
<Point x="415" y="173"/>
<Point x="427" y="173"/>
<Point x="427" y="174"/>
<Point x="428" y="174"/>
<Point x="428" y="202"/>
<Point x="429" y="202"/>
<Point x="429" y="233"/>
<Point x="428" y="233"/>
<Point x="428" y="253"/>
<Point x="429" y="253"/>
<Point x="429" y="331"/>
<Point x="427" y="331"/>
<Point x="427" y="354"/>
<Point x="429" y="354"/>
<Point x="429" y="409"/>
<Point x="430" y="409"/>
<Point x="430" y="512"/>
<Point x="429" y="512"/>
<Point x="429" y="518"/>
<Point x="430" y="518"/>
<Point x="430" y="565"/>
<Point x="431" y="565"/>
<Point x="430" y="565"/>
<Point x="430" y="615"/>
<Point x="431" y="615"/>
<Point x="431" y="750"/>
<Point x="430" y="750"/>
<Point x="430" y="769"/>
<Point x="431" y="769"/>
<Point x="431" y="776"/>
<Point x="430" y="776"/>
<Point x="430" y="795"/>
<Point x="431" y="795"/>
<Point x="431" y="854"/>
<Point x="427" y="854"/>
<Point x="427" y="855"/>
<Point x="397" y="855"/>
<Point x="397" y="857"/>
<Point x="396" y="857"/>
<Point x="396" y="858"/>
<Point x="245" y="858"/>
<Point x="245" y="859"/>
<Point x="244" y="859"/>
<Point x="244" y="858"/>
<Point x="183" y="858"/>
<Point x="183" y="857"/>
<Point x="32" y="857"/>
<Point x="32" y="848"/>
<Point x="31" y="848"/>
<Point x="31" y="847"/>
<Point x="30" y="847"/>
<Point x="30" y="822"/>
<Point x="31" y="822"/>
<Point x="31" y="770"/>
<Point x="30" y="770"/>
<Point x="30" y="572"/>
<Point x="31" y="572"/>
<Point x="31" y="560"/>
<Point x="30" y="560"/>
<Point x="30" y="543"/>
<Point x="29" y="543"/>
<Point x="29" y="514"/>
</Coords>
<Line text="a propuesta de Mountain para esta" id="49" boldness="16.6923" boldnessReliability="118.987" color="#666265" colorReliability="23.2958" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="184" meanline="174" xHeight="11" dHeight="-4" aHeight="16" charWidth="8">
<Coords>
<Point x="102" y="169"/>
<Point x="428" y="169"/>
<Point x="428" y="188"/>
<Point x="102" y="188"/>
</Coords>
</Line>
<Line text="comparative es peculiar en tanto que" id="66" boldness="18" boldnessReliability="144.741" color="#6A6467" colorReliability="33.4482" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="210" meanline="200" xHeight="11" dHeight="-4" aHeight="16" charWidth="8">
<Coords>
<Point x="102" y="195"/>
<Point x="429" y="195"/>
<Point x="429" y="214"/>
<Point x="102" y="214"/>
</Coords>
</Line>
<Line text="ha elegido una caja Antec Minuet" id="73" boldness="8.85185" boldnessReliability="83.2387" color="#524749" colorReliability="33.7844" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="236" meanline="226" xHeight="11" dHeight="-4" aHeight="16" charWidth="9">
<Coords>
<Point x="103" y="221"/>
<Point x="429" y="221"/>
<Point x="429" y="240"/>
<Point x="103" y="240"/>
</Coords>
</Line>
<Line text="para alojar una equlllbrada seleccién" id="86" boldness="8.36364" boldnessReliability="85.951" color="#4A4143" colorReliability="31.384" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="261" meanline="251" xHeight="11" dHeight="-5" aHeight="16" charWidth="8">
<Coords>
<Point x="103" y="246"/>
<Point x="429" y="246"/>
<Point x="429" y="266"/>
<Point x="103" y="266"/>
</Coords>
</Line>
<Line text="de componentes. No destacan por ser Ios" id="96" boldness="10.4375" boldnessReliability="96.4558" color="#52494D" colorReliability="28.5853" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="288" meanline="277" xHeight="12" dHeight="-4" aHeight="16" charWidth="9">
<Coords>
<Point x="28" y="273"/>
<Point x="429" y="273"/>
<Point x="429" y="292"/>
<Point x="28" y="292"/>
</Coords>
</Line>
<Line text="ma's répidos de este informe. pero no quedan" id="112" boldness="18.4857" boldnessReliability="164.682" color="#696166" colorReliability="29.5019" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="314" meanline="303" xHeight="12" dHeight="-4" aHeight="16" charWidth="9">
<Coords>
<Point x="29" y="299"/>
<Point x="429" y="299"/>
<Point x="429" y="318"/>
<Point x="29" y="318"/>
</Coords>
</Line>
<Line text="mal en ninquna de las pruebas de rendimien-" id="119" boldness="20.6571" boldnessReliability="168.983" color="#655D62" colorReliability="27.1991" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="3" baseline="339" meanline="329" xHeight="11" dHeight="-5" aHeight="16" charWidth="8">
<Coords>
<Point x="29" y="324"/>
<Point x="429" y="324"/>
<Point x="429" y="344"/>
<Point x="29" y="344"/>
</Coords>
</Line>
<Line text="to del banco de benchmarks. La ausencia més" id="128" boldness="22.7143" boldnessReliability="184.03" color="#676064" colorReliability="27.9409" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="365" meanline="354" xHeight="12" dHeight="-1" aHeight="17" charWidth="9">
<Coords>
<Point x="28" y="349"/>
<Point x="429" y="349"/>
<Point x="429" y="366"/>
<Point x="28" y="366"/>
</Coords>
</Line>
<Line text="llamativa es la de ma's memorla RAM. que" id="145" boldness="11.9355" boldnessReliability="114.239" color="#564E51" colorReliability="33.6988" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="391" meanline="380" xHeight="12" dHeight="-2" aHeight="16" charWidth="9">
<Coords>
<Point x="29" y="376"/>
<Point x="429" y="376"/>
<Point x="429" y="393"/>
<Point x="29" y="393"/>
</Coords>
</Line>
<Line text="aunque es de buena factura, nunca esté de" id="154" boldness="15.1515" boldnessReliability="135.539" color="#554D50" colorReliability="33.7148" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="3" baseline="417" meanline="406" xHeight="12" dHeight="-4" aHeight="17" charWidth="8">
<Coords>
<Point x="29" y="401"/>
<Point x="430" y="401"/>
<Point x="430" y="421"/>
<Point x="29" y="421"/>
</Coords>
</Line>
02-02-2013 06:50 AM
02-02-2013 06:47 AM
Hi @rbahntje, I was facing the same problem: $NUXEO_HOME/server/tmp/ocr_olena_xxx.xml was generated successfully, but no annotations appeared. After a few days I solved it by editing the code from GitHub, and it now works fine in production. I have created a pull request at https://github.com/nuxeo/nuxeo-platform-ocr/pulls describing the problem and the solution.