Nuxeo-Platform-OCR Question

Soni_
Champ on-the-rise

Hi:

I'm trying to install 'Nuxeo-platform-ocr' (https://github.com/nuxeo/nuxeo-platform-ocr), but I don't know where to find the 'content_in_doc' file that Nuxeo needs in order to analyze documents.

I have followed the instructions at https://github.com/nuxeo/nuxeo-platform-ocr, but they don't make clear where the file is located.

I'm using Ubuntu 10.11 + Tesseract 3 + Nuxeo + Olena (Scribo).

Could you tell me where I can find the file 'content_in_doc'?

Thanks, and regards.

30 REPLIES 30

glazzara_Lazzar
Champ in-the-making

Olena is now available as deb packages for Debian and Ubuntu: http://www.lrde.epita.fr/cgi-bin/twiki/view/Olena/Download

Once the package is installed, the content_in_doc binary is located in /usr/lib/scribo/
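
For anyone who wants to check this on their own machine, here is a minimal Python sketch that looks for the binary. It assumes only the directory mentioned above (/usr/lib/scribo); the function name find_content_in_doc is made up for illustration and is not part of Olena or Nuxeo.

```python
import os
import shutil

# Directory reported in this thread for the Debian/Ubuntu Olena packages.
SCRIBO_DIR = "/usr/lib/scribo"


def find_content_in_doc(extra_dirs=(SCRIBO_DIR,)):
    """Return the full path to content_in_doc, or None if it is not found.

    Hypothetical helper for illustration only.
    """
    # First look on the PATH of the current user.
    found = shutil.which("content_in_doc")
    if found:
        return found
    # Then look in the extra directories, which may not be on the PATH.
    for directory in extra_dirs:
        candidate = os.path.join(directory, "content_in_doc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(find_content_in_doc() or "content_in_doc not found")
```

If the binary is only found in the extra directory, the user that runs Nuxeo probably needs that directory on its PATH (or a full path in the converter configuration) for the plugin to call it.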

rbahntje_Bahntj
Confirmed Champ

Thanks for the info. I have installed this package and content_in_doc is now working. However, I still can't get the plugin to work under Nuxeo 5.6. I will try again with Nuxeo 5.4.2 to see whether the problem is caused by the changes introduced in 5.6.

yayo_
Confirmed Champ

Is there any kind of configuration for what to extract, where to extract it from, and so on? Or do we just get whatever Olena thinks we expect? 🙂

As for the content_in_doc binary itself, it does not provide many options.

yayo_
Confirmed Champ

I mean configuration from within Nuxeo, for example. Having a component in Nuxeo is good, but my customer wants to configure every aspect directly from Nuxeo.

rbahntje_Bahntj
Confirmed Champ

Sorry, I was not very clear in my previous post. Last year I was able to get this plugin working under Nuxeo 5.4.2 on an Oracle Linux installation. Then I tried to use the plugin with Nuxeo 5.5 under Ubuntu and ran into the situation described above.

With the release of Nuxeo 5.6 I decided to do a fresh installation under Ubuntu, and I had trouble getting the content_in_doc binary. Following Guillaume's suggestion to use the new Olena package (thanks, Guillaume!), I now have the file.

But when I try to use the OCR plugin, I get the same result as under 5.5. The content_in_doc command itself works fine: when I convert an image from the command line, it works.

When I upload an image to Nuxeo, I can see a process like this running:

nuxeo 17203 17198 0 00:23 pts/0 00:00:00 content_in_doc /var/lib/nuxeo/server/tmp/cmdLineBasedConverter2216478922130777180.JPG /var/lib/nuxeo/server/tmp/ocr_olena_1355109823244.xml

And the file ocr_olena_xxxxxxx.xml is created under $NUXEO_HOME/tmp
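
To reproduce that call outside Nuxeo, a minimal Python sketch is shown below. It assumes only what the process listing above shows, namely that content_in_doc takes an input image path followed by an output XML path; the helper names build_ocr_command and run_ocr are hypothetical.

```python
import subprocess


def build_ocr_command(image_path, xml_path, binary="content_in_doc"):
    """Build the argument list: binary, input image, output XML.

    Mirrors the argument order seen in the process listing above.
    """
    return [binary, image_path, xml_path]


def run_ocr(image_path, xml_path, binary="content_in_doc"):
    """Run content_in_doc and return its exit code (hypothetical helper)."""
    completed = subprocess.run(build_ocr_command(image_path, xml_path, binary))
    return completed.returncode
```

Running this as the same system user that runs Nuxeo (nuxeo in the listing above) is one way to check that the result does not depend on the user's environment.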

But no annotations are generated on the document in Nuxeo.

No errors appear in server.log either. This is everything the log records after an upload:

2012-12-10 00:34:16,310 INFO [it.tidalwave.image.op.ReadOp] readMetadata(java.io.FileInputStream@705d5338, 0)
2012-12-10 00:34:16,319 INFO [it.tidalwave.image.op.ReadOp] read(java.io.FileInputStream@44f4660b, 0)
2012-12-10 00:34:17,438 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Lock : Nuxeo-Work-default-4
2012-12-10 00:34:17,439 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Nuxeo-Work-default-4 >> enterCS: Thread R/W: 0/0 :: Model R/W: 0/0 (thread: Nuxeo-Work-default-4)
2012-12-10 00:34:17,439 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Nuxeo-Work-default-4 << enterCS: Thread R/W: 1/0 :: Model R/W: 1/0 (thread: Nuxeo-Work-default-4)
2012-12-10 00:34:17,440 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Nuxeo-Work-default-4 >> leaveCS: Thread R/W: 1/0 :: Model R/W: 1/0 (thread: Nuxeo-Work-default-4)
2012-12-10 00:34:17,441 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Nuxeo-Work-default-4 << leaveCS: Thread R/W: 0/0 :: Model R/W: 0/0 (thread: Nuxeo-Work-default-4)

Any suggestions for getting the OCR plugin to work under Nuxeo 5.6?

Ruben

glazzara_Lazzar
Champ in-the-making

Does the XML file produced by content_in_doc contain anything? Sometimes, if the document is of poor quality, content_in_doc may produce an empty file... 🙂
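
A quick way to check this is sketched below in Python. It assumes the recognized text is stored in the text attribute of Line elements, as in the output posted later in this thread; count_text_lines is a hypothetical helper name, and element names are matched without their namespace in case the real file declares one.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_text_lines(xml_path):
    """Count Line elements that carry a non-empty text attribute."""
    try:
        root = ET.parse(xml_path).getroot()
    except ET.ParseError:
        # An empty or truncated file cannot be parsed: treat it as no text.
        return 0
    return sum(
        1
        for element in root.iter()
        if local_name(element.tag) == "Line" and element.get("text", "").strip()
    )
```

A result of 0 for a file under the Nuxeo tmp directory would point at the OCR step; a positive count suggests the problem is further down, in how the plugin turns the XML into annotations.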

rbahntje_Bahntj
Confirmed Champ

Yes, for example, the image: link text

produces the following XML file in $NUXEO_HOME/server/tmp (I reproduce only some of the lines):

File: ocr_olena_1355615432288.xml

more ocr_olena_1355615432288.xml
<?xml version="1.0" encoding="UTF-8"?>
<PcGts>
  <Metadata>
    <Creator>LRDE</Creator>
    <Created>2012-12-15T20:51:01</Created>
    <LastChange>2012-12-15T20:51:01</LastChange>
    <Comments>Generated by Scribo from Olena.</Comments>
  </Metadata>
  <Page imageFilename="noname" imageWidth="1200" imageHeight="880">
    <TextRegion id="1" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="turquoise" kerning="4" color="#567BC8" colorReliability="0" baseline="51" meanline="34" xHeight="18" dHeight="-7" aHeight="27" charWidth="14">
      <Coords>
        <Point x="30" y="41"/>
        <Point x="30" y="32"/>
        <Point x="42" y="32"/>
        <Point x="42" y="33"/>
        <Point x="43" y="33"/>
        <Point x="43" y="34"/>
        <Point x="44" y="34"/>
        <Point x="44" y="36"/>
        <Point x="45" y="36"/>
        <Point x="45" y="37"/>
        <Point x="171" y="37"/>
        <Point x="171" y="36"/>
        <Point x="275" y="36"/>
        <Point x="275" y="31"/>
        <Point x="276" y="31"/>
        <Point x="276" y="30"/>
        <Point x="322" y="30"/>
        <Point x="322" y="29"/>
        <Point x="377" y="29"/>
        <Point x="377" y="34"/>
        <Point x="378" y="34"/>
        <Point x="378" y="35"/>
        <Point x="459" y="35"/>
        <Point x="459" y="34"/>
        <Point x="498" y="34"/>
        <Point x="498" y="30"/>
        <Point x="499" y="30"/>
        <Point x="499" y="28"/>
        <Point x="499" y="34"/>
        <Point x="568" y="34"/>
        <Point x="568" y="29"/>
        <Point x="569" y="29"/>
        <Point x="569" y="28"/>
        <Point x="569" y="33"/>
        <Point x="663" y="33"/>
        <Point x="663" y="32"/>
        <Point x="666" y="32"/>
        <Point x="666" y="30"/>
        <Point x="667" y="30"/>
        <Point x="667" y="27"/>
        <Point x="668" y="27"/>
        <Point x="668" y="26"/>
        <Point x="690" y="26"/>
        <Point x="690" y="27"/>
        <Point x="699" y="27"/>
        <Point x="699" y="32"/>
        <Point x="876" y="32"/>
        <Point x="876" y="26"/>
        <Point x="878" y="26"/>
        <Point x="878" y="25"/>
        <Point x="903" y="25"/>
        <Point x="903" y="31"/>
        <Point x="920" y="31"/>
        <Point x="920" y="32"/>
        <Point x="921" y="32"/>
        <Point x="921" y="33"/>
        <Point x="923" y="33"/>
        <Point x="923" y="35"/>
        <Point x="924" y="35"/>
        <Point x="924" y="38"/>
        <Point x="925" y="38"/>
        <Point x="925" y="40"/>
        <Point x="924" y="40"/>
        <Point x="924" y="44"/>
        <Point x="923" y="44"/>
        <Point x="923" y="45"/>
        <Point x="922" y="45"/>
        <Point x="922" y="46"/>
        <Point x="921" y="46"/>
        <Point x="921" y="47"/>
        <Point x="919" y="47"/>
        <Point x="919" y="48"/>
        <Point x="860" y="48"/>
        <Point x="860" y="53"/>
        <Point x="859" y="53"/>
        <Point x="859" y="49"/>
        <Point x="784" y="49"/>
        <Point x="784" y="50"/>
        <Point x="783" y="50"/>
        <Point x="783" y="53"/>
        <Point x="782" y="53"/>
        <Point x="782" y="54"/>
        <Point x="781" y="54"/>
        <Point x="781" y="49"/>
        <Point x="717" y="49"/>
        <Point x="717" y="51"/>
        <Point x="716" y="51"/>
        <Point x="716" y="54"/>
        <Point x="715" y="54"/>
        <Point x="715" y="55"/>
        <Point x="715" y="49"/>
        <Point x="651" y="49"/>
        <Point x="651" y="50"/>
        <Point x="486" y="50"/>
        <Point x="486" y="51"/>
        <Point x="401" y="51"/>
        <Point x="401" y="55"/>
        <Point x="401" y="52"/>
        <Point x="226" y="52"/>
        <Point x="226" y="57"/>
        <Point x="225" y="57"/>
        <Point x="225" y="58"/>
        <Point x="224" y="58"/>
        <Point x="224" y="53"/>
        <Point x="54" y="53"/>
        <Point x="54" y="54"/>
        <Point x="54" y="53"/>
        <Point x="30" y="53"/>
        <Point x="30" y="42"/>
      </Coords>
        <Line text="Pese a su compacto diseï¬o, es realmente versétil y muy completo" id="7" boldness="2.78846" boldnessReliability="22.8793" color="#567BC8" colorReliability="4.84687" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="turquoise" kerning="4" baseline="51" meanline="34" xHeight="18" dHeight="-7" aHeight="27" charWidth="14">
          <Coords>
            <Point x="30" y="25"/>
            <Point x="925" y="25"/>
            <Point x="925" y="58"/>
            <Point x="30" y="58"/>
          </Coords>
        </Line>
    </TextRegion>
    <TextRegion id="2" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="black" kerning="5" color="#352F20" colorReliability="0" baseline="127" meanline="85" xHeight="43" dHeight="-2" aHeight="60" charWidth="34">
      <Coords>
        <Point x="33" y="98"/>
        <Point x="33" y="73"/>
        <Point x="81" y="73"/>
        <Point x="81" y="85"/>
        <Point x="216" y="85"/>
        <Point x="216" y="75"/>
        <Point x="217" y="75"/>
        <Point x="217" y="74"/>
        <Point x="219" y="74"/>
        <Point x="219" y="73"/>
        <Point x="220" y="73"/>
        <Point x="220" y="72"/>
        <Point x="222" y="72"/>
        <Point x="222" y="71"/>
        <Point x="224" y="71"/>
        <Point x="224" y="70"/>
        <Point x="471" y="70"/>
        <Point x="471" y="69"/>
        <Point x="473" y="69"/>
        <Point x="473" y="68"/>
        <Point x="475" y="68"/>
        <Point x="475" y="69"/>
        <Point x="568" y="69"/>
        <Point x="568" y="70"/>
        <Point x="571" y="70"/>
        <Point x="571" y="71"/>
        <Point x="573" y="71"/>
        <Point x="573" y="72"/>
        <Point x="575" y="72"/>
        <Point x="575" y="73"/>
        <Point x="576" y="73"/>
        <Point x="576" y="74"/>
        <Point x="577" y="74"/>
        <Point x="577" y="75"/>
        <Point x="578" y="75"/>
        <Point x="578" y="77"/>
        <Point x="579" y="77"/>
        <Point x="579" y="80"/>
        <Point x="580" y="80"/>
        <Point x="580" y="115"/>
        <Point x="581" y="115"/>
        <Point x="581" y="125"/>
        <Point x="507" y="125"/>
        <Point x="507" y="126"/>
        <Point x="421" y="126"/>
        <Point x="421" y="127"/>
        <Point x="256" y="127"/>
        <Point x="256" y="128"/>
        <Point x="145" y="128"/>
        <Point x="145" y="129"/>
        <Point x="57" y="129"/>
        <Point x="57" y="128"/>
        <Point x="33" y="128"/>
        <Point x="33" y="99"/>
      </Coords>
        <Line text="Mountain Serie 2" id="25" boldness="8.28571" boldnessReliability="33.6896" color="#352F20" colorReliability="3.10364" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="black" kerning="5" baseline="127" meanline="85" xHeight="43" dHeight="-2" aHeight="60" charWidth="34">
          <Coords>
            <Point x="33" y="68"/>
            <Point x="581" y="68"/>
            <Point x="581" y="129"/>
            <Point x="33" y="129"/>
          </Coords>
        </Line>
    </TextRegion>
    <TextRegion id="3" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" color="#5D5559" colorReliability="13.0052" baseline="184" meanline="174" xHeight="11" dHeight="-4" aHeight="16" charWidth="8">
      <Coords>
        <Point x="29" y="513"/>
        <Point x="29" y="357"/>
        <Point x="28" y="357"/>
        <Point x="28" y="281"/>
        <Point x="29" y="281"/>
        <Point x="29" y="280"/>
        <Point x="30" y="280"/>
        <Point x="30" y="279"/>
        <Point x="34" y="279"/>
        <Point x="34" y="276"/>
        <Point x="35" y="276"/>
        <Point x="35" y="275"/>
        <Point x="102" y="275"/>
        <Point x="102" y="180"/>
        <Point x="103" y="180"/>
        <Point x="103" y="175"/>
        <Point x="136" y="175"/>
        <Point x="136" y="174"/>
        <Point x="194" y="174"/>
        <Point x="194" y="171"/>
        <Point x="302" y="171"/>
        <Point x="302" y="170"/>
        <Point x="415" y="170"/>
        <Point x="415" y="169"/>
        <Point x="415" y="173"/>
        <Point x="427" y="173"/>
        <Point x="427" y="174"/>
        <Point x="428" y="174"/>
        <Point x="428" y="202"/>
        <Point x="429" y="202"/>
        <Point x="429" y="233"/>
        <Point x="428" y="233"/>
        <Point x="428" y="253"/>
        <Point x="429" y="253"/>
        <Point x="429" y="331"/>
        <Point x="427" y="331"/>
        <Point x="427" y="354"/>
        <Point x="429" y="354"/>
        <Point x="429" y="409"/>
        <Point x="430" y="409"/>
        <Point x="430" y="512"/>
        <Point x="429" y="512"/>
        <Point x="429" y="518"/>
        <Point x="430" y="518"/>
        <Point x="430" y="565"/>
        <Point x="431" y="565"/>
        <Point x="430" y="565"/>
        <Point x="430" y="615"/>
        <Point x="431" y="615"/>
        <Point x="431" y="750"/>
        <Point x="430" y="750"/>
        <Point x="430" y="769"/>
        <Point x="431" y="769"/>
        <Point x="431" y="776"/>
        <Point x="430" y="776"/>
        <Point x="430" y="795"/>
        <Point x="431" y="795"/>
        <Point x="431" y="854"/>
        <Point x="427" y="854"/>
        <Point x="427" y="855"/>
        <Point x="397" y="855"/>
        <Point x="397" y="857"/>
        <Point x="396" y="857"/>
        <Point x="396" y="858"/>
        <Point x="245" y="858"/>
        <Point x="245" y="859"/>
        <Point x="244" y="859"/>
        <Point x="244" y="858"/>
        <Point x="183" y="858"/>
        <Point x="183" y="857"/>
        <Point x="32" y="857"/>
        <Point x="32" y="848"/>
        <Point x="31" y="848"/>
        <Point x="31" y="847"/>
        <Point x="30" y="847"/>
        <Point x="30" y="822"/>
        <Point x="31" y="822"/>
        <Point x="31" y="770"/>
        <Point x="30" y="770"/>
        <Point x="30" y="572"/>
        <Point x="31" y="572"/>
        <Point x="31" y="560"/>
        <Point x="30" y="560"/>
        <Point x="30" y="543"/>
        <Point x="29" y="543"/>
        <Point x="29" y="514"/>
      </Coords>
        <Line text="a propuesta de Mountain para esta" id="49" boldness="16.6923" boldnessReliability="118.987" color="#666265" colorReliability="23.2958" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="184" meanline="174" xHeight="11" dHeight="-4" aHeight="16" charWidth="8">
          <Coords>
            <Point x="102" y="169"/>
            <Point x="428" y="169"/>
            <Point x="428" y="188"/>
            <Point x="102" y="188"/>
          </Coords>
        </Line>
        <Line text="comparative es peculiar en tanto que" id="66" boldness="18" boldnessReliability="144.741" color="#6A6467" colorReliability="33.4482" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="210" meanline="200" xHeight="11" dHeight="-4" aHeight="16" charWidth="8">
          <Coords>
            <Point x="102" y="195"/>
            <Point x="429" y="195"/>
            <Point x="429" y="214"/>
            <Point x="102" y="214"/>
          </Coords>
        </Line>
        <Line text="ha elegido una caja Antec Minuet" id="73" boldness="8.85185" boldnessReliability="83.2387" color="#524749" colorReliability="33.7844" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="236" meanline="226" xHeight="11" dHeight="-4" aHeight="16" charWidth="9">
          <Coords>
            <Point x="103" y="221"/>
            <Point x="429" y="221"/>
            <Point x="429" y="240"/>
            <Point x="103" y="240"/>
          </Coords>
        </Line>
        <Line text="para alojar una equlllbrada seleccién" id="86" boldness="8.36364" boldnessReliability="85.951" color="#4A4143" colorReliability="31.384" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="261" meanline="251" xHeight="11" dHeight="-5" aHeight="16" charWidth="8">
          <Coords>
            <Point x="103" y="246"/>
            <Point x="429" y="246"/>
            <Point x="429" y="266"/>
            <Point x="103" y="266"/>
          </Coords>
        </Line>
        <Line text="de componentes. No destacan por ser Ios" id="96" boldness="10.4375" boldnessReliability="96.4558" color="#52494D" colorReliability="28.5853" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="288" meanline="277" xHeight="12" dHeight="-4" aHeight="16" charWidth="9">
          <Coords>
            <Point x="28" y="273"/>
            <Point x="429" y="273"/>
            <Point x="429" y="292"/>
            <Point x="28" y="292"/>
          </Coords>
        </Line>
        <Line text="ma's répidos de este informe. pero no quedan" id="112" boldness="18.4857" boldnessReliability="164.682" color="#696166" colorReliability="29.5019" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="314" meanline="303" xHeight="12" dHeight="-4" aHeight="16" charWidth="9">
          <Coords>
            <Point x="29" y="299"/>
            <Point x="429" y="299"/>
            <Point x="429" y="318"/>
            <Point x="29" y="318"/>
          </Coords>
        </Line>
        <Line text="mal en ninquna de las pruebas de rendimien-" id="119" boldness="20.6571" boldnessReliability="168.983" color="#655D62" colorReliability="27.1991" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="3" baseline="339" meanline="329" xHeight="11" dHeight="-5" aHeight="16" charWidth="8">
          <Coords>
            <Point x="29" y="324"/>
            <Point x="429" y="324"/>
            <Point x="429" y="344"/>
            <Point x="29" y="344"/>
          </Coords>
        </Line>
        <Line text="to del banco de benchmarks. La ausencia més" id="128" boldness="22.7143" boldnessReliability="184.03" color="#676064" colorReliability="27.9409" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="365" meanline="354" xHeight="12" dHeight="-1" aHeight="17" charWidth="9">
          <Coords>
            <Point x="28" y="349"/>
            <Point x="429" y="349"/>
            <Point x="429" y="366"/>
            <Point x="28" y="366"/>
          </Coords>
        </Line>
        <Line text="llamativa es la de ma's memorla RAM. que" id="145" boldness="11.9355" boldnessReliability="114.239" color="#564E51" colorReliability="33.6988" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="2" baseline="391" meanline="380" xHeight="12" dHeight="-2" aHeight="16" charWidth="9">
          <Coords>
            <Point x="29" y="376"/>
            <Point x="429" y="376"/>
            <Point x="429" y="393"/>
            <Point x="29" y="393"/>
          </Coords>
        </Line>
        <Line text="aunque es de buena factura, nunca esté de" id="154" boldness="15.1515" boldnessReliability="135.539" color="#554D50" colorReliability="33.7148" orientation="0" readingOrientation="0" readingDirection="left-to-right" type="text" reverseVideo="false" indented="false" textColour="indigo" kerning="3" baseline="417" meanline="406" xHeight="12" dHeight="-4" aHeight="17" charWidth="8">
          <Coords>
            <Point x="29" y="401"/>
            <Point x="430" y="401"/>
            <Point x="430" y="421"/>
            <Point x="29" y="421"/>
          </Coords>
        </Line>
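
The excerpt above stops before the closing tags, so it cannot be parsed exactly as posted. As a rough illustration of what the file contains, the Python sketch below reads the recognized text and the bounding box of each Line from a small, well-formed sample modeled on one line of the excerpt. The SAMPLE string and the extract_lines helper are assumptions made for this example; this is not how the Nuxeo plugin itself reads the file.

```python
import xml.etree.ElementTree as ET

# Small well-formed sample modeled on the excerpt above (most attributes omitted).
SAMPLE = """<PcGts>
  <Page imageFilename="noname" imageWidth="1200" imageHeight="880">
    <TextRegion id="2">
      <Line text="Mountain Serie 2" id="25">
        <Coords>
          <Point x="33" y="68"/>
          <Point x="581" y="68"/>
          <Point x="581" y="129"/>
          <Point x="33" y="129"/>
        </Coords>
      </Line>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(xml_text):
    """Return (text, (min_x, min_y, max_x, max_y)) for each Line element."""
    root = ET.fromstring(xml_text)
    results = []
    for element in root.iter():
        if local_name(element.tag) != "Line":
            continue
        # Collect the Point coordinates nested under this Line.
        points = [
            (int(p.get("x")), int(p.get("y")))
            for p in element.iter()
            if local_name(p.tag) == "Point"
        ]
        if not points:
            continue  # a Line without coordinates is skipped
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        results.append((element.get("text", ""), (min(xs), min(ys), max(xs), max(ys))))
    return results


if __name__ == "__main__":
    for text, box in extract_lines(SAMPLE):
        print(text, box)  # prints: Mountain Serie 2 (33, 68, 581, 129)
```

The text and box pairs are the information an annotation would need, which is why a non-empty file with no annotations points at the step after OCR.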


klevis_
Champ in-the-making

Hi @rbahntje, I was facing the same problem: $NUXEO_HOME/server/tmp/ocr_olena_xxx.xml was generated successfully, but there were no annotations. After a few days I solved this by editing the code from GitHub, and it now works fine in production. I have created a pull request at https://github.com/nuxeo/nuxeo-platform-ocr/pulls describing the problem and the solution.