<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Nuxeo-Platform-OCR Question in Nuxeo Forum</title>
    <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325240#M12241</link>
    <description>&lt;P&gt;Sorry, I was not very clear in my previous post.
Last year I was able to get this plugin working under Nuxeo 5.4.2 on an Oracle Linux installation.
Then I tried to use the plugin in Nuxeo 5.5 under Ubuntu and ran into the situation described above.&lt;/P&gt;
&lt;P&gt;With the release of Nuxeo 5.6 I decided to make a fresh installation under Ubuntu, and I had trouble getting the content_in_doc binary. With Guillaume's suggestion to use the new Olena package (thanks Guillaume!) I was able to get the file.&lt;/P&gt;
&lt;P&gt;But when I try to use the OCR plugin, I get the same situation as under 5.5:
the content_in_doc command works fine. I converted an image from the command line and it works.&lt;/P&gt;
&lt;P&gt;When I upload an image to Nuxeo, I can see a process like this running:&lt;/P&gt;
&lt;P&gt;nuxeo    17203 17198  0 00:23 pts/0    00:00:00 content_in_doc /var/lib/nuxeo/server/tmp/cmdLineBasedConverter2216478922130777180.JPG /var/lib/nuxeo/server/tmp/ocr_olena_1355109823244.xml&lt;/P&gt;
&lt;P&gt;And the file ocr_olena_xxxxxxx.xml is created under $NUXEO_HOME/tmp&lt;/P&gt;
&lt;P&gt;But no annotations are generated in the document in Nuxeo.&lt;/P&gt;
&lt;P&gt;And no errors are written to server.log.
This is everything the log records after I do an upload:
2012-12-10 00:34:16,310 INFO  [it.tidalwave.image.op.ReadOp] readMetadata(java.io.FileInputStream@705d5338, 0)
2012-12-10 00:34:16,319 INFO  [it.tidalwave.image.op.ReadOp] read(java.io.FileInputStream@44f4660b, 0)
2012-12-10 00:34:17,438 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Lock : Nuxeo-Work-default-4
2012-12-10 00:34:17,439 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Nuxeo-Work-default-4 &amp;gt;&amp;gt; enterCS: Thread R/W: 0/0 :: Model R/W: 0/0 (thread: Nuxeo-Work-default-4)
2012-12-10 00:34:17,439 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Nuxeo-Work-default-4 &amp;lt;&amp;lt; enterCS: Thread R/W: 1/0 :: Model R/W: 1/0 (thread: Nuxeo-Work-default-4)
2012-12-10 00:34:17,440 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Nuxeo-Work-default-4 &amp;gt;&amp;gt; leaveCS: Thread R/W: 1/0 :: Model R/W: 1/0 (thread: Nuxeo-Work-default-4)
2012-12-10 00:34:17,441 DEBUG [com.hp.hpl.jena.shared.LockMRSW] Nuxeo-Work-default-4 &amp;lt;&amp;lt; leaveCS: Thread R/W: 0/0 :: Model R/W: 0/0 (thread: Nuxeo-Work-default-4)&lt;/P&gt;
&lt;P&gt;Any suggestions for getting the OCR plugin to work under Nuxeo 5.6?&lt;/P&gt;
&lt;P&gt;Ruben&lt;/P&gt;</description>
    <pubDate>Mon, 10 Dec 2012 04:36:25 GMT</pubDate>
    <dc:creator>rbahntje_Bahntj</dc:creator>
    <dc:date>2012-12-10T04:36:25Z</dc:date>
    <item>
      <title>Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325215#M12216</link>
      <description>&lt;P&gt;Hi:&lt;/P&gt;
&lt;P&gt;I'm trying to install 'Nuxeo-platform-ocr' (https://github.com/nuxeo/nuxeo-platform-ocr), but I do not know where to find the 'content_in_doc' file that Nuxeo needs to run the analysis.&lt;/P&gt;
&lt;P&gt;I have followed the instructions at &lt;A href="https://github.com/nuxeo/nuxeo-platform-ocr" target="test_blank"&gt;https://github.com/nuxeo/nuxeo-platform-ocr&lt;/A&gt;, but it is not clear where the file is located.&lt;/P&gt;
&lt;P&gt;I'm using Ubuntu 10.11 + Tesseract + 3 + Nuxeo Olena (scribe)&lt;/P&gt;
&lt;P&gt;Could you tell me where I can find the 'content_in_doc' file?&lt;/P&gt;
&lt;P&gt;Thanks, and regards.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Dec 2011 17:24:09 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325215#M12216</guid>
      <dc:creator>Soni_</dc:creator>
      <dc:date>2011-12-12T17:24:09Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325216#M12217</link>
      <description>&lt;P&gt;I just tried to build against the latest stable version (2.0) of Olena and it seems to work fine. I have updated the &lt;A href="https://github.com/nuxeo/nuxeo-platform-ocr/blob/develop/README.md"&gt;README.md&lt;/A&gt; of &lt;CODE&gt;nuxeo-platform-ocr&lt;/CODE&gt; to point to the right source archive.&lt;/P&gt;
&lt;P&gt;Beware that the build of Olena has several steps and requires &lt;STRONG&gt;2 calls to make in 2 separate folders&lt;/STRONG&gt; (the build root and the &lt;CODE&gt;scribo/src&lt;/CODE&gt; subfolder):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;$ wget http://www.lrde.epita.fr/dload/olena/2.0/olena-2.0.tar.bz2
$ tar jxvf olena-*.tar.bz2
$ cd olena-2.0/
$ mkdir _build
$ cd _build
$ ../configure &amp;amp;&amp;amp; make
$ cd scribo/src
$ make
&lt;/CODE&gt;&lt;/PRE&gt;
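&lt;P&gt;As a quick sanity check after the two make calls, a small script can confirm that the binary was actually produced. This is only a sketch: the function name &lt;CODE&gt;find_content_in_doc&lt;/CODE&gt; is made up for illustration, and the &lt;CODE&gt;olena-2.0/_build&lt;/CODE&gt; path simply mirrors the commands above, so adjust it to your own checkout.&lt;/P&gt;
&lt;PRE&gt;
```python
import os
from pathlib import Path


def find_content_in_doc(build_root):
    """Return the path to an executable content_in_doc under build_root, or None.

    Hypothetical helper: it only looks in build_root/scribo/src, which is
    where the build steps above are expected to leave the binary.
    """
    candidate = Path(build_root) / "scribo" / "src" / "content_in_doc"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None


if __name__ == "__main__":
    # Assumed build directory, taken from the commands above.
    found = find_content_in_doc("olena-2.0/_build")
    if found is None:
        print("content_in_doc was not built; check the configure and make output")
    else:
        print("found", found)
```
&lt;/PRE&gt;
&lt;P&gt;If it reports that the binary is missing, the configure output is the first place to look.&lt;/P&gt;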
&lt;P&gt;The &lt;CODE&gt;scribo/src&lt;/CODE&gt; folder should then hold the &lt;CODE&gt;content_in_doc&lt;/CODE&gt; binary. If not, check for error messages in the build output. Maybe you are missing the development headers for tesseract? Have you built tesseract 3 from the source tarball and installed it system-wide using &lt;CODE&gt;sudo make install&lt;/CODE&gt;?&lt;/P&gt;</description>
      <pubDate>Wed, 28 Dec 2011 18:56:50 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325216#M12217</guid>
      <dc:creator>Olivier_Grisel</dc:creator>
      <dc:date>2011-12-28T18:56:50Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325217#M12218</link>
      <description>&lt;P&gt;I've compiled Olena 1.0 with Tesseract 3.0 with no problems.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Jan 2012 05:00:30 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325217#M12218</guid>
      <dc:creator>rbahntje_Bahntj</dc:creator>
      <dc:date>2012-01-03T05:00:30Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325218#M12219</link>
      <description>&lt;P&gt;As written in the &lt;CODE&gt;README.md&lt;/CODE&gt; file and as I already answered you have to run &lt;CODE&gt;make&lt;/CODE&gt; in the &lt;CODE&gt;$SOURCE_ROOT/_build/scribo/src&lt;/CODE&gt; folder as well and the &lt;CODE&gt;content_in_doc&lt;/CODE&gt; binary will be created there too.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Jan 2012 14:32:37 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325218#M12219</guid>
      <dc:creator>Olivier_Grisel</dc:creator>
      <dc:date>2012-01-03T14:32:37Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325219#M12220</link>
      <description>&lt;P&gt;I am running make inside $SOURCE_ROOT/_build/scribo/src folder&lt;/P&gt;</description>
      <pubDate>Wed, 04 Jan 2012 05:57:25 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325219#M12220</guid>
      <dc:creator>rbahntje_Bahntj</dc:creator>
      <dc:date>2012-01-04T05:57:25Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325220#M12221</link>
      <description>&lt;P&gt;I just tried from scratch in a new empty folder, starting from the original tarball: the &lt;CODE&gt;content_in_doc&lt;/CODE&gt; related lines in the Makefile are not commented out and the binary is built successfully. I suspect that in your case the &lt;CODE&gt;configure&lt;/CODE&gt; script failed to find some dependency that is missing on your system.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Jan 2012 10:23:16 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325220#M12221</guid>
      <dc:creator>Olivier_Grisel</dc:creator>
      <dc:date>2012-01-05T10:23:16Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325221#M12222</link>
      <description>&lt;P&gt;Right now I'm trying to compile Olena/content_in_doc on Debian Squeeze. I had to install the following packages to get content_in_doc enabled in the Makefiles&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jan 2012 16:49:28 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325221#M12222</guid>
      <dc:creator>OlivierM_</dc:creator>
      <dc:date>2012-01-06T16:49:28Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325222#M12223</link>
      <description>&lt;P&gt;In my case I built tesseract 3 from the source tarball (it is not yet available in Ubuntu; I don't know about Debian). In practice, tesseract 3 gives much better results than tesseract 2.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2012 12:11:09 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325222#M12223</guid>
      <dc:creator>Olivier_Grisel</dc:creator>
      <dc:date>2012-01-09T12:11:09Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325223#M12224</link>
      <description>&lt;P&gt;Here I did it using Squeeze's own Tesseract.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Jan 2012 10:01:31 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325223#M12224</guid>
      <dc:creator>OlivierM_</dc:creator>
      <dc:date>2012-01-10T10:01:31Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325224#M12225</link>
      <description>&lt;P&gt;Yet another try. Did it by using (hand-compiled) libleptonica and libtesseract (3). Apparently, Olena 2 only detects the latter when it's compiled "--with-multiple-libraries" (so that it has libtesseract_api.so and so on, and not just libtesseract.so).&lt;/P&gt;</description>
      <pubDate>Thu, 09 Feb 2012 16:34:09 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325224#M12225</guid>
      <dc:creator>OlivierM_</dc:creator>
      <dc:date>2012-02-09T16:34:09Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325225#M12226</link>
      <description>&lt;P&gt;Ok, finally managed to get every piece together (using Olena's git repository instead of release package, and still patching here and there).&lt;/P&gt;
&lt;P&gt;The first time I imported an image, I got an error about Tesseract being unable to find its language data. Fair enough (by the way: how do we tell Nuxeo which language to use for OCR?). Then I added the language data, and now I no longer get any OCR-related output at all; it is perfectly silent. But no annotations are created.&lt;/P&gt;
&lt;P&gt;The only thing that could be related is :&lt;/P&gt;
&lt;P&gt;2012-02-09 17:02:36,993 WARN  [it.tidalwave.image.java2d.ImplementationFactoryJ2D] JAI not available: java.lang.ClassNotFoundException: javax.media.jai.PlanarImage&lt;/P&gt;
&lt;P&gt;Any idea?&lt;/P&gt;</description>
      <pubDate>Thu, 09 Feb 2012 18:08:55 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325225#M12226</guid>
      <dc:creator>OlivierM_</dc:creator>
      <dc:date>2012-02-09T18:08:55Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325226#M12227</link>
      <description>&lt;P&gt;Finally I did a fresh install from scratch on Oracle ELinux 5U7 and I was able to get the content_in_doc binary (I was missing the GDCM2 library), but now I am having the same issue as OlivierM: when I upload an image, server.log shows this message&lt;/P&gt;</description>
      <pubDate>Fri, 24 Feb 2012 13:32:38 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325226#M12227</guid>
      <dc:creator>rbahntje_Bahntj</dc:creator>
      <dc:date>2012-02-24T13:32:38Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325227#M12228</link>
      <description>&lt;P&gt;I've installed the JAI package ( &lt;A href="http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-java-client-419417.html#7341-JAI-1.1.2-oth-JPR" target="test_blank"&gt;http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-java-client-419417.html#7341-JAI-1.1.2-oth-JPR&lt;/A&gt; ) and copied jai_codec.jar, jai_core.jar and mlibwrapper_jai.jar into my $NUXEO_HOME/nxserver/lib&lt;/P&gt;
&lt;P&gt;Now I do not get any error messages anymore, but nothing happens when I upload an image file to Nuxeo.&lt;/P&gt;
&lt;P&gt;How can I debug what is happening?&lt;/P&gt;</description>
      <pubDate>Fri, 24 Feb 2012 15:03:57 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325227#M12228</guid>
      <dc:creator>rbahntje_Bahntj</dc:creator>
      <dc:date>2012-02-24T15:03:57Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325228#M12229</link>
      <description>&lt;P&gt;Same here. The JAI warnings disappeared (thanks for the hint!), but nothing is happening.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Feb 2012 09:57:44 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325228#M12229</guid>
      <dc:creator>OlivierM_</dc:creator>
      <dc:date>2012-02-27T09:57:44Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325229#M12230</link>
      <description>&lt;P&gt;Oliver, did you find a solution?&lt;/P&gt;</description>
      <pubDate>Thu, 22 Mar 2012 18:36:23 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325229#M12230</guid>
      <dc:creator>rbahntje_Bahntj</dc:creator>
      <dc:date>2012-03-22T18:36:23Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325230#M12231</link>
      <description>&lt;P&gt;Sadly no, I'm still stuck on this, and without time to investigate it further for now.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Mar 2012 12:00:01 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325230#M12231</guid>
      <dc:creator>OlivierM_</dc:creator>
      <dc:date>2012-03-23T12:00:01Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325231#M12232</link>
      <description>&lt;P&gt;Oliver&lt;/P&gt;
&lt;P&gt;The content_in_doc command works fine. I converted an image from the command line and it works.&lt;/P&gt;
&lt;P&gt;When I upload an image to Nuxeo, I can see a process like this running:&lt;/P&gt;
&lt;P&gt;root     25994 25991 97 19:25 pts/0    00:00:15 content_in_doc /opt/nuxeo-cap-5.5-tomcat/tmp/cmdLineBasedConverter22108.jpg /opt/nuxeo-cap-5.5-tomcat/tmp/ocr_olena_1333236340089.xml&lt;/P&gt;
&lt;P&gt;And the file &lt;CODE&gt;ocr_olena_xxxxxxx.xml&lt;/CODE&gt; is created under $NUXEO_HOME/tmp&lt;/P&gt;
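&lt;P&gt;To check whether those generated files actually contain anything, a rough sketch like the following could list them and count their XML elements. It makes no assumption about the schema Olena writes; the function name &lt;CODE&gt;summarize_ocr_outputs&lt;/CODE&gt; is hypothetical, and the tmp path is just the one from the process listing above.&lt;/P&gt;
&lt;PRE&gt;
```python
import xml.etree.ElementTree as ET
from pathlib import Path


def summarize_ocr_outputs(tmp_dir):
    """Map each ocr_olena_*.xml file in tmp_dir to its element count.

    A value of None means the file is empty or not well-formed XML.
    Hypothetical helper; nothing here depends on the Olena output schema.
    """
    summary = {}
    for path in sorted(Path(tmp_dir).glob("ocr_olena_*.xml")):
        try:
            root = ET.parse(path).getroot()
            # Count the root element plus all of its descendants.
            summary[path.name] = sum(1 for _ in root.iter())
        except ET.ParseError:
            summary[path.name] = None
    return summary


if __name__ == "__main__":
    # Path taken from the process listing above; substitute your own tmp folder.
    results = summarize_ocr_outputs("/opt/nuxeo-cap-5.5-tomcat/tmp")
    for name, count in results.items():
        print(name, "unparsable" if count is None else f"{count} elements")
```
&lt;/PRE&gt;
&lt;P&gt;An empty or unparsable file would point at the content_in_doc step; well-formed files with content would suggest the problem lies later, when Nuxeo turns the output into annotations.&lt;/P&gt;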
&lt;P&gt;But no annotations are generated in the document in Nuxeo.
I will try to recompile everything again.&lt;/P&gt;</description>
      <pubDate>Sun, 01 Apr 2012 01:30:17 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325231#M12232</guid>
      <dc:creator>rbahntje_Bahntj</dc:creator>
      <dc:date>2012-04-01T01:30:17Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325232#M12233</link>
      <description>&lt;P&gt;Thanks to you, I just discovered ocr_olena_XX.xml files are also created in my tmp directory. Good to know.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Apr 2012 10:37:25 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325232#M12233</guid>
      <dc:creator>OlivierM_</dc:creator>
      <dc:date>2012-04-02T10:37:25Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325233#M12234</link>
      <description>&lt;P&gt;Ok, just a little thing&lt;/P&gt;</description>
      <pubDate>Mon, 02 Apr 2012 11:28:00 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325233#M12234</guid>
      <dc:creator>OlivierM_</dc:creator>
      <dc:date>2012-04-02T11:28:00Z</dc:date>
    </item>
    <item>
      <title>Re: Nuxeo-Platform-OCR Question</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325234#M12235</link>
      <description>&lt;P&gt;I tried to modify the UserPrincipal to an existing user, and the baseURL to my server's, but it doesn't work any better.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Apr 2012 17:23:58 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-platform-ocr-question/m-p/325234#M12235</guid>
      <dc:creator>OlivierM_</dc:creator>
      <dc:date>2012-04-02T17:23:58Z</dc:date>
    </item>
  </channel>
</rss>

