
Nuxeo-Platform-OCR Question

Soni_
Champ on-the-rise

Hi,

I'm trying to install 'Nuxeo-platform-ocr' (https://github.com/nuxeo/nuxeo-platform-ocr), but I do not know where to put the 'content_in_doc' file so that Nuxeo can use it for analysis.

I have followed the instructions at https://github.com/nuxeo/nuxeo-platform-ocr, but they do not make clear where it should go.

I'm using Ubuntu 10.11 + Tesseract 3 + Nuxeo + Olena (Scribo).

Could you tell me where I should put the 'content_in_doc' file?
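The thread never states the required location outright. The process listing further down shows Nuxeo launching `content_in_doc` by its bare name, so the binary presumably has to be resolvable on the `PATH` of the Nuxeo server process (an inference, not documented behavior). A minimal Python sketch to check that, using the standard `shutil.which`:

```python
import shutil
from typing import Optional


def find_command(name: str = "content_in_doc",
                 search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path that `name` resolves to, or None if not found.

    search_path defaults to the current process PATH. Assumption: pass the
    PATH of the Nuxeo server process if it differs from your shell's (for
    example when Nuxeo is started by an init script).
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    found = find_command()
    if found:
        print(f"content_in_doc resolves to {found}")
    else:
        print("content_in_doc is not on PATH; install it into a PATH directory "
              "(e.g. /usr/local/bin) or extend the PATH of the Nuxeo process")
```

If the command resolves in your shell but Nuxeo still cannot run it, comparing the two `PATH` values is the next thing to check.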

Thanks, and regards.

30 replies

OlivierM_
Star Contributor

OK, I finally managed to get every piece together (using Olena's git repository instead of the release package, and still patching here and there).

The first time I imported an image, I got an error about Tesseract being unable to find its language data. Fair enough (by the way: how do we tell Nuxeo which language to use for OCR?). Then I added the language data, and now I get no OCR-related output at all; it is completely silent. But no annotations are created.
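The language-data error above is worth being able to check directly. As an assumption about Tesseract 3.x (not stated in the thread): it looks for `<lang>.traineddata` inside a `tessdata/` directory, and `TESSDATA_PREFIX` names the directory that *contains* `tessdata/`. The fallback prefixes below are common Linux install locations, not guaranteed ones. On the command line Tesseract picks the language with `-l`; how nuxeo-platform-ocr chooses it is not answered here.

```python
import os
from typing import List, Optional


def tessdata_candidates(lang: str = "eng", prefix: Optional[str] = None) -> List[str]:
    """List candidate paths for <lang>.traineddata, most specific first.

    Assumption: Tesseract 3.x semantics, where TESSDATA_PREFIX is the parent
    of tessdata/. The hard-coded prefixes are typical, not authoritative.
    """
    prefixes = [prefix] if prefix else []
    env_prefix = os.environ.get("TESSDATA_PREFIX")
    if env_prefix:
        prefixes.append(env_prefix)
    prefixes += ["/usr/local/share", "/usr/share/tesseract-ocr", "/usr/share"]
    return [os.path.join(p, "tessdata", f"{lang}.traineddata") for p in prefixes]


def find_traineddata(lang: str = "eng", prefix: Optional[str] = None) -> Optional[str]:
    """Return the first candidate that exists, or None if the data is missing."""
    for path in tessdata_candidates(lang, prefix):
        if os.path.isfile(path):
            return path
    return None
```

For example, `find_traineddata("eng")` returning `None` would point to the same missing-data situation described in the post.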

The only thing that could be related is:

2012-02-09 17:02:36,993 WARN [it.tidalwave.image.java2d.ImplementationFactoryJ2D] JAI not available: java.lang.ClassNotFoundException: javax.media.jai.PlanarImage

Any idea?

Finally, I made a fresh install from scratch on Oracle Enterprise Linux 5U7, and I was able to get the content_in_doc binary (I was missing the GDCM2 library). But now I have the same issue as OlivierM: when I upload an image, server.log shows this message:
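A missing shared library such as the GDCM one above can be spotted before running anything: `ldd` lists a binary's shared-library dependencies and marks unresolved ones as `=> not found`. A small sketch that runs `ldd` and extracts those names (the parsing assumes the usual GNU `ldd` line format):

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "\tlibfoo.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on a binary and list its unresolved shared libraries."""
    result = subprocess.run(["ldd", path], capture_output=True,
                            text=True, check=False)
    return missing_libraries(result.stdout)
```

Usage would be something like `check_binary("/usr/local/bin/content_in_doc")`, where the install path is only an example; an empty list means every dependency resolved.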

rbahntje_Bahntj
Confirmed Champ

I've installed the JAI package ( http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-java-client-4... ) and copied jai_codec.jar, jai_core.jar and mlibwrapper_jai.jar into my $NUXEO_HOME/nxserver/lib.
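To confirm the three jars are where the post puts them, a quick check. Reading `NUXEO_HOME` from the environment, and the fallback path (taken from the process listing later in the thread), are assumptions for illustration:

```python
import os
from typing import List

# The three jars named in the post above.
JAI_JARS = ("jai_codec.jar", "jai_core.jar", "mlibwrapper_jai.jar")


def missing_jai_jars(nuxeo_home: str) -> List[str]:
    """Return the JAI jars that are absent from <nuxeo_home>/nxserver/lib."""
    lib_dir = os.path.join(nuxeo_home, "nxserver", "lib")
    return [jar for jar in JAI_JARS
            if not os.path.isfile(os.path.join(lib_dir, jar))]


if __name__ == "__main__":
    # Hypothetical default: adjust to your own installation directory.
    home = os.environ.get("NUXEO_HOME", "/opt/nuxeo-cap-5.5-tomcat")
    missing = missing_jai_jars(home)
    print("all JAI jars present" if not missing else f"missing: {missing}")
```

The server needs a restart after copying the jars for them to be picked up.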

Now I no longer get any error messages, but nothing happens when I upload an image file to Nuxeo.

How can I debug what is happening?

Same here. The JAI warnings disappeared (thanks for the hint!), but nothing is happening.

Olivier, did you find a solution?

Sadly no, I'm still stuck on this, and I don't have time to investigate it further for now.

rbahntje_Bahntj
Confirmed Champ

Olivier,

The content_in_doc command is working fine. I tried converting an image from the command line and it works.

When I upload an image to Nuxeo, I can see a process like this running:

root 25994 25991 97 19:25 pts/0 00:00:15 content_in_doc /opt/nuxeo-cap-5.5-tomcat/tmp/cmdLineBasedConverter22108.jpg /opt/nuxeo-cap-5.5-tomcat/tmp/ocr_olena_1333236340089.xml

And the file ocr_olena_xxxxxxx.xml is created under $NUXEO_HOME/tmp.

But... no annotations are generated on the document in Nuxeo. I will try recompiling everything again.
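One way to separate "content_in_doc fails" from "Nuxeo ignores its output" is to rerun the exact invocation from the process listing by hand. The argument order below (input image, then output XML) is read off that listing; the helper names are hypothetical. The listing also shows the process running as root, so running the manual test as the same user rules out permission differences.

```python
import subprocess
from typing import List


def content_in_doc_command(image_path: str, xml_path: str) -> List[str]:
    """Build the argv in the order seen in the process listing."""
    return ["content_in_doc", image_path, xml_path]


def run_content_in_doc(image_path: str, xml_path: str,
                       timeout: float = 300.0) -> int:
    """Run content_in_doc, print any stderr for diagnosis, return the exit code."""
    result = subprocess.run(content_in_doc_command(image_path, xml_path),
                            capture_output=True, text=True,
                            timeout=timeout, check=False)
    if result.stderr:
        print(result.stderr)
    return result.returncode
```

A zero exit code with a non-empty XML file would suggest the problem lies in how the output is turned into annotations rather than in the OCR step itself.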

Thanks to you, I just discovered that ocr_olena_XX.xml files are also created in my tmp directory. Good to know.
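Since those ocr_olena_*.xml files do get written, checking whether they actually contain recognized text narrows things down. The Scribo output schema is not described in this thread, so this sketch only counts elements and text characters; if the schema stores text in attributes, the character count will read zero even for a good result.

```python
import glob
import os
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def latest_ocr_output(tmp_dir: str) -> Optional[str]:
    """Return the most recently modified ocr_olena_*.xml in tmp_dir, or None."""
    files = glob.glob(os.path.join(tmp_dir, "ocr_olena_*.xml"))
    return max(files, key=os.path.getmtime) if files else None


def summarize_tree(root: ET.Element) -> Tuple[int, int]:
    """Return (number of elements, characters of non-whitespace-trimmed text)."""
    elements = sum(1 for _ in root.iter())
    text_chars = sum(len(t.strip()) for t in root.itertext())
    return elements, text_chars


def summarize_file(xml_path: str) -> Tuple[int, int]:
    """Parse an OCR output file and summarize it with summarize_tree."""
    return summarize_tree(ET.parse(xml_path).getroot())
```

For example, `summarize_file(latest_ocr_output("/opt/nuxeo-cap-5.5-tomcat/tmp"))` after an upload; the tmp path comes from the process listing and will differ per installation.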

OK, just one small thing: I tried changing the UserPrincipal to an existing user and the baseURL to my server's, but it doesn't work any better.