12-12-2011 12:24 PM
Hi:
I'm trying to install 'Nuxeo-platform-ocr' (https://github.com/nuxeo/nuxeo-platform-ocr), but I do not know where to put the 'content_in_doc' file so that Nuxeo can use it for analysis.
I have followed the instructions at https://github.com/nuxeo/nuxeo-platform-ocr, but it is not clear where the file should go.
I'm using Ubuntu 10.11 + Tesseract 3 + Nuxeo + Olena (Scribo).
Could you tell me where I should put the 'content_in_doc' file?
Thanks, and regards.
02-09-2012 01:08 PM
Ok, finally managed to get every piece together (using Olena's git repository instead of release package, and still patching here and there).
The first time I imported an image, I got an error about Tesseract being unable to find its language data. Fair enough (by the way, how do we tell Nuxeo which language to use for OCR?). Then I added the language data, and now I get no OCR-related messages at all; everything is perfectly silent. But no annotations are created.
The only thing that could be related is :
2012-02-09 17:02:36,993 WARN [it.tidalwave.image.java2d.ImplementationFactoryJ2D] JAI not available: java.lang.ClassNotFoundException: javax.media.jai.PlanarImage
Any idea?
02-24-2012 08:32 AM
I finally made a fresh install from scratch on Oracle ELinux 5U7, and I can now get the content_in_doc binary (I was missing the GDCM2 library). But now I am having the same issue as OlivierM: when I upload an image, server.log shows this message
02-24-2012 10:03 AM
I've installed the JAI package ( http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-java-client-4... ) and copied jai_codec.jar, jai_core.jar and mlibwrapper_jai.jar into my $NUXEO_HOME/nxserver/lib
Now I do not get any error messages anymore, but nothing happens when I upload an image file to Nuxeo.
How can I debug what is happening?
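One way to start debugging is to filter server.log for warnings and errors that mention the OCR components. The following is only a minimal sketch: it assumes log lines look like the JAI warning quoted earlier in this thread (timestamp, level, logger in brackets, message), and the keyword list and the function name find_ocr_log_lines are my own choices for illustration, not anything provided by Nuxeo.

```python
import re

# Keywords taken from this thread (an assumption; extend as needed).
KEYWORDS = ("ocr", "olena", "tesseract", "jai", "content_in_doc")

# Matches lines shaped like:
# 2012-02-09 17:02:36,993 WARN [some.logger.Name] message text
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+(?P<msg>.*)$"
)


def find_ocr_log_lines(lines, levels=("WARN", "ERROR"), keywords=KEYWORDS):
    """Return (timestamp, level, logger, message) tuples for matching lines."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # e.g. stack-trace continuation lines
        if match.group("level") not in levels:
            continue
        # Search both the logger name and the message, case-insensitively.
        haystack = (match.group("logger") + " " + match.group("msg")).lower()
        if any(keyword in haystack for keyword in keywords):
            hits.append((match.group("ts"), match.group("level"),
                         match.group("logger"), match.group("msg")))
    return hits
```

To use it, open your server.log (its location depends on your install) and pass the file object to find_ocr_log_lines. If it returns nothing even at WARN level, the log itself is probably not going to explain the missing annotations.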
02-27-2012 04:57 AM
Same here. The JAI warnings disappeared (thanks for the hint!), but nothing is happening.
03-22-2012 02:36 PM
Oliver, did you find a solution?
03-23-2012 08:00 AM
Sadly no, I'm still stuck on this, and without time to investigate it further for now.
03-31-2012 09:30 PM
Oliver
The content_in_doc command is working fine. I tried converting an image from the command line and it works.
When I upload an image to Nuxeo, I can see a process like this running:
root 25994 25991 97 19:25 pts/0 00:00:15 content_in_doc /opt/nuxeo-cap-5.5-tomcat/tmp/cmdLineBasedConverter22108.jpg /opt/nuxeo-cap-5.5-tomcat/tmp/ocr_olena_1333236340089.xml
And the file ocr_olena_xxxxxxx.xml is created under $NUXEO_HOME/tmp.
But no annotations are generated on the document in Nuxeo. I will try to recompile everything again.
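Since the ocr_olena_*.xml files do get written, it may help to confirm they are non-empty, well-formed XML before suspecting the annotation step. Here is a rough sketch of such a check. The function name inspect_ocr_outputs is made up for illustration, and it assumes nothing about the XML schema: it only reports the file size, the root tag and the number of elements.

```python
import glob
import os
import xml.etree.ElementTree as ET


def inspect_ocr_outputs(tmp_dir):
    """Summarize every ocr_olena_*.xml file found in tmp_dir."""
    reports = []
    pattern = os.path.join(tmp_dir, "ocr_olena_*.xml")
    for path in sorted(glob.glob(pattern)):
        report = {
            "path": path,
            "size": os.path.getsize(path),
            "well_formed": False,
            "root_tag": None,
            "elements": 0,
        }
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            # Empty or truncated output: record it and move on.
            reports.append(report)
            continue
        report["well_formed"] = True
        report["root_tag"] = root.tag
        # Count the root plus all of its descendants.
        report["elements"] = sum(1 for _ in root.iter())
        reports.append(report)
    return reports
```

Calling inspect_ocr_outputs on your $NUXEO_HOME/tmp directory gives one report per file. A well-formed file with many elements would suggest that content_in_doc did its job and that the problem lies in how the result is turned into annotations. An empty or malformed file would point back at the OCR step.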
04-02-2012 06:37 AM
Thanks to you, I just discovered ocr_olena_XX.xml files are also created in my tmp directory. Good to know.
04-02-2012 07:28 AM
Ok, just a little thing
04-02-2012 01:23 PM
I tried to modify the UserPrincipal to an existing user, and the baseURL to my server's, but it doesn't work any better.