12-12-2011 12:24 PM
Hi:
I'm trying to install 'Nuxeo-platform-ocr' (https://github.com/nuxeo/nuxeo-platform-ocr), but I don't know where to find the 'content_in_doc' file so that Nuxeo can use it for analysis.
I have followed this manual https://github.com/nuxeo/nuxeo-platform-ocr, but it is not clear where to find it.
I'm using Ubuntu 10.11 + Tesseract 3 + Nuxeo + Olena (scribo).
Could you tell me where I can find the 'content_in_doc' file?
Thanks, and regards.
12-28-2011 01:56 PM
I just tried to build against the latest stable version (2.0) of Olena and it seems to work fine. I have updated the README.md of nuxeo-platform-ocr to point to the right source archive.
Beware that the Olena build has several steps, with 2 calls to make in 2 separate folders (the build root and the scribo/src subfolder):
$ wget http://www.lrde.epita.fr/dload/olena/2.0/olena-2.0.tar.bz2
$ tar jxvf olena-*.tar.bz2
$ cd olena-2.0/
$ mkdir _build
$ cd _build
$ ../configure && make
$ cd scribo/src
$ make
The scribo/src subfolder should then hold the content_in_doc binary. If not, check the build output for error messages. Maybe you are missing the development headers for tesseract? Have you built tesseract 3 from the source tarball and installed it system-wide using sudo make install?
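A minimal shell sketch of the two checks suggested above, assuming the out-of-tree layout from the build steps (olena-2.0/_build) and a tesseract 3 installed under a prefix such as /usr/local via sudo make install. The function names and the header path include/tesseract/baseapi.h are my own assumptions, not something taken from the README.md:

```shell
# Hypothetical helper: succeeds if BUILD_ROOT/scribo/src/content_in_doc
# exists and is executable (BUILD_ROOT is e.g. olena-2.0/_build).
check_content_in_doc() {
    build_root=$1
    bin="$build_root/scribo/src/content_in_doc"
    if [ -x "$bin" ]; then
        echo "found: $bin"
        return 0
    fi
    echo "missing: $bin (re-run make in $build_root/scribo/src and read its output)" >&2
    return 1
}

# Hypothetical helper: succeeds if the tesseract 3 development header
# baseapi.h is present under PREFIX/include/tesseract.
# Assumption: that is where "sudo make install" put the headers.
check_tesseract_headers() {
    prefix=${1:-/usr/local}
    header="$prefix/include/tesseract/baseapi.h"
    if [ -f "$header" ]; then
        echo "found: $header"
        return 0
    fi
    echo "missing: $header (tesseract development headers not installed?)" >&2
    return 1
}

# Example usage:
#   check_content_in_doc olena-2.0/_build
#   check_tesseract_headers /usr/local
```

If the header check fails, the configure run in the build root most likely did not find tesseract either, which would explain a missing binary.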
01-03-2012 12:00 AM
I've compiled Olena 1.0 with Tesseract 3.0 with no problem.
01-03-2012 09:32 AM
As written in the README.md file, and as I already answered, you have to run make in the $SOURCE_ROOT/_build/scribo/src folder as well, and the content_in_doc binary will be created there too.
01-04-2012 12:57 AM
I am running make inside the $SOURCE_ROOT/_build/scribo/src folder.
01-05-2012 05:23 AM
I just tried from scratch in a new empty folder, starting from the original tarball: the content_in_doc related lines in the Makefile are not commented out and the binary is built successfully. I suspect that in your case the configure script did not detect some missing dependency.
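To see quickly whether configure disabled content_in_doc, here is a small shell sketch that counts active versus commented-out lines mentioning content_in_doc in the generated Makefile. It assumes, as the thread suggests, that a disabled target shows up as lines starting with '#' (which is how Automake conditionals that configure evaluated as false end up in the generated Makefile). The function name is hypothetical:

```shell
# Hypothetical helper: prints "active=N commented=M" for the lines of a
# generated Makefile that mention content_in_doc, and succeeds only if at
# least one of those lines is active (not commented out).
content_in_doc_status() {
    makefile=$1
    if [ ! -f "$makefile" ]; then
        echo "no such file: $makefile" >&2
        return 2
    fi
    # A line counts as commented out if it starts with optional blanks then '#'.
    # "|| true" keeps a zero count (grep exit status 1) from aborting under set -e.
    commented=$(grep 'content_in_doc' "$makefile" | grep -c '^[[:space:]]*#' || true)
    active=$(grep 'content_in_doc' "$makefile" | grep -vc '^[[:space:]]*#' || true)
    echo "active=$active commented=$commented"
    [ "$active" -gt 0 ]
}

# Example usage (path assumes the _build layout used in this thread):
#   content_in_doc_status olena-2.0/_build/scribo/src/Makefile
```

An "active=0" result means the configure step decided content_in_doc could not be built; its output (or config.log in the build root) should then name the missing dependency.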
01-06-2012 11:49 AM
Right now I'm trying to compile Olena/content_in_doc on Debian Squeeze. I had to install the following packages to get content_in_doc enabled in the Makefiles
01-09-2012 07:11 AM
In my case I built tesseract 3 from the source tarball (as it is not yet available in Ubuntu; I don't know about Debian). In practice, tesseract 3 gives much better results than tesseract 2.
01-10-2012 05:01 AM
Here I did it using Squeeze's own Tesseract.
02-09-2012 11:34 AM
Yet another try. I did it using hand-compiled libleptonica and libtesseract (3). Apparently, Olena 2 only detects the latter when it is compiled with "--with-multiple-libraries" (so that it provides libtesseract_api.so and so on, not just libtesseract.so).
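A quick way to tell which tesseract layout is installed before running Olena's configure is sketched below in shell. It only looks for the file names mentioned above in one library directory; the default /usr/local/lib and the function name are assumptions:

```shell
# Hypothetical helper: prints "multiple" if LIBDIR has the split tesseract
# libraries (libtesseract_api.so and so on), "single" if only libtesseract.so
# is there, and "none" otherwise. Succeeds only for the "multiple" layout.
tesseract_lib_layout() {
    libdir=${1:-/usr/local/lib}
    if [ -e "$libdir/libtesseract_api.so" ]; then
        echo "multiple"
        return 0
    elif [ -e "$libdir/libtesseract.so" ]; then
        echo "single"
        return 1
    fi
    echo "none"
    return 1
}

# Example usage:
#   tesseract_lib_layout /usr/local/lib
```

A "single" result would match the case where Olena 2 does not pick up libtesseract.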