09-15-2011 06:37 AM
I have installed the nuxeo-platform-ocr plugin (https://github.com/nuxeo/nuxeo-platform-ocr#readme) and it is working very nicely, but I am not able to run OCR on image-based PDFs.
Is there any plugin to do this?
Regards
Ruben Bahntje Ushuaia - Argentina
09-15-2011 02:37 PM
Great to learn that you could install this add-on successfully despite the list of non-trivial dependencies you have to build from source 🙂
To make it work on PDF files, you would first need to extract the image files (e.g. JPEG files) embedded in them. If you are a Java developer, this should be doable with PDFBox (http://pdfbox.apache.org/); for example, you can use a class from the PDFBox source tree as a starting point.
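PDFBox is the robust route suggested above. As a dependency-free illustration of the same first step (pulling the JPEG data out of a PDF so it can be OCRed like any other image), here is a crude sketch that does *not* use PDFBox. It simply scans the raw file bytes for the JPEG start marker (FF D8 FF) and end marker (FF D9). The class name `PdfJpegScanner` and the output file names are hypothetical. It assumes the scanned pages are stored as plain `/DCTDecode` (JPEG) streams. Images that are additionally Flate-compressed, or stored in other formats such as CCITT fax or JBIG2, will not be found, and a JPEG carrying an embedded EXIF thumbnail would be cut short at the thumbnail's end marker.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, crude sketch (NOT PDFBox): finds JPEG images embedded in a PDF
 * by scanning raw bytes for JPEG Start Of Image (FF D8 FF) and End Of Image (FF D9).
 * Assumes images are plain /DCTDecode streams with no extra compression filter.
 */
public class PdfJpegScanner {

    /** Returns every byte range that looks like a complete JPEG image. */
    public static List<byte[]> extractJpegs(byte[] pdf) {
        List<byte[]> images = new ArrayList<>();
        int i = 0;
        while (i + 2 < pdf.length) {
            // Start Of Image marker followed by the first byte of the next marker
            if ((pdf[i] & 0xFF) == 0xFF && (pdf[i + 1] & 0xFF) == 0xD8
                    && (pdf[i + 2] & 0xFF) == 0xFF) {
                int end = findEndOfImage(pdf, i + 2);
                if (end < 0) {
                    break; // truncated image: no End Of Image marker left in the file
                }
                images.add(Arrays.copyOfRange(pdf, i, end));
                i = end; // continue scanning after this image
            } else {
                i++;
            }
        }
        return images;
    }

    /** Index just past the first FF D9 at or after {@code from}, or -1 if none. */
    private static int findEndOfImage(byte[] data, int from) {
        for (int j = from; j + 1 < data.length; j++) {
            if ((data[j] & 0xFF) == 0xFF && (data[j + 1] & 0xFF) == 0xD9) {
                return j + 2;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(Path.of(args[0]));
        List<byte[]> jpegs = extractJpegs(pdf);
        for (int k = 0; k < jpegs.size(); k++) {
            // Each file could then be handed to the OCR plugin like any other image
            Files.write(Path.of("page-image-" + k + ".jpg"), jpegs.get(k));
        }
        System.out.println(jpegs.size() + " JPEG image(s) written");
    }
}
```

With Java 11 or later it can be run directly as `java PdfJpegScanner.java scan.pdf`. Byte scanning is only a quick experiment; parsing the PDF object structure with PDFBox avoids the guesswork and also handles filtered or non-JPEG images.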
The source code of the OCR plugin is not too complicated to dive into, and I can probably assist you on the nuxeo-dev mailing list or, better, directly through the inline review system of a pull request on GitHub.