
nuxeo-platform-ocr and image PDFs

rbahntje_Bahntj
Confirmed Champ

I have installed the nuxeo-platform-ocr plugin ( https://github.com/nuxeo/nuxeo-platform-ocr#readme ) and it is working very nicely, but I am not able to run OCR on image-based PDFs.

Is there any plugin to do this?

Regards

Ruben Bahntje, Ushuaia, Argentina

1 ACCEPTED ANSWER

Olivier_Grisel
Star Contributor

Great to learn that you could install this addon successfully despite the list of non-trivial dependencies to build from source 🙂

To make it work on PDF files, you would first need to extract the image files (e.g. JPEG files) embedded inside them. If you are a Java developer, this should be doable with Apache PDFBox ( http://pdfbox.apache.org/ ); e.g. you can take a class from the PDFBox source tree as an example.
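A minimal sketch of that extraction step, assuming Apache PDFBox 2.0.x on the classpath (in PDFBox 3.0, `PDDocument.load` was replaced by `Loader.loadPDF`). The class name `PdfImageExtractor` and the output file names are hypothetical, and handing the images to the OCR plugin is not shown, since that depends on the plugin's own API. It walks each page's resources and decodes every image XObject, recursing into form XObjects because some scanners wrap the page image in one.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

import javax.imageio.ImageIO;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

/** Hypothetical helper: collects the raster images embedded in a PDF (assumes PDFBox 2.0.x). */
public class PdfImageExtractor {

    public static List<BufferedImage> extractImages(File pdf) throws IOException {
        List<BufferedImage> images = new ArrayList<>();
        // Identity set so an image shared by several pages is decoded only once.
        Set<COSBase> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        try (PDDocument document = PDDocument.load(pdf)) {
            for (PDPage page : document.getPages()) {
                collectImages(page.getResources(), seen, images);
            }
        }
        return images;
    }

    private static void collectImages(PDResources resources, Set<COSBase> seen,
                                      List<BufferedImage> out) throws IOException {
        if (resources == null) {
            return;
        }
        for (COSName name : resources.getXObjectNames()) {
            PDXObject xobject = resources.getXObject(name);
            // Skip missing or already-visited XObjects (also stops a form that references itself).
            if (xobject == null || !seen.add(xobject.getCOSObject())) {
                continue;
            }
            if (xobject instanceof PDImageXObject) {
                // Decode the image into a BufferedImage that an OCR engine can consume.
                out.add(((PDImageXObject) xobject).getImage());
            } else if (xobject instanceof PDFormXObject) {
                // Some producers wrap the scanned page inside a form XObject; recurse into it.
                collectImages(((PDFormXObject) xobject).getResources(), seen, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<BufferedImage> images = extractImages(new File(args[0]));
        for (int i = 0; i < images.size(); i++) {
            // PNG is lossless, so re-encoding does not degrade the text further before OCR.
            ImageIO.write(images.get(i), "png", new File("pdf-image-" + i + ".png"));
        }
        System.out.println("Extracted " + images.size() + " image(s)");
    }
}
```

Running `java PdfImageExtractor scan.pdf` (with PDFBox on the classpath) writes one PNG per embedded image, which can then go through the same OCR path as ordinary image files. This only recovers embedded raster images and ignores page cropping or rotation; rendering whole pages with PDFBox's `PDFRenderer` is an alternative if that matters.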

The source code of the OCR plugin is not too complicated to dive into, and I can probably assist you on the nuxeo-dev mailing list or, better, directly through the inline review system on a pull request on GitHub.
