<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: nuxeo-platform-ocr and image PDFs in Nuxeo Forum</title>
    <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-plattform-ocr-and-image-pdfs/m-p/315935#M2936</link>
    <description>&lt;P&gt;Great to learn that you managed to install this add-on successfully despite the list of non-trivial dependencies that have to be built from source &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;To make it work on PDF files, you would first need to extract the image files (e.g. JPEG files) embedded in them. If you are a Java developer, this should be doable with Apache PDFBox (&lt;A href="http://pdfbox.apache.org/" target="test_blank"&gt;http://pdfbox.apache.org/&lt;/A&gt;); for instance, you can use the ExtractImages &lt;A href="http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/ExtractImages.java"&gt;class from the PDFBox source tree&lt;/A&gt; as an example.&lt;/P&gt;
&lt;P&gt;The source code of the OCR plugin is not too complicated to dive into, and I can probably help you on the nuxeo-dev mailing list or, better, directly through GitHub's inline review system on a pull request.&lt;/P&gt;</description>
    <pubDate>Thu, 15 Sep 2011 18:37:59 GMT</pubDate>
    <dc:creator>Olivier_Grisel</dc:creator>
    <dc:date>2011-09-15T18:37:59Z</dc:date>
    <item>
      <title>nuxeo-platform-ocr and image PDFs</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-plattform-ocr-and-image-pdfs/m-p/315934#M2935</link>
      <description>&lt;P&gt;I have installed the nuxeo-platform-ocr plugin (&lt;A href="https://github.com/nuxeo/nuxeo-platform-ocr#readme" target="test_blank"&gt;https://github.com/nuxeo/nuxeo-platform-ocr#readme&lt;/A&gt;) and it is working very nicely, but I am not able to run OCR on image PDFs.&lt;/P&gt;
&lt;P&gt;Is there any plugin to do this?&lt;/P&gt;
&lt;P&gt;Regards&lt;/P&gt;
&lt;P&gt;Ruben Bahntje
Ushuaia - Argentina&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2011 10:37:39 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-plattform-ocr-and-image-pdfs/m-p/315934#M2935</guid>
      <dc:creator>rbahntje_Bahntj</dc:creator>
      <dc:date>2011-09-15T10:37:39Z</dc:date>
    </item>
    <item>
      <title>Re: nuxeo-platform-ocr and image PDFs</title>
      <link>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-plattform-ocr-and-image-pdfs/m-p/315935#M2936</link>
      <description>&lt;P&gt;Great to learn that you managed to install this add-on successfully despite the list of non-trivial dependencies that have to be built from source &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;To make it work on PDF files, you would first need to extract the image files (e.g. JPEG files) embedded in them. If you are a Java developer, this should be doable with Apache PDFBox (&lt;A href="http://pdfbox.apache.org/" target="test_blank"&gt;http://pdfbox.apache.org/&lt;/A&gt;); for instance, you can use the ExtractImages &lt;A href="http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/ExtractImages.java"&gt;class from the PDFBox source tree&lt;/A&gt; as an example.&lt;/P&gt;
&lt;P&gt;The source code of the OCR plugin is not too complicated to dive into, and I can probably help you on the nuxeo-dev mailing list or, better, directly through GitHub's inline review system on a pull request.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2011 18:37:59 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/nuxeo-forum/nuxeo-plattform-ocr-and-image-pdfs/m-p/315935#M2936</guid>
      <dc:creator>Olivier_Grisel</dc:creator>
      <dc:date>2011-09-15T18:37:59Z</dc:date>
    </item>
  </channel>
</rss>

