<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic content transformer and pdfbox in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/content-transformer-and-pdfbox/m-p/280395#M233525</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi all,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;We're running Alfresco 4.2e and are seeing a very high CPU load. After some investigation, we found that transformer.PdfBox is causing this abnormal load.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;It seems that PDFBox can be upgraded. Could somebody explain to me how to do that?&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Is pdfbox.jar embedded in Alfresco? Do we need to recompile the whole project?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Any other suggestions?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Vincent&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Thu, 26 Jun 2014 20:56:26 GMT</pubDate>
    <dc:creator>vincent-kali</dc:creator>
    <dc:date>2014-06-26T20:56:26Z</dc:date>
    <item>
      <title>content transformer and pdfbox</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/content-transformer-and-pdfbox/m-p/280395#M233525</link>
      <description>Hi all, We're running Alfresco 4.2e and are seeing a very high CPU load. After some investigation, we found that transformer.PdfBox is causing this abnormal load. It seems that PDFBox can be upgraded. Could somebody explain to me how to do that? Is pdfbox.jar embedded in Alfresco? Do we need to recompile the</description>
      <pubDate>Thu, 26 Jun 2014 20:56:26 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/content-transformer-and-pdfbox/m-p/280395#M233525</guid>
      <dc:creator>vincent-kali</dc:creator>
      <dc:date>2014-06-26T20:56:26Z</dc:date>
    </item>
    <item>
      <title>Re: content transformer and pdfbox</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/content-transformer-and-pdfbox/m-p/280396#M233526</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Alfresco 4.2e uses PDFBox 1.8.2, while the new 5.0.a uses PDFBox 1.8.4, so one easy option is just to upgrade to a newer Alfresco release!&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Otherwise, to upgrade Apache PDFBox you generally need to upgrade Apache Tika too, and that means upgrading some of the other dependencies as well. For extra fun, Alfresco are currently shipping custom patched versions of Apache Tika, so you might be better off grabbing the newer Tika + friends jars out of 5.0.a or HEAD rather than trying to find + replace the jars yourself. For the list of the dependency jars, you'll want to look in the Tika Core and Tika Parser poms.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I'd probably suggest the upgrade to 5.0.a, unless you have very strong reasons to stick with 4.2.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sun, 29 Jun 2014 18:18:35 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/content-transformer-and-pdfbox/m-p/280396#M233526</guid>
      <dc:creator>nickburch</dc:creator>
      <dc:date>2014-06-29T18:18:35Z</dc:date>
    </item>
    <item>
      <title>Re: content transformer and pdfbox</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/content-transformer-and-pdfbox/m-p/280397#M233527</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi nickburch,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks for your feedback… this doesn't look straightforward.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Initially we had two major issues using PDFBox 1.8.2:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;- High CPU usage&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;- Some parts of PDF string tables are not converted to text, and are therefore not indexed by Lucene.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Any advice?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Vincent&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Jun 2014 14:21:25 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/content-transformer-and-pdfbox/m-p/280397#M233527</guid>
      <dc:creator>vincent-kali</dc:creator>
      <dc:date>2014-06-30T14:21:25Z</dc:date>
    </item>
    <item>
      <title>Re: content transformer and pdfbox</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/content-transformer-and-pdfbox/m-p/280398#M233528</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Has anybody tried replacing the jars in 4.2.e/f? Is it worth testing?&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;It's really painful to have the whole of Alfresco blocked again and again.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;See also:&lt;/SPAN&gt;&lt;BR /&gt;&lt;A href="https://issues.alfresco.com/jira/browse/MNT-11350" rel="nofollow noopener noreferrer"&gt;https://issues.alfresco.com/jira/browse/MNT-11350&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://issues.alfresco.com/jira/browse/MNT-11666" rel="nofollow noopener noreferrer"&gt;https://issues.alfresco.com/jira/browse/MNT-11666&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://issues.apache.org/jira/browse/PDFBOX-1585" rel="nofollow noopener noreferrer"&gt;https://issues.apache.org/jira/browse/PDFBOX-1585&lt;/A&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 11 Sep 2014 17:15:22 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/content-transformer-and-pdfbox/m-p/280398#M233528</guid>
      <dc:creator>heiko_robert</dc:creator>
      <dc:date>2014-09-11T17:15:22Z</dc:date>
    </item>
  </channel>
</rss>

