<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Not able to index content of large PDFs in MySQL database in Alfresco Forum</title>
    <link>https://connect.hyland.com/t5/alfresco-forum/not-able-to-index-content-of-large-pdfs-in-database-mysql/m-p/86723#M26203</link>
    <description>&lt;P&gt;Cross-posting: &lt;A href="https://hub.alfresco.com/t5/alfresco-content-services-forum/increase-max-file-size-that-solr-indexes/td-p/271199" target="_blank" rel="nofollow noopener noreferrer"&gt;https://hub.alfresco.com/t5/alfresco-content-services-forum/increase-max-file-size-that-solr-indexes/td-p/271199&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 02 Oct 2020 12:53:44 GMT</pubDate>
    <dc:creator>angelborroy</dc:creator>
    <dc:date>2020-10-02T12:53:44Z</dc:date>
    <item>
      <title>Not able to index content of large PDFs in MySQL database</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/not-able-to-index-content-of-large-pdfs-in-database-mysql/m-p/86722#M26202</link>
      <description>&lt;P&gt;Hey guys!&lt;BR /&gt;I am unable to index the content of large PDF files.&lt;/P&gt;&lt;P&gt;Version: Alfresco Community 6.1.1&lt;BR /&gt;Ubuntu Linux 18.04&lt;/P&gt;&lt;P&gt;Here is the error message from catalina.out:&lt;/P&gt;&lt;P&gt;2020-10-01 17:03:28,779 WARN [content.metadata.AbstractMappingMetadataExtracter] [http-nio-8080-exec-41] Metadata extraction rejected:&lt;BR /&gt;Extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@758471b1&lt;BR /&gt;Reason: Max doc size exceeded 10.0 MB&lt;BR /&gt;2020-10-01 17:03:29,193 WARN [content.metadata.AbstractMappingMetadataExtracter] [http-nio-8080-exec-28] Metadata extraction rejected:&lt;BR /&gt;Extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@758471b1&lt;BR /&gt;Reason: Max doc size exceeded 10.0 MB&lt;/P&gt;&lt;P&gt;I read the documentation on the website:&lt;BR /&gt;&lt;A href="https://docs.alfresco.com/6.1/references/dev-extension-points-content-transformer.html" target="_blank" rel="nofollow noopener noreferrer"&gt;https://docs.alfresco.com/6.1/references/dev-extension-points-content-transformer.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;and added the following to alfresco-global.properties:&lt;/P&gt;&lt;P&gt;content.transformer.PdfBox.priority = 110&lt;BR /&gt;content.transformer.PdfBox.extensions.pdf.txt.priority = 50&lt;BR /&gt;content.transformer.PdfBox.extensions.pdf.txt.maxSourceSizeKBytes = 25600&lt;/P&gt;&lt;P&gt;However, it still didn't work.&lt;BR /&gt;Can you please help?&lt;BR /&gt;With best regards&lt;/P&gt;</description>
      <pubDate>Fri, 02 Oct 2020 11:17:26 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/not-able-to-index-content-of-large-pdfs-in-database-mysql/m-p/86722#M26202</guid>
      <dc:creator>jbrasil</dc:creator>
      <dc:date>2020-10-02T11:17:26Z</dc:date>
    </item>
    <item>
      <title>Re: Not able to index content of large PDFs in MySQL database</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/not-able-to-index-content-of-large-pdfs-in-database-mysql/m-p/86723#M26203</link>
      <description>&lt;P&gt;Cross-posting: &lt;A href="https://hub.alfresco.com/t5/alfresco-content-services-forum/increase-max-file-size-that-solr-indexes/td-p/271199" target="_blank" rel="nofollow noopener noreferrer"&gt;https://hub.alfresco.com/t5/alfresco-content-services-forum/increase-max-file-size-that-solr-indexes/td-p/271199&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Oct 2020 12:53:44 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/not-able-to-index-content-of-large-pdfs-in-database-mysql/m-p/86723#M26203</guid>
      <dc:creator>angelborroy</dc:creator>
      <dc:date>2020-10-02T12:53:44Z</dc:date>
    </item>
    <item>
      <title>Re: Not able to index content of large PDFs in MySQL database</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/not-able-to-index-content-of-large-pdfs-in-database-mysql/m-p/86724#M26204</link>
      <description>&lt;P&gt;Hi angelborroy,&lt;BR /&gt;I had already seen that documentation.&lt;BR /&gt;I applied the parameters below in alfresco-global.properties and restarted the Alfresco service, but it still didn't work.&lt;BR /&gt;Can you help?&lt;BR /&gt;Thanks a lot.&lt;/P&gt;&lt;P&gt;content.transformer.default.timeoutMs=180000&lt;BR /&gt;content.transformer.default.txt.*.maxSourceSizeKBytes=1048576&lt;BR /&gt;content.transformer.JodConverter.maxSourceSizeKBytes=102400&lt;/P&gt;&lt;P&gt;log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG&lt;/P&gt;&lt;P&gt;content.metadataExtracter.pdf.maxDocumentSizeMB=1000&lt;BR /&gt;content.metadataExtracter.default.timeoutMs=3625000&lt;/P&gt;&lt;P&gt;content.transformer.PdfBox.priority=110&lt;BR /&gt;content.transformer.PdfBox.extensions.pdf.txt.priority=50&lt;BR /&gt;content.transformer.PdfBox.extensions.pdf.txt.maxSourceSizeKBytes=25600&lt;/P&gt;&lt;P&gt;content.transformer.json2html.priority=30&lt;BR /&gt;content.transformer.json2html.extensions.json.html.supported=true&lt;BR /&gt;content.transformer.json2html.extensions.json.html.priority=30&lt;/P&gt;</description>
      <pubDate>Fri, 02 Oct 2020 13:55:07 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/not-able-to-index-content-of-large-pdfs-in-database-mysql/m-p/86724#M26204</guid>
      <dc:creator>jbrasil</dc:creator>
      <dc:date>2020-10-02T13:55:07Z</dc:date>
    </item>
    <item>
      <title>Re: Not able to index content of large PDFs in MySQL database</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/not-able-to-index-content-of-large-pdfs-in-database-mysql/m-p/86725#M26205</link>
      <description>&lt;P&gt;Well, this is not really cross-posting, as the OP is different. But the answer in the other thread is definitely spot on for a similar issue with transformers. What the other thread does not mention is that the transformer configuration is also &lt;A href="https://docs.alfresco.com/6.2/references/dev-extension-points-content-transformer.html" target="_self" rel="nofollow noopener noreferrer"&gt;documented&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;But in this case we are talking about metadata extractors, which have separately configured limits. In fact, the PdfBox extractor is about the only one with a configured limit, set via the global property content.metadataExtracter.pdf.maxDocumentSizeMB.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Oct 2020 13:58:12 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/not-able-to-index-content-of-large-pdfs-in-database-mysql/m-p/86725#M26205</guid>
      <dc:creator>afaust</dc:creator>
      <dc:date>2020-10-02T13:58:12Z</dc:date>
    </item>
  </channel>
</rss>

