07-24-2019 01:11 AM
Hello everyone,
I have installed Alfresco Community Edition version 5.2 on Windows (using the exe file). As I noticed in my log file, when I upload a PDF file larger than 10 MB, Alfresco (Solr) does not extract its text, and therefore the file content cannot be searched. The log file says:
Metadata extraction rejected, Extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@39882d66 Reason: Max doc size exceeded 10 MB.
I would appreciate it if someone could tell me how I can increase this size. I have already tried some solutions (for example, increasing alfresco.contentStreamLimit located in the files alfresco-community/solr4/archive-SpaceStore/conf/solrcore and alfresco-community/solr4/workspace-SpaceStore/conf/solrcore).
Thanks a lot in advance.
07-24-2019 02:13 PM
The limit is defined in your Alfresco repository, which converts the PDF to text. Please check your transformer configuration, which by default is defined in alfresco-ce-repository/transformers.properties at 5.2.g-patched · ecm4u/alfresco-ce-repository · Git... (sorry, I didn't find a valid tag in the Alfresco git repo for 5.2).
Depending on which transformer handles the task, you should increase its maxSourceSizeKBytes.
e.g.
content.transformer.PdfBox.extensions.pdf.txt.maxSourceSizeKBytes=25600
and set debugging in your log4j properties
log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG
to find out which transformer is actually running for your documents, and/or install GitHub - OrderOfTheBee/ootbee-support-tools: OOTBee Support Tools addon to extend set of administrat... to debug and modify the transformation config from your browser.
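Since maxSourceSizeKBytes is given in kilobytes, a small helper can make the conversion from a limit in MB less error-prone. This is only a sketch: the function name max_source_size_property is made up for illustration, and it assumes 1 MB = 1024 KB (which matches 25600 for 25 MB above) and that PdfBox is the transformer actually handling your PDFs, which the TransformerDebug output should confirm.

```python
def max_source_size_property(limit_mb: int,
                             transformer: str = "PdfBox",
                             source_ext: str = "pdf",
                             target_ext: str = "txt") -> str:
    """Build a transformer size-limit property line for a limit given in MB.

    Hypothetical helper, not part of Alfresco. The property name pattern is
    taken from the example in this thread.
    """
    limit_kb = limit_mb * 1024  # the property value is expressed in kilobytes
    return (f"content.transformer.{transformer}.extensions."
            f"{source_ext}.{target_ext}.maxSourceSizeKBytes={limit_kb}")


# Prints: content.transformer.PdfBox.extensions.pdf.txt.maxSourceSizeKBytes=25600
print(max_source_size_property(25))
```

The printed line can then be added to your properties override; the repository usually needs a restart before the new limit takes effect.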