Increase Max File Size That Solr Indexes
07-24-2019 01:11 AM
Hello everyone,
I have installed Alfresco Community Edition 5.2 on Windows (using the .exe installer). As I noticed in my log file, when I upload a PDF file larger than 10 MB, Alfresco (Solr) does not extract its text, so the file content cannot be searched. The log file says:
Metadata extraction rejected, Extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@39882d66 Reason: Max doc size exceeded 10 MB.
I would appreciate it if someone could tell me how I can increase this limit. I have already tried some solutions (for example, increasing alfresco.contentStreamLimit in the files alfresco-community/solr4/archive-SpaceStore/conf/solrcore and alfresco-community/solr4/workspace-SpaceStore/conf/solrcore).
Thanks a lot in advance.
07-24-2019 02:13 PM
The limit is defined in your Alfresco repository, which converts the PDF to text. Please check your transformer configuration, whose defaults are defined in alfresco-ce-repository/transformers.properties at 5.2.g-patched · ecm4u/alfresco-ce-repository · Git... (sorry, I didn't find a valid tag in the Alfresco git repo for 5.2).
Depending on which transformer handles the task, you should increase its maxSourceSizeKBytes value.
For example:
content.transformer.PdfBox.extensions.pdf.txt.maxSourceSizeKBytes=25600
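The value is given in kilobytes, so 25600 corresponds to 25 MB (25 × 1024). If you want a different limit, here is a minimal Python sketch for working out the line to set. The helper name and its parameters are made up for illustration; only the property key pattern comes from the example above, and the transformer name has to match whichever transformer really handles your documents.

```python
def max_source_size_property(limit_mb, transformer="PdfBox",
                             source_ext="pdf", target_ext="txt"):
    """Build a maxSourceSizeKBytes property line for a size limit given in MB.

    Hypothetical helper: it only formats the key pattern shown in this
    thread and converts megabytes to kilobytes (1 MB = 1024 KB).
    """
    if limit_mb <= 0:
        raise ValueError("limit_mb must be positive")
    limit_kb = int(limit_mb * 1024)
    key = (f"content.transformer.{transformer}"
           f".extensions.{source_ext}.{target_ext}.maxSourceSizeKBytes")
    return f"{key}={limit_kb}"


# 25 MB reproduces the value used in the example above.
print(max_source_size_property(25))
# prints: content.transformer.PdfBox.extensions.pdf.txt.maxSourceSizeKBytes=25600
```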
Also enable debug logging in your log4j properties:
log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG
This lets you find out which transformer is actually running for your documents. You can also install GitHub - OrderOfTheBee/ootbee-support-tools: OOTBee Support Tools addon to extend set of administrat... to debug and modify the transformation configuration from your browser.
