
Increase Max File Size That Solr Indexes

alinasrinazif
Champ in-the-making

Hello everyone,

I have installed Alfresco Community Edition 5.2 on Windows (using the .exe installer). According to my log file, when I upload a PDF file larger than 10 MB, Alfresco (Solr) does not extract its text, so the file content cannot be searched. The log file says:

Metadata extraction rejected, Extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@39882d66 Reason: Max doc size exceeded 10 MB.

I would appreciate it if someone could tell me how to increase this limit. I have already tried some solutions, for example increasing alfresco.contentStreamLimit in alfresco-community/solr4/archive-SpaceStore/conf/solrcore and alfresco-community/solr4/workspace-SpaceStore/conf/solrcore.

Thanks a lot in advance.

1 REPLY

heiko_robert
Star Collaborator

The limit is defined in your Alfresco repository, which converts the PDF to text. Please check your transformer configuration, which by default is defined in transformers.properties (see alfresco-ce-repository/transformers.properties at 5.2.g-patched · ecm4u/alfresco-ce-repository · Git...; sorry, I didn't find a valid tag in the Alfresco git repo for 5.2).

Depending on which transformer handles the task, you should increase its maxSourceSizeKBytes.

e.g.

content.transformer.PdfBox.extensions.pdf.txt.maxSourceSizeKBytes=25600

and enable debug logging in your log4j properties

log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG

to find out which transformer is actually running for your documents, and/or install the OOTBee Support Tools addon (GitHub - OrderOfTheBee/ootbee-support-tools: OOTBee Support Tools addon to extend set of administrat...) to debug and modify the transformation config from your browser.
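
Putting the two settings together, a rough sketch could look like the fragment below. The file locations named in the comments are my assumptions (the reply does not say where the overrides go), and it also assumes that PdfBox really is the transformer handling your PDFs, which is what the TransformerDebug output should confirm. The value 25600 is in KB, i.e. 25 × 1024 = 25 MB.

```properties
# --- Assumed location: alfresco-global.properties (not stated in the reply) ---
# Raise the maximum source size for the PdfBox pdf -> txt transformation.
# The value is in kilobytes: 25600 KB = 25 MB.
content.transformer.PdfBox.extensions.pdf.txt.maxSourceSizeKBytes=25600

# --- Assumed location: the log4j properties file your installation uses ---
# Log which transformer is selected for each document and why others are skipped.
log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG
```

A restart of Alfresco is probably needed before the changes take effect. If the debug output names a different transformer for your PDFs, the same maxSourceSizeKBytes suffix would have to be set on that transformer's property instead.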