Indexing XML on Alfresco 5.1.x

pcuvecle2 — Mon, 30 Jul 2018 14:37:52 GMT

Hi,

I am using Alfresco 5.1 and I have XML files to index. My XML contains tags such as

<paragraph eId="id-00000967-2e30-ecab-ad49-685fecd94436">
  <content>
    <p>Some text</p>
  </content>
</paragraph>

I would like to be able to discard XML attributes such as eId d

Re: Indexing XML on Alfresco 5.1.x

pcuvecle2 — Fri, 07 Sep 2018 10:00:27 GMT

Answering my own question.

The issue does not actually come from indexing but from text extraction. It seems that the text/xml mimetype is handled by a String extractor that outputs its input unchanged. As a result, the whole XML, markup and attributes included, is passed on to the indexer.

The solution was to create a custom extractor that strips out the XML syntax (similar to HTML extraction) and to use a custom application/xml mimetype to trigger it.
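
For anyone landing here later, this is a minimal sketch of the stripping step only, not my actual extractor. It assumes plain JDK SAX parsing is acceptable; the class name XmlTextStripper and the method extractText are made up for illustration, and the wiring into Alfresco (registering the extractor and mapping it to the custom mimetype) is not shown.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: keeps only character data, drops tags and attributes.
public class XmlTextStripper {

    public static String extractText(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Assumption: documents need no DOCTYPE, so reject it to avoid XXE issues.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        SAXParser parser = factory.newSAXParser();

        final StringBuilder text = new StringBuilder();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Only text nodes reach this callback; attributes such as eId never do.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Separate the text of adjacent elements so words do not run together.
                text.append(' ');
            }
        });

        // Collapse the whitespace left over from indentation and element boundaries.
        return text.toString().trim().replaceAll("\\s+", " ");
    }
}
```

Calling XmlTextStripper.extractText on the paragraph sample from my first post should leave just the words inside the p element, so the eId value no longer ends up in the index.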
