<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Indexing XML on Alfresco 5.1.x in Alfresco Forum</title>
    <link>https://connect.hyland.com/t5/alfresco-forum/indexing-xml-on-alfresco-5-1-x/m-p/77959#M24439</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Answering my own question.&lt;/P&gt;&lt;P&gt;The issue does not actually come from indexing but from text extraction. It seems that the text/xml mimetype is handled by a String extractor that outputs its input unchanged. Therefore, the whole XML, markup included, goes to the index.&lt;/P&gt;&lt;P&gt;The solution was to create a custom extractor that strips out the XML syntax (similar to HTML extraction) and to use a custom application/xml mimetype to trigger it.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 07 Sep 2018 10:00:27 GMT</pubDate>
    <dc:creator>pcuvecle2</dc:creator>
    <dc:date>2018-09-07T10:00:27Z</dc:date>
    <item>
      <title>Indexing XML on Alfresco 5.1.x</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/indexing-xml-on-alfresco-5-1-x/m-p/77958#M24438</link>
      <description>Hi, I am using Alfresco 5.1 and I have XML files to index. My XML contains tags such as &amp;lt;paragraph eId="id-00000967-2e30-ecab-ad49-685fecd94436"&amp;gt;&amp;lt;content&amp;gt;&amp;lt;p&amp;gt;Some text&amp;lt;/p&amp;gt;&amp;lt;/content&amp;gt;&amp;lt;/paragraph&amp;gt; I would like to be able to discard XML attributes such as eId d</description>
      <pubDate>Mon, 30 Jul 2018 14:37:52 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/indexing-xml-on-alfresco-5-1-x/m-p/77958#M24438</guid>
      <dc:creator>pcuvecle2</dc:creator>
      <dc:date>2018-07-30T14:37:52Z</dc:date>
    </item>
    <item>
      <title>Re: Indexing XML on Alfresco 5.1.x</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/indexing-xml-on-alfresco-5-1-x/m-p/77959#M24439</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Answering my own question.&lt;/P&gt;&lt;P&gt;The issue does not actually come from indexing but from text extraction. It seems that the text/xml mimetype is handled by a String extractor that outputs its input unchanged. Therefore, the whole XML, markup included, goes to the index.&lt;/P&gt;&lt;P&gt;The solution was to create a custom extractor that strips out the XML syntax (similar to HTML extraction) and to use a custom application/xml mimetype to trigger it.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 07 Sep 2018 10:00:27 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/indexing-xml-on-alfresco-5-1-x/m-p/77959#M24439</guid>
      <dc:creator>pcuvecle2</dc:creator>
      <dc:date>2018-09-07T10:00:27Z</dc:date>
    </item>
  </channel>
</rss>

