<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic The meaning of XML extractor selector in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/the-meaning-of-xml-extractor-selector/m-p/236965#M190095</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hello Gurus,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I'm trying to understand Alfresco's built-in XML metadata extraction, which, as I understand it, requires three pieces of configuration:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;UL&gt;1. Configure the selector class (where I set the "worker" property)&lt;BR /&gt;2. Map a local variable to the content type property that will receive the extracted value&lt;BR /&gt;3. Map the local variable to an XPath expression&lt;/UL&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I'm trying to understand the design. As far as I can tell, the selector class only peeks inside the XML (for validation?); it is the XPath extractor that does the real work. So why does the selector need to be configured? Why do I also need to provide the root of my XPath in the selector configuration, when I already provided it when mapping the XPath to a local variable?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I'm also confused about the roundabout way a value extracted via XPath is mapped to a content property. Why do we need an intermediate mapping?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I would appreciate any pointers.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 29 Jan 2010 23:26:56 GMT</pubDate>
    <dc:creator>kilo</dc:creator>
    <dc:date>2010-01-29T23:26:56Z</dc:date>
    <item>
      <title>The meaning of XML extractor selector</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/the-meaning-of-xml-extractor-selector/m-p/236965#M190095</link>
      <description>Hello Gurus, I'm trying to understand Alfresco's built-in XML metadata extraction, which I understand requires 3 configurations: 1. Configure the selector class (where I set the "worker" property) 2. Map a local variable to a content type property, where the extracted value will go to 3. Map the local</description>
      <pubDate>Fri, 29 Jan 2010 23:26:56 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/the-meaning-of-xml-extractor-selector/m-p/236965#M190095</guid>
      <dc:creator>kilo</dc:creator>
      <dc:date>2010-01-29T23:26:56Z</dc:date>
    </item>
    <item>
      <title>Re: The meaning of XML extractor selector</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/the-meaning-of-xml-extractor-selector/m-p/236966#M190096</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;The mimetype "XML" is really an infinitely variable document format; we can't rely on it to be anything except well-formed.&amp;nbsp; The simplest way for the extractor to know what 'type' of XML it is dealing with is to "peek" into the document.&amp;nbsp; The selector runs XPath statements until it gets a hit. It then passes the document to the corresponding XPathMetadataExtracter, which runs multiple XPath statements to extract values from the document. The extracted values are then passed through the normal mapping phase, which pushes them into the form that will be sent for persistence.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;The &lt;/SPAN&gt;&lt;STRONG&gt;XmlMetadataExtracterTest&lt;/STRONG&gt;&lt;SPAN&gt; extracts values from two different types of XML: an Alfresco content model and an Eclipse project definition.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Regards&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Derek&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;PS. Recent context 'subsystem' work added some extra complexity to the code.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 01 Feb 2010 16:26:42 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/the-meaning-of-xml-extractor-selector/m-p/236966#M190096</guid>
      <dc:creator>derek</dc:creator>
      <dc:date>2010-02-01T16:26:42Z</dc:date>
    </item>
    <item>
      <title>Re: The meaning of XML extractor selector</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/the-meaning-of-xml-extractor-selector/m-p/236967#M190097</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Thanks, Derek. Your explanation of the intent of the XML selector is very good. Does the selector process also validate the document (i.e. if a DOCTYPE is present)?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I also understand now why there is a two-step mapping from extracted values to content properties (extracted value -&amp;gt; local variable -&amp;gt; content property). It provides an opportunity to transform the extracted value before assigning it to a content property.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 02 Feb 2010 16:19:25 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/the-meaning-of-xml-extractor-selector/m-p/236967#M190097</guid>
      <dc:creator>kilo</dc:creator>
      <dc:date>2010-02-02T16:19:25Z</dc:date>
    </item>
    <item>
      <title>Re: The meaning of XML extractor selector</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/the-meaning-of-xml-extractor-selector/m-p/236968#M190098</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;How strict the document builder is depends on the parser that Java chooses at runtime: &lt;/SPAN&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();&lt;/CODE&gt;&lt;/PRE&gt;&lt;SPAN&gt;We have xercesImpl-2.8.0.jar on our classpath by default.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Regards&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 03 Feb 2010 11:41:59 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/the-meaning-of-xml-extractor-selector/m-p/236968#M190098</guid>
      <dc:creator>derek</dc:creator>
      <dc:date>2010-02-03T11:41:59Z</dc:date>
    </item>
  </channel>
</rss>

