The meaning of XML extractor selector

kilo
Champ in-the-making
Hello Gurus,

I'm trying to understand Alfresco's built-in XML metadata extraction, which I understand requires 3 configuration steps:

    1. Configure the selector class (where I set the "worker" property)
    2. Map a local variable to the content type property that will receive the extracted value
    3. Map the local variable to an XPath expression


I'm trying to understand the design. Apparently, the selector class only peeks inside the XML (for validation?), and it is the XPath extractor that does the real work. So why does the selector need to be configured? Why do I also need to provide the root of my XPath in the selector configuration, when I already provided it while mapping the XPath to a local variable?

I'm confused. Why is there a roundabout way of mapping a value extracted by XPath to a content property? Why do we need an intermediate mapping?

I would appreciate any pointers.
3 REPLIES

derek
Star Contributor
Hi,
The mimetype "XML" is really an infinitely variable document format; we can't rely on a document being anything except well-formed. The simplest way for the extractor to know what 'type' of XML it is dealing with is to "peek" into the document. The selector runs XPath statements until it gets a hit; it then passes the document to the corresponding XPathMetadataExtracter, which runs multiple XPath statements to extract values from the document. The extracted values are then passed through the normal mapping phase, which pushes the values into a form that will be sent for persistence.
The XmlMetadataExtracterTest extracts values from two different types of XML: an Alfresco content model and an Eclipse project definition.
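The selection step described above can be sketched roughly as follows. This is only an illustration of the "run XPath statements until one hits" idea, not Alfresco's actual code: the class name, the method name, and the worker names are hypothetical, and the two root elements are assumed to stand in for a content model and an Eclipse project file.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative only: hypothetical names, not Alfresco classes.
class SelectorSketch {

    /**
     * Evaluates each selector XPath in insertion order and returns the worker
     * name mapped to the first one that matches, or null if none matches.
     */
    static String selectWorker(Map<String, String> xpathToWorker, String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (Map.Entry<String, String> entry : xpathToWorker.entrySet()) {
                // A node-set result converts to true when it is non-empty.
                Boolean hit = (Boolean) xpath.evaluate(entry.getKey(), doc, XPathConstants.BOOLEAN);
                if (hit) {
                    return entry.getValue();
                }
            }
            return null;
        } catch (Exception e) {
            // Malformed XML or a bad XPath expression ends up here.
            throw new IllegalStateException("Could not select a worker", e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the selectors in the order they were configured.
        Map<String, String> selectors = new LinkedHashMap<>();
        selectors.put("/model", "modelExtracter");                      // assumed root of a content model
        selectors.put("/projectDescription", "eclipseProjectExtracter"); // assumed root of an Eclipse project
        String xml = "<projectDescription><name>demo</name></projectDescription>";
        System.out.println(selectWorker(selectors, xml));
    }
}
```

The selector XPath and the extraction XPaths serve different purposes here: the first only decides which worker gets the document, which is why the root shows up in both places in the configuration.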
Regards
Derek
PS. Recent context 'subsystem' work added some extra complexity to the code.

kilo
Champ in-the-making
Thanks, Derek. Your explanation of the intent of the XML selector is very good. Does the selector process also validate the document (i.e. if a DOCTYPE is present)?

I also understand why there is a two-step mapping from extracted values to content properties (extracted value -> local variable -> content property). It provides an opportunity to transform the extracted value before assigning it to a content property.
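To make that two-step idea concrete, here is a minimal sketch of the second step. It is not Alfresco's mapping code: the class and method names are hypothetical, the property names are only examples, and trimming whitespace is just an assumed stand-in for whatever transformation might be wanted.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: hypothetical names, not Alfresco's API.
class TwoStepMappingSketch {

    /**
     * Takes raw values keyed by local variable name (the output of the XPath step)
     * and copies them to the content properties those variables are mapped to.
     */
    static Map<String, String> mapToProperties(Map<String, String> rawValues,
                                               Map<String, String> variableToProperty) {
        Map<String, String> properties = new HashMap<>();
        for (Map.Entry<String, String> raw : rawValues.entrySet()) {
            String property = variableToProperty.get(raw.getKey());
            if (property == null) {
                continue; // unmapped variables are simply dropped
            }
            // The intermediate step is where a value can be transformed; here it is only trimmed.
            properties.put(property, raw.getValue().trim());
        }
        return properties;
    }

    public static void main(String[] args) {
        Map<String, String> rawValues = new HashMap<>();
        rawValues.put("name", "  demo  ");
        rawValues.put("comment", "sample project");

        Map<String, String> variableToProperty = new HashMap<>();
        variableToProperty.put("name", "cm:title"); // example property name

        System.out.println(mapToProperties(rawValues, variableToProperty));
    }
}
```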

Thanks.

derek
Star Contributor
Hi,
How strict the document builder is depends on the parser that Java chooses at runtime:
documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
We have xercesImpl-2.8.0.jar on our classpath by default.
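A quick way to check what a given runtime actually picks is to create the builder the same way and ask it. The sketch below assumes nothing about Alfresco itself; the class and method names are hypothetical. JAXP documents that a factory is non-validating unless setValidating(true) is called, so a builder created like this should report that it does not validate against a DTD, although a non-validating parser may still read a referenced DTD.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Illustrative only: a hypothetical helper for inspecting the default parser.
class ParserStrictnessSketch {

    /** Creates a builder with factory defaults, as in the line quoted above. */
    static DocumentBuilder newDefaultBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser found", e);
        }
    }

    public static void main(String[] args) {
        DocumentBuilder builder = newDefaultBuilder();
        // The implementation class shows which parser was chosen from the classpath.
        System.out.println("implementation: " + builder.getClass().getName());
        System.out.println("validating:     " + builder.isValidating());
        System.out.println("namespaceAware: " + builder.isNamespaceAware());
    }
}
```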

Regards