Hi,
The mimetype "XML" is really an infinitely variable document format; we can't rely on it to be anything except well-formed. The simplest way for the extracter to know what 'type' of XML it is dealing with is to "peek" into the document. The selector runs XPath statements until it gets a hit and then passes the document to the corresponding XPathMetadataExtracter. That extracter runs multiple XPath statements to extract values from the document. The extracted values are then passed through the normal mapping phase, which puts them into the form that will be sent for persistence.
The XmlMetadataExtracterTest extracts values from two different types of XML: an Alfresco content model and an Eclipse project definition.
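
To make the "peek, then extract" flow a bit more concrete, here is a rough standalone sketch using only the JDK's javax.xml.xpath API. It is not the Alfresco code: the class name XmlPeekSketch, the selector table, the property names and the XPath expressions are all made up for illustration, the sample documents are simplified and un-namespaced, and the mapping phase is left out.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative only: a hypothetical stand-in for the selector + extracter pair.
class XmlPeekSketch {

    // Hypothetical table: "peek" XPath -> (property name -> extraction XPath).
    // Insertion order is kept, so selectors are tried in the order listed.
    static final Map<String, Map<String, String>> SELECTORS = new LinkedHashMap<>();
    static {
        Map<String, String> eclipseProject = new LinkedHashMap<>();
        eclipseProject.put("name", "/projectDescription/name");
        eclipseProject.put("comment", "/projectDescription/comment");
        SELECTORS.put("/projectDescription", eclipseProject);

        Map<String, String> contentModel = new LinkedHashMap<>();
        contentModel.put("description", "/model/description");
        contentModel.put("author", "/model/author");
        SELECTORS.put("/model", contentModel);
    }

    // Parse the XML text; all we assume about it is that it is well-formed.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Run the selector XPaths until one hits, then run that entry's
    // extraction XPaths. Returns an empty map if no selector matches.
    static Map<String, String> extract(String xml) throws Exception {
        Document doc = parse(xml);
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (Map.Entry<String, Map<String, String>> selector : SELECTORS.entrySet()) {
            Boolean hit = (Boolean) xpath.evaluate(
                    selector.getKey(), doc, XPathConstants.BOOLEAN);
            if (!hit) {
                continue; // not this type of XML, try the next selector
            }
            Map<String, String> values = new LinkedHashMap<>();
            for (Map.Entry<String, String> rule : selector.getValue().entrySet()) {
                values.put(rule.getKey(), xpath.evaluate(rule.getValue(), doc));
            }
            return values;
        }
        return new LinkedHashMap<>();
    }

    public static void main(String[] args) throws Exception {
        String project = "<projectDescription><name>demo</name>"
                + "<comment>sample project</comment></projectDescription>";
        String model = "<model><description>Sample model</description>"
                + "<author>someone</author></model>";
        System.out.println(extract(project));
        System.out.println(extract(model));
    }
}
```

The first selector that matches wins, so the order of the table matters if two "peek" expressions could both hit the same document.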
Regards
Derek
PS. Recent context 'subsystem' work added some extra complexity to the code.