
Metadata Extraction requiring XPATH 2.0 expression

pmahoney
Champ in-the-making
I have been using XPathMetadataExtracter very successfully, but I now have a new requirement that needs an XPATH 2.0 expression. In short, I need to extract the minimum date from a node-set. Here's the expression:

     min(/CabinBankCruiseItinerary//Sailing/(@Start cast as xs:date))

I've tested this expression against a document using XMLSpy, so I know it is a valid XPATH 2.0 expression. But it seems Alfresco is not XPATH 2.0 capable:

    PropertyAccessException 1: org.springframework.beans.MethodInvocationException:
    Property 'xpathMappingProperties' threw exception; nested exception is org.alfresco.error.AlfrescoRuntimeException: 09160000
    Failed to create XPath expression:
       Document property: firstSailing
       XPath:             min(/CabinBankCruiseItinerary//Sailing/(@Start cast as xs:date))

Can anyone confirm this and/or tell me what I need to do to make Alfresco XPATH 2.0 capable?
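The JDK's default JAXP XPath engine implements XPath 1.0 only, which fits the compile failure above. The sketch below uses plain JDK classes (no Alfresco, no Saxon) to show that engine rejecting the expression, plus one possible workaround that stays within XPath 1.0: select the `@Start` attributes and take the minimum in Java. This is not what the thread ended up doing; the class and method names are hypothetical, and it assumes the dates are plain `yyyy-MM-dd` values with no timezone suffix.

```java
import java.io.StringReader;
import java.time.LocalDate;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: plain JDK only. The default JAXP XPath engine is an XPath 1.0 implementation.
final class MinSailingDate {

    // The XPath 2.0 expression from the post above
    static final String XPATH2_EXPR =
            "min(/CabinBankCruiseItinerary//Sailing/(@Start cast as xs:date))";

    /** True if the default JAXP engine can compile the expression. */
    static boolean compilesWithDefaultEngine(String expr) {
        try {
            XPathFactory.newInstance().newXPath().compile(expr);
            return true;
        } catch (XPathExpressionException e) {
            return false; // for example: unknown function min(), or a parenthesised step
        }
    }

    /**
     * XPath 1.0 workaround (assumption, not from the post): select every Sailing/@Start,
     * then parse and compare in Java. Assumes plain yyyy-MM-dd values.
     */
    static Optional<LocalDate> minStartDate(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList starts = (NodeList) xpath.evaluate(
                "/CabinBankCruiseItinerary//Sailing/@Start", doc, XPathConstants.NODESET);
        LocalDate min = null;
        for (int i = 0; i < starts.getLength(); i++) {
            LocalDate date = LocalDate.parse(starts.item(i).getNodeValue().trim());
            if (min == null || date.isBefore(min)) {
                min = date;
            }
        }
        return Optional.ofNullable(min); // empty if there are no Sailing/@Start attributes
    }
}
```

The trade-off is that the "minimum" logic moves out of the mapping configuration and into Java, so it would no longer be expressible purely as an XPath mapping property.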

derek
Star Contributor
Hi,
You need to look at the rest of the exception (the nested causes) and see which XPath-supporting library is failing on your expression.
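In a logged Java stack trace the nested exceptions appear as `Caused by:` sections, and the innermost one is usually the most specific. If you want to inspect them programmatically, a small generic sketch is to walk the `getCause()` chain; `CauseChain` is a hypothetical helper, not part of Alfresco.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Alfresco): flatten an exception's cause chain.
final class CauseChain {

    /** One "ClassName: message" line per exception, outermost first, innermost last. */
    static List<String> describe(Throwable t) {
        List<String> lines = new ArrayList<>();
        // Identity-based set guards against a (rare) cyclic cause chain
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            lines.add(cur.getClass().getName() + ": " + cur.getMessage());
        }
        return lines;
    }
}
```

The last entry of the returned list is the innermost cause, which is where a library-level parse error would be expected to show up.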

pmahoney
Champ in-the-making
Hi

I have managed to get Alfresco to load now. I did this by replacing xalan.jar in tomcat/endorsed with saxon9he.jar, which does support XPATH 2.0. Now it compiles my XPaths, but no matter what I try, the 'extract metadata' action does nothing. The breakpoints I normally use to debug this area don't trigger; it's as if XML extraction is disabled. Nothing obvious appears in the logs, but I have noticed the following difference:

When booting with xalan in tomcat/endorsed:

18:10:47,453  INFO  [domain.schema.SchemaBootstrap] Schema managed by database dialect org.hibernate.dialect.PostgreSQLDialect.
18:10:54,718  ERROR [domain.schema.SchemaBootstrap] Failed to dump normalized, pre-upgrade schema to file.  Error: java.io.FileNotFoundException: C:\Program%20Files\Apache%20Software%20Foundation\Tomcat%206.0\temp\Alfresco\AlfrescoSchema-PostgreSQLDialect-39921-Startup.xml (The system cannot find the path specified)
18:10:59,312  INFO  [domain.schema.SchemaBootstrap] No changes were made to the schema.
18:11:05,875  ERROR [domain.schema.SchemaBootstrap] Failed to dump normalized, post-upgrade schema to file.  Error: java.io.FileNotFoundException: C:\Program%20Files\Apache%20Software%20Foundation\Tomcat%206.0\temp\Alfresco\AlfrescoSchema-PostgreSQLDialect-39923.xml (The system cannot find the path specified)

I don't know what this error means, but it does not seem to cause any problems, and my simple XML metadata extractions work.

When booting with saxon9he in tomcat/endorsed: (having first removed my xpath 2.0 expressions from the config)

17:54:31,421  INFO  [domain.schema.SchemaBootstrap] Schema managed by database dialect org.hibernate.dialect.PostgreSQLDialect.
17:54:38,546  ERROR [domain.schema.SchemaBootstrap] Failed to dump normalized, pre-upgrade schema to file.  Error: Parser configuration problem: namespace reporting is not enabled
17:54:42,984  INFO  [domain.schema.SchemaBootstrap] No changes were made to the schema.
17:54:49,562  ERROR [domain.schema.SchemaBootstrap] Failed to dump normalized, post-upgrade schema to file.  Error: Parser configuration problem: namespace reporting is not enabled

It's the same place in the start-up log, but a different error is reported, and no XML metadata extraction action works.

So replacing xalan.jar with saxon9he.jar in the tomcat/endorsed folder allows Alfresco to start (as it can compile my XPATH 2.0 expressions) but somehow prevents any XML metadata extraction.
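When swapping jars in tomcat/endorsed, it can help to confirm which JAXP implementations are actually resolved at runtime. A minimal diagnostic sketch follows; `JaxpReport` is a hypothetical name, and since JAXP lookup depends on the class loader, its `report()` output is most meaningful when logged from inside the webapp rather than from a standalone run.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.xpath.XPathFactory;

// Diagnostic sketch: which JAXP implementations does the current class loader resolve?
final class JaxpReport {

    static String report() {
        return "XPathFactory:           " + describe(XPathFactory.newInstance()) + "\n"
             + "DocumentBuilderFactory: " + describe(DocumentBuilderFactory.newInstance()) + "\n"
             + "TransformerFactory:     " + describe(TransformerFactory.newInstance());
    }

    private static String describe(Object impl) {
        Class<?> c = impl.getClass();
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // Classes on the bootstrap class path (JDK classes, and endorsed jars on Java 8 and
        // earlier) have no CodeSource, so the class name is the telling part there.
        String where = (src == null) ? "bootstrap class path" : String.valueOf(src.getLocation());
        return c.getName() + " (" + where + ")";
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

If the XPathFactory line names a Saxon class, every component that calls `XPathFactory.newInstance()` is getting XPath 2.0 semantics, not just the extractor that needs it.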

Very baffling. Any clues where to look now?

Cheers

Paul

derek
Star Contributor
Painful, but enable this logger in your log4j configuration and watch the log:
log4j.logger.org.alfresco.repo.content.metadata=DEBUG
This should tell you which extractors are being registered, the raw values being passed in and out, etc.

pmahoney
Champ in-the-making
I was using org.alfresco.repo.content.metadata.xml.XPathMetadataExtracter

I now have a solution I would like to share with the community  🙂

It was somewhat naive to think I could drop an XPATH 2.0-capable jar in place of the JAXP default or Xalan. The differences between XPATH 1.0 and 2.0 make for many incompatibilities. Really, it was fortunate that Alfresco did not complain and kept going.

So instead, I downloaded the open-source saxon9he jar and placed it in my tomcat/lib directory. There it is available to my extensions (and Alfresco) but will not be picked up as the endorsed JAXP implementation. I then made a slightly modified version of XPathMetadataExtracter, which I called XPath2MetadataExtractor (code below). The only real difference is that it explicitly references the Saxon 9 XPathFactoryImpl class, which can handle XPATH 2.0 syntax.


/*
* Copyright (C) 2005-2010 Alfresco Software Limited.
*
* This file is part of Alfresco
*
* Alfresco is free software: you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Alfresco is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public License
* along with Alfresco. If not, see <http://www.gnu.org/licenses/>.
*/

/*
* X-ACT (no copyright) Modified alfresco class to support XPATH 2.0 over XPATH 1.0 statements
* Uses SAXON9 Home edition open source
*/

package com.cabinbank.cabinBank.alfresco;

import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import net.sf.saxon.xpath.XPathFactoryImpl;

import org.alfresco.error.AlfrescoRuntimeException;
import org.alfresco.repo.content.MimetypeMap;
import org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter;
import org.alfresco.service.cmr.repository.ContentReader;
import org.alfresco.service.namespace.QName;
import org.springframework.extensions.surf.util.ParameterCheck;
import org.alfresco.util.PropertyCheck;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
* An extractor that pulls values from XML documents using configurable XPath 2.0
* statements.  It is not possible to list a default set of mappings - this is
* down to the configuration only.
* <p>
* When an instance of this extractor is configured, XPath statements should be
* provided to extract all the available metadata.  The implementation is sensitive
* to what is actually requested by the
* {@linkplain AbstractMappingMetadataExtracter#setMapping(Map) configured mapping}
* and will only perform the queries necessary to fulfil the requirements.
* <p>
* To summarise, there are two configurations required for this class:
* <ul>
*   <li>
*     A mapping of all reasonable document properties to XPath statements.
*     See {@link #setXpathMappingProperties(Properties)}.
*   </li>
*   <li>
*     A mapping of document property names to Alfresco repository model QNames.
*     See {@link AbstractMappingMetadataExtracter#setMappingProperties(java.util.Properties)}.
*   </li>
* </ul>
* <p>
* All values are extracted as text values and therefore all XPath statements must evaluate to a node
* that can be rendered as text.
*
* @see AbstractMappingMetadataExtracter#setMappingProperties(Properties)
* @see #setXpathMappingProperties(Properties)
* @since 2.1
* @author Derek Hulley (modified by Paul Mahoney)
*/
public class XPath2MetadataExtractor extends AbstractMappingMetadataExtracter implements NamespaceContext
{
    public static String[] SUPPORTED_MIMETYPES = new String[] {MimetypeMap.MIMETYPE_XML};
   
    private static Log logger = LogFactory.getLog(XPath2MetadataExtractor.class);
   
    private DocumentBuilder documentBuilder;
    private XPathFactory xpathFactory;
    private Map<String, String> namespacesByPrefix;
    private Map<String, XPathExpression> xpathExpressionMapping;

    /**
     * Default constructor
     */
    public XPath2MetadataExtractor()
    {
        super(new HashSet<String>(Arrays.asList(SUPPORTED_MIMETYPES)));
        try
        {
            documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            xpathFactory = new XPathFactoryImpl(); // Saxon9 implementation
        }
        catch (Throwable e)
        {
            throw new AlfrescoRuntimeException("Failed to initialize XML metadata extractor", e);
        }
    }

    /** {@inheritDoc} */
    public String getNamespaceURI(String prefix)
    {
        ParameterCheck.mandatoryString("prefix", prefix);
        String namespace = namespacesByPrefix.get(prefix);
        if (namespace == null)
        {
            throw new AlfrescoRuntimeException("Prefix '" + prefix + "' is not associated with a namespace.");
        }
        return namespace;
    }

    /** {@inheritDoc} */
    public String getPrefix(String namespaceURI)
    {
        ParameterCheck.mandatoryString("namespaceURI", namespaceURI);
        for (Map.Entry<String, String> entry : namespacesByPrefix.entrySet())
        {
            if (namespaceURI.equals(entry.getValue()))
            {
                return entry.getKey();
            }
        }
        return null;
    }

    /** {@inheritDoc} */
    public Iterator<String> getPrefixes(String namespaceURI)
    {
        ParameterCheck.mandatoryString("namespaceURI", namespaceURI);
        List<String> prefixes = new ArrayList<String>(2);
        for (Map.Entry<String, String> entry : namespacesByPrefix.entrySet())
        {
            if (namespaceURI.equals(entry.getValue()))
            {
                prefixes.add(entry.getKey());
            }
        }
        return prefixes.iterator();
    }

    /**
     * Set the properties file that maps document properties to the XPath statements
     * necessary to retrieve them.
     * <p>
     * The Xpath mapping is of the form:
     * <pre>
     * # Namespaces prefixes
     * namespace.prefix.my=http://www....com/alfresco/1.0
     *
     * # Mapping
     * editor=/my:example-element/@cm:editor
     * title=/my:example-element/text()
     * </pre>
     */
    public void setXpathMappingProperties(Properties xpathMappingProperties)
    {
        namespacesByPrefix = new HashMap<String, String>(7);
        xpathExpressionMapping = new HashMap<String, XPathExpression>(17);
        readXPathMappingProperties(xpathMappingProperties);
    }
   
    @Override
    protected void init()
    {
        PropertyCheck.mandatory(this, "xpathMappingProperties", xpathExpressionMapping);
        // Get the base class to set up its mappings
        super.init();
        // Remove all XPath expressions that aren't going to be used
        Map<String, Set<QName>> mapping = getMapping();
        Set<String> xpathExpressionMappingKeys = new HashSet<String>(xpathExpressionMapping.keySet());
        for (String xpathMappingKey : xpathExpressionMappingKeys)
        {
            if (!mapping.containsKey(xpathMappingKey))
            {
                xpathExpressionMapping.remove(xpathMappingKey);
            }
        }
    }

    /**
     * It is not possible to have any default mappings, but something has to be returned.
     *
     * @return          Returns an empty map
     */
    @Override
    protected Map<String, Set<QName>> getDefaultMapping()
    {
        return Collections.emptyMap();
    }

    @Override
    protected Map<String, Serializable> extractRaw(ContentReader reader) throws Throwable
    {
        InputStream is = null;
        try
        {
            is = reader.getContentInputStream();
            Document doc = documentBuilder.parse(is);
            Map<String, Serializable> rawProperties = processDocument(doc);
            if (logger.isDebugEnabled())
            {
                logger.debug("\n" +
                        "Extracted XML metadata: \n" +
                        "   Reader:  " + reader + "\n" +
                        "   Results: " + rawProperties);
            }
            return rawProperties;
        }
        finally
        {
            if (is != null)
            {
                try { is.close(); } catch (IOException e) {}
            }
        }
    }
   
    /**
     * Executes all the necessary XPath statements to extract values.
     */
    protected Map<String, Serializable> processDocument(Document document) throws Throwable
    {
        Map<String, Serializable> rawProperties = super.newRawMap();
       
        // Execute all the XPaths that we saved
        for (Map.Entry<String, XPathExpression> element : xpathExpressionMapping.entrySet())
        {
            String documentProperty = element.getKey();
            XPathExpression xpathExpression = element.getValue();
            // Get the value, assuming it is a nodeset
            Serializable value = null;
            try
            {
                value = getNodeSetValue(document, xpathExpression);
            }
            catch (XPathExpressionException e)
            {
                // That didn't work, so give it a try as a STRING
                value = getStringValue(document, xpathExpression);
            }
            // Put the value
            super.putRawValue(documentProperty, value, rawProperties);
        }
        // Done
        return rawProperties;
    }
   
    private Serializable getStringValue(Document document, XPathExpression xpathExpression) throws XPathExpressionException
    {
        String value = (String) xpathExpression.evaluate(document, XPathConstants.STRING);
        // Done
        return value;
    }
   
    private Serializable getNodeSetValue(Document document, XPathExpression xpathExpression) throws XPathExpressionException
    {
        // Execute it
        NodeList nodeList = null;
        try
        {
            nodeList = (NodeList) xpathExpression.evaluate(document, XPathConstants.NODESET);
        }
        catch (XPathExpressionException e)
        {
            // Expression didn't evaluate to a nodelist
            if (logger.isDebugEnabled())
            {
                logger.debug("Unable to evaluate expression and return a NODESET: " + xpathExpression);
            }
            throw e;
        }
        // Convert the value
        Serializable value = null;
        int nodeCount = nodeList.getLength();
        if (nodeCount == 0)
        {
            // No result
        }
        else if (nodeCount == 1)
        {
            Node node = nodeList.item(0);
            // Get the string value
            value = node.getTextContent();
        }
        else
        {
            // Make a collection of the values
            ArrayList<String> stringValues = new ArrayList<String>(5);
            for (int i = 0; i < nodeCount; i++)
            {
                stringValues.add(nodeList.item(i).getTextContent());
            }
            value = stringValues;
        }
        // Done
        return value;
    }
   
    /**
     * A utility method to convert mapping properties to the Map form.
     *
     * @see #setXpathMappingProperties(Properties)
     */
    protected void readXPathMappingProperties(Properties xpathMappingProperties)
    {
        // Get the namespaces
        for (Map.Entry<Object,Object> entry : xpathMappingProperties.entrySet())
        {
            String propertyName = (String) entry.getKey();
            if (propertyName.startsWith(NAMESPACE_PROPERTY_PREFIX))
            {
                String prefix = propertyName.substring(NAMESPACE_PROPERTY_PREFIX.length());
                String namespace = (String) entry.getValue();
                namespacesByPrefix.put(prefix, namespace);
            }
        }
        // Create the mapping
        for (Map.Entry<Object,Object> entry : xpathMappingProperties.entrySet())
        {
            String documentProperty = (String) entry.getKey();
            String xpathStr = (String) entry.getValue();
            if (documentProperty.startsWith(NAMESPACE_PROPERTY_PREFIX))
            {
                // Ignore these now
                continue;
            }
            // Construct the XPath
            XPath xpath = xpathFactory.newXPath();
            xpath.setNamespaceContext(this);
            XPathExpression xpathExpression = null;
            try
            {
                xpathExpression = xpath.compile(xpathStr);
            }
            catch (XPathExpressionException e)
            {
                throw new AlfrescoRuntimeException("\n" +
                        "Failed to create XPath expression: \n" +
                        "   Document property: " + documentProperty + "\n" +
                        "   XPath:             " + xpathStr + "\n" +
                        "   Error: " + e.getMessage(),
                        e);
            }
            // Persist it
            xpathExpressionMapping.put(documentProperty, xpathExpression);
            if (logger.isDebugEnabled())
            {
                logger.debug("Added mapping from " + documentProperty + " to " + xpathStr);
            }
        }
        // Done
    }
}
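The key design choice above is naming Saxon's factory directly, so only this extractor gets XPath 2.0 while the JAXP default (and the rest of Alfresco) stays on XPath 1.0. A related option, sketched here as an assumption rather than the poster's code, is to request the factory by class name through the standard `XPathFactory.newInstance(uri, factoryClassName, classLoader)` overload (Java 6+), which avoids a compile-time dependency on Saxon. `XPathFactories` and its methods are hypothetical names.

```java
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

// Sketch (hypothetical helper, not from the post): load an XPathFactory by class name via
// the standard JAXP overload, instead of a compile-time reference or an endorsed jar.
final class XPathFactories {

    // Same Saxon class that XPath2MetadataExtractor instantiates directly
    static final String SAXON_XPATH_FACTORY = "net.sf.saxon.xpath.XPathFactoryImpl";

    /** The named factory if the loader can supply it, otherwise the JAXP default. */
    static XPathFactory byClassNameOrDefault(String className, ClassLoader loader) {
        try {
            return XPathFactory.newInstance(XPathFactory.DEFAULT_OBJECT_MODEL_URI, className, loader);
        } catch (XPathFactoryConfigurationException e) {
            // Not loadable (e.g. jar missing). On a stock JDK the default is XPath 1.0 only,
            // so XPath 2.0 expressions would then fail later, at compile time.
            return XPathFactory.newInstance();
        }
    }

    static XPathFactory saxonOrDefault(ClassLoader loader) {
        return byClassNameOrDefault(SAXON_XPATH_FACTORY, loader);
    }
}
```

In the extractor's constructor, `xpathFactory = new XPathFactoryImpl();` could then become `xpathFactory = XPathFactories.saxonOrDefault(getClass().getClassLoader());`, and the `net.sf.saxon` import could be dropped. Whether a silent fallback is preferable to failing fast at startup is a judgment call: throwing instead of falling back would surface a missing Saxon jar immediately.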