Query XML content over CMIS

michel_b
Champ on-the-rise

I am evaluating Nuxeo as a CMIS-compliant repository for a large and important project in the Netherlands. One of our project goals is to have a repository with medical content. We have selected DITA (an XML standard) as our basis for structuring the content, and we are tagging these docs with semantically linked keywords from a domain-specific ontology. This is all done inside the XML content via a custom editor, so it is independent of the repository, which is vital for our architecture. The editor maintains the repository over CMIS. One of the requirements is to be able to list all documents tagged with a certain ontology keyword. So in a nutshell, when I have a repository document with a DITA XML file like this:

<?xml version="1.0"?>
<topic xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:a="http://dita.oasis-open.org/architecture/2005/" id="dita_topic" xml:lang="en-us" xsi:noNamespaceSchemaLocation="urn:oasis:names:tc:dita:xsd:topic.xsd" xml:base="http://localhost:3000/documents/new_topic.xml">
   <title>My Title</title>
   <shortdesc>My description</shortdesc>
   <prolog>
      <metadata>
         <keywords>
            <keyword rel="http://dbpedia.org/resource/Paris">
               Paris
            </keyword>
            <keyword rel="http://dbpedia.org/resource/Rome">
               Rome
            </keyword>
         </keywords>
      </metadata>
   </prolog>
   <body>
     ...
   </body>
</topic>

I would like to do this:

curl -u un:pw "http://localhost:8080/nuxeo/atom/cmis/default/query?q=SELECT+cmis:objectId,+dc:title+FROM+cmis:folder+WHERE+my:keyworduri+=+'http://dbpedia.org/resource/Paris'&searchAllVersions=true"

and find my document.

My best guess is that I need to extract the XML fields I want to query when creating/updating documents and set them as custom metadata. I thought this was a fairly common use case, but the information I have been able to find on metadata extraction is either outdated or pretty scarce. So can this be done in a fairly straightforward way (I am not a Java programmer) with Nuxeo? If so, how? Are there any other ways of satisfying my requirements?

TIA.

2 REPLIES

Florent_Guillau
World-Class Innovator

This is fairly straightforward to do. The idea is to write a Java EventListener that reacts to the documentCreated and documentModified events, extracts the metadata according to your logic (using an XPath processor, for instance), and stores it on the document as Nuxeo metadata so that it can be queried easily.

That's just a one- or two-page method and a few supporting XML files to register the listener as a new plugin.

You can also do practically everything in Nuxeo Studio with a few clicks; only the XPath-specific extraction logic will need to be written in Groovy.
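
To give an idea of the XPath extraction step only, here is a minimal sketch in plain Java using the JDK's built-in XML APIs. The class name DitaKeywordExtractor and the method extractKeywordUris are made up for illustration, and the XPath assumes the keyword layout from the sample topic in the question. The Nuxeo-specific part (registering the listener, reading the file content from the document, and writing the result to a multi-valued property such as the my:keyworduri field from the question) is deliberately left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nuxeo: pulls the "rel" URIs out of the
// <keyword> elements of a DITA topic shaped like the sample in the question.
class DitaKeywordExtractor {

    // Assumption: keywords live under /topic/prolog/metadata/keywords and the
    // topic uses no default namespace, as in the sample file.
    static final String KEYWORD_REL_XPATH =
            "/topic/prolog/metadata/keywords/keyword/@rel";

    static List<String> extractKeywordUris(String ditaXml) throws Exception {
        // Parse the XML string into a DOM tree.
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(ditaXml)));

        // Select every rel attribute node matched by the XPath expression.
        NodeList rels = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(KEYWORD_REL_XPATH, dom, XPathConstants.NODESET);

        // Collect the attribute values in document order.
        List<String> uris = new ArrayList<>();
        for (int i = 0; i < rels.getLength(); i++) {
            uris.add(rels.item(i).getNodeValue().trim());
        }
        return uris;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<topic id=\"dita_topic\"><title>My Title</title>"
                + "<prolog><metadata><keywords>"
                + "<keyword rel=\"http://dbpedia.org/resource/Paris\"> Paris </keyword>"
                + "<keyword rel=\"http://dbpedia.org/resource/Rome\"> Rome </keyword>"
                + "</keywords></metadata></prolog><body/></topic>";
        // Print the extracted keyword URIs, one per line.
        for (String uri : extractKeywordUris(xml)) {
            System.out.println(uri);
        }
    }
}
```

Inside the listener, the list returned by this method would be the value stored on the document, so that a CMIS query on that property can match it. Real DITA files that carry a DOCTYPE declaration may need extra parser configuration, which this sketch does not cover.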
