Query XML content over CMIS

michel_b — Mon, 02 Apr 2012 15:00:13 GMT

I am evaluating Nuxeo as a CMIS-compliant repository for a large and important project in the Netherlands. One of our project goals is to have a repository with medical content. We have selected DITA (an XML standard) as our basis for structuring the content, and we are tagging these documents with semantically linked keywords from a domain-specific ontology. This is all done inside the XML content via a custom editor, so it is independent of the repository, which is vital for our architecture. The editor maintains the repository over CMIS. One of the requirements is to be able to list all documents tagged with a certain ontology keyword. So in a nutshell, when I have a repository document with a DITA XML file like this:

<?xml version="1.0"?>
<topic xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:a="http://dita.oasis-open.org/architecture/2005/" id="dita_topic" xml:lang="en-us" xsi:noNamespaceSchemaLocation="urn:oasis:names:tc:dita:xsd:topic.xsd" xml:base="http://localhost:3000/documents/new_topic.xml">
   <title>My Title</title>
   <shortdesc>My description</shortdesc>
   <prolog>
      <metadata>
         <keywords>
            <keyword rel="http://dbpedia.org/resource/Paris">
               Paris
            </keyword>
            <keyword rel="http://dbpedia.org/resource/Rome">
               Rome
            </keyword>
         </keywords>
      </metadata>
   </prolog>
   <body>
     ...
   </body>
</topic>

I would like to do this:

curl -u un:pw "http://localhost:8080/nuxeo/atom/cmis/default/query?q=SELECT+cmis:objectId,+dc:title+FROM+cmis:folder+WHERE+my:keyworduri+=+'http://dbpedia.org/resource/Paris'&searchAllVersions=true"

and find my document.
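
For a client that builds this request in code rather than with curl, here is a minimal Java sketch of how the query string could be URL-encoded. The class name CmisQueryUrl and its build method are hypothetical. The endpoint is the one from the curl call above, and my:keyworduri is still the assumed custom property, not something that exists in Nuxeo out of the box.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the CMIS AtomPub query URL used in the curl call.
final class CmisQueryUrl {

    // Endpoint copied from the curl example; adjust host and repository as needed.
    static final String QUERY_ENDPOINT =
            "http://localhost:8080/nuxeo/atom/cmis/default/query";

    // Encodes the CMIS query so that quotes, colons and slashes survive in the URL.
    // URLEncoder turns spaces into '+', matching the style of the curl call.
    static String build(String cmisQuery) {
        String encoded = URLEncoder.encode(cmisQuery, StandardCharsets.UTF_8);
        return QUERY_ENDPOINT + "?q=" + encoded + "&searchAllVersions=true";
    }

    public static void main(String[] args) {
        // my:keyworduri is an assumed custom metadata property.
        String query = "SELECT cmis:objectId, dc:title FROM cmis:folder "
                + "WHERE my:keyworduri = 'http://dbpedia.org/resource/Paris'";
        System.out.println(build(query));
    }
}
```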

My best guess is that I need to extract the XML fields I want to query when creating or updating documents and set them as custom metadata. I thought this was a fairly common use case, but the information I have been able to find on metadata extraction is either outdated or pretty scarce. So can this be done in a fairly straightforward way (I am not a Java programmer) with Nuxeo? If so, how? Are there any other ways of satisfying my requirements?

TIA.

Re: Query XML content over CMIS

Florent_Guillau — Wed, 11 Apr 2012 16:17:25 GMT

This is fairly straightforward to do. The idea is to write a Java EventListener that reacts to the documentCreated and documentModified events, does the metadata extraction according to your logic (using an XPath processor, for instance), and stores the result on the document as Nuxeo metadata so that it can be queried easily.

That's just a one- or two-page method and a few supporting XML files to register the listener as a new plugin.
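
As a rough illustration of the extraction step only, here is a minimal sketch using the standard JDK XPath API against the DITA sample from the question. The class name KeywordUriExtractor and its method are hypothetical, and the Nuxeo side (the listener registration and storing the values in a custom property such as my:keyworduri) is deliberately left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls the keyword URIs out of a DITA topic.
final class KeywordUriExtractor {

    // Selects the rel attribute of every keyword in the topic prolog.
    private static final String KEYWORD_REL_XPATH =
            "/topic/prolog/metadata/keywords/keyword/@rel";

    static List<String> extractKeywordUris(String ditaXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(ditaXml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                KEYWORD_REL_XPATH, doc, XPathConstants.NODESET);

        // Collect the attribute values in document order.
        List<String> uris = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            uris.add(nodes.item(i).getNodeValue().trim());
        }
        return uris;
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the sample topic from the question.
        String xml = "<?xml version=\"1.0\"?>"
                + "<topic id=\"dita_topic\"><title>My Title</title>"
                + "<prolog><metadata><keywords>"
                + "<keyword rel=\"http://dbpedia.org/resource/Paris\">Paris</keyword>"
                + "<keyword rel=\"http://dbpedia.org/resource/Rome\">Rome</keyword>"
                + "</keywords></metadata></prolog><body/></topic>";
        System.out.println(extractKeywordUris(xml));
    }
}
```

In a listener, the returned list would then be written to a multi-valued metadata field on the document, so that a CMIS query on that field can match it.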

Re: Query XML content over CMIS

Florent_Guillau — Wed, 11 Apr 2012 16:19:54 GMT

You can also build practically everything in Nuxeo Studio with a few clicks; only the XPath-specific extraction logic will need to be written in Groovy.
