Searching inside stored XML documents?

dish13
Champ in-the-making
We have an application that stores some information in Alfresco. The information is stored inside an XML document. Is there any way to search inside the stored XML?

Thanks.

mrogers
Star Contributor
Not really.

There are a couple of XML metadata extractors (one for DM and one for AVM) that extract values from the XML and store them as properties, which gives some level of searchability. This works fairly well for simple XML types, but XML can be fairly rich.
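The basic idea can be illustrated in plain Java with the JDK's `javax.xml.xpath` API (this is not Alfresco's extractor code): each configured XPath expression is evaluated against the document and the result is stored as one flat property value. The class name, method name and sample document are hypothetical. In this sketch each expression is evaluated as a string, which keeps only the first match, so the second `<tag>` disappears; that is one concrete way rich XML can lose information when it is flattened into properties.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FlattenXmlSketch {

    // Evaluate each XPath expression as a string and store the trimmed result
    // under its key: one flat value per key, like a simple property map.
    public static Map<String, String> flatten(String xml, Map<String, String> xpaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> props = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : xpaths.entrySet()) {
                // In this sketch a string result holds only the FIRST matching
                // node's text, so repeated elements collapse to one value.
                props.put(entry.getKey(), xpath.evaluate(entry.getValue(), doc).trim());
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not flatten XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><title>Report</title><tag>a</tag><tag>b</tag></doc>";
        Map<String, String> xpaths = new LinkedHashMap<>();
        xpaths.put("title", "/doc/title");
        xpaths.put("tag", "/doc/tag");
        System.out.println(flatten(xml, xpaths)); // prints {title=Report, tag=a}
    }
}
```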

From time to time we think it may be a good idea to integrate an XML database into Alfresco, but at the moment that seems to be an idea that is losing favour.

michel_b
Champ on-the-rise
I would like to extract a few well-defined fields and attributes from our XML content so I can search them via SOLR. I have been following the guide [1] and an older post [2], but I can't get it to work, mostly because I'm so easily confused by Java and XML namespaces, but also because the information seems to be somewhat outdated. Finally, I have no real way of testing whether anything works, because I'm not sure which queries to use.

Here's my setup:
We maintain an Alfresco repository filled with DITA-structured XML files (example below), which are maintained over CMIS by an external editor. To be able to search some of the information in the XML, I added an extractor file (see below) and restarted. Initially there was an error on restart, but after I commented out the 'overwritePolicy' part I don't see anything related to 'metadata' in my startup logs*

Here's my extraction file, stored in tomcat/shared/classes/alfresco/extension/dita-extractor-context.xml. I have added my own namespace in there, but I'm not at all sure whether that is the way to go.


<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>

   <bean id="cmis_edit.DitaExtractor"
         class="org.alfresco.repo.content.metadata.xml.XPathMetadataExtracter"
         parent="baseMetadataExtracter"
         init-method="init" >
      <property name="mappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="properties">
               <props>
                  <prop key="namespace.prefix.dita">http://example.com/dita</prop>
                  <prop key="title">dita:title</prop>
                  <prop key="description">dita:description</prop>
                  <prop key="keyword">dita:keyword</prop>
                  <prop key="keyworduri">dita:keyworduri</prop>
               </props>
            </property>
         </bean>
      </property>
      <property name="xpathMappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="properties">
               <props>
                  <prop key="namespace.prefix.dita">http://example.com/dita</prop>
                  <prop key="title">/*/title/text()</prop>
                  <prop key="description">/*/shortdesc/text()</prop>
                  <prop key="keyword">/*/prolog/metadata/keywords/keyword/text()</prop>
                  <prop key="keyworduri">/*/prolog/metadata/keywords/keyword/@rel</prop>
               </props>
            </property>
         </bean>
      </property>
   </bean>
  
   <bean
         id="cmis_edit.selector.XPathSelector"
         class="org.alfresco.repo.content.selector.XPathContentWorkerSelector"
         init-method="init">
      <property name="workers">
         <map>
            <entry key="/topic">
               <ref bean="cmis_edit.DitaExtractor" />
            </entry>
            <entry key="/reference">
               <ref bean="cmis_edit.DitaExtractor" />
            </entry>
            <entry key="/concept">
               <ref bean="cmis_edit.DitaExtractor" />
            </entry>      
         </map>
      </property>
   </bean>
  
   <bean
         id="cmis_edit.XMLMetadataExtracter"
         class="org.alfresco.repo.content.metadata.xml.XmlMetadataExtracter"
         parent="baseMetadataExtracter">
      <!-- <property name="overwritePolicy">
         <value>EAGER</value>
      </property> -->
      <property name="selectors">
         <list>
            <ref bean="cmis_edit.selector.XPathSelector" />
         </list>
      </property>
   </bean>
  
</beans>


This is an example of the XML in the content stream:


<?xml version="1.0"?>
<topic xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:a="http://dita.oasis-open.org/architecture/2005/" id="new_topic" xml:lang="en-us" xsi:noNamespaceSchemaLocation="urn:oasis:names:tc:dita:xsd:topic.xsd" xml:base="http://localhost:3000/documents/new_topic.xml">
   <title>
      My Title
   </title>
   <shortdesc>
      My description
   </shortdesc>
   <prolog>
      <metadata>
         <keywords>
            <keyword rel="http://dbpedia.org/resource/Paris">
               Paris
            </keyword>
            <keyword rel="http://dbpedia.org/resource/Rome">
               Rome
            </keyword>
         </keywords>
      </metadata>
   </prolog>
   <body>
      <section>
         <title>
            My Subtitle
         </title>
         <p>
            My paragraph
         </p>
      </section>
   </body>
</topic>
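A quicker check than restarting the repository, at least for the XPath half of the configuration, is to evaluate the expressions from the xpathMappingProperties section against a sample file outside Alfresco. The sketch below does that with the JDK's `javax.xml.xpath` API; the class and method names are hypothetical, and it only tests the XPath expressions, not the extractor wiring, the property mapping or indexing. The parser in this sketch is namespace-aware, so unprefixed steps such as `title` match only elements in no namespace, which is the case for the example topic.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DitaXPathCheck {

    // The expressions from the xpathMappingProperties bean, keyed the same way.
    public static final Map<String, String> XPATHS = new LinkedHashMap<>();
    static {
        XPATHS.put("title", "/*/title/text()");
        XPATHS.put("description", "/*/shortdesc/text()");
        XPATHS.put("keyword", "/*/prolog/metadata/keywords/keyword/text()");
        XPATHS.put("keyworduri", "/*/prolog/metadata/keywords/keyword/@rel");
    }

    // Return every text or attribute node the expression matches, whitespace-trimmed.
    public static List<String> matches(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // unprefixed steps match no-namespace elements only
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue().trim());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DitaXPathCheck path/to/topic.xml
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (Map.Entry<String, String> entry : XPATHS.entrySet()) {
            System.out.println(entry.getKey() + " -> " + matches(xml, entry.getValue()));
        }
    }
}
```

Run against the example topic above, it should print one value each for title and description and two values each for keyword and keyworduri (Paris and Rome, and their dbpedia URIs). An empty list would point to a path or namespace mismatch rather than an indexing problem.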


My questions are:
- does my extraction-context file contain any errors or omissions?
- do I need to make a model file? I didn't.
- how can I test if my content gets extracted and indexed?
- do I need to update anything else for this to work?
- is there a quicker way to test things besides restarting everything?
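For question 3, one manual check is to search for a value you know should have been extracted. The hypothetical helper below only builds such a query string in Alfresco's Lucene property syntax, `@prefix\:localName:"value"`, where the colon inside the property name is escaped. It assumes the value was stored in a property such as `dita:title` and that the `dita` prefix is known to the repository (for example through a content model); both are assumptions, not something the configuration above guarantees.

```java
public class PropertyQuerySketch {

    // Escape backslashes and double quotes so the value stays inside one quoted phrase.
    static String escapePhrase(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Build a Lucene-style property query such as @dita\:title:"My Title".
    // The colon between prefix and local name is escaped so that it is not
    // read as the field/value separator.
    public static String luceneProperty(String prefix, String localName, String value) {
        return "@" + prefix + "\\:" + localName + ":\"" + escapePhrase(value) + "\"";
    }

    public static void main(String[] args) {
        // Hypothetical check that the extracted title made it into the index.
        System.out.println(luceneProperty("dita", "title", "My Title")); // prints @dita\:title:"My Title"
    }
}
```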

Hope you can help and TIA,
Michel

[1] http://wiki.alfresco.com/wiki/Metadata_Extraction
[2] https://forums.alfresco.com/en/viewtopic.php?t=7801
* I may be missing something in the log files, since I get frequent logging errors on startup, even though alfresco.log exists in my Alfresco root with permissions 777. I can't tell where else it tries to create the file:

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: alfresco.log (Permission denied)