Extracting XML meta data into aspects

samuel_penn
Champ in-the-making
Hi,

I'm currently trying to get my head around the XML Meta Data extractor as described at http://wiki.alfresco.com/wiki/Metadata_Extraction.

Something that isn't made clear is what happens if the extractor is set up to write the data into a property which only exists in an aspect. Is the aspect automatically added to the document, or does it have to already exist on the document? I'd like to define an aspect which describes some of the form fields in the WCM form data, and have that aspect automatically added by the extractor as required. I can't see any mention of whether something needs to be set up to make this happen, or whether it's impossible.

Having set up an extractor in wcm-xml-metadata-extracter-context.xml and switched on debug logging for metadata, I can see that my configuration is being picked up when the server starts:


16:27:55,613 DEBUG [content.metadata.AbstractMappingMetadataExtracter] Added mapping from atoz to [{http://www.centrom.com/alfresco/localgov/model}atoz]
16:27:55,629 DEBUG [metadata.xml.XPathMetadataExtracter] Added mapping from atoz to /art:article/art:header/art:atoz/text()

However, when I save a suitable web form in WCM, I see the following in the logs:


16:29:18,083 DEBUG [content.metadata.MetadataExtracterRegistry] Finding extractors for text/xml
16:29:18,130 DEBUG [metadata.xml.XPathMetadataExtracter]
No working metadata extractor could be found:
   Document: ContentAccessor[ contentUrl=store://2008/9/29/16/29/cf7eb2e7-e0e5-4cca-972f-655a78f91e98.bin, mimetype=text/xml, size=760, encoding=UTF-8, locale=en_US]
16:29:18,130 DEBUG [metadata.xml.XPathMetadataExtracter]
XML metadata extractor redirected:
   Reader:    ContentAccessor[ contentUrl=store://2008/9/29/16/29/cf7eb2e7-e0e5-4cca-972f-655a78f91e98.bin, mimetype=text/xml, size=760, encoding=UTF-8, locale=en_US]
   Extracter: null
   Metadata: {{http://www.alfresco.org/model/content/1.0}name=metatest.xml, {http://www.alfresco.org
/model/system/1.0}node-dbid=19105, {http://www.alfresco.org/model/system/1.0}store-identifier=hertsm
ere--admin--preview, {http://www.alfresco.org/model/wcmappmodel/1.0}orginalparentpath=hertsmere--adm
in--preview:/www/avm_webapps/ROOT, {http://www.alfresco.org/model/content/1.0}content=contentUrl=sto
re://2008/9/29/16/29/cf7eb2e7-e0e5-4cca-972f-655a78f91e98.bin|mimetype=text/xml|size=760|encoding=UT
F-8|locale=en_US_, {http://www.alfresco.org/model/content/1.0}owner=admin, {http://www.alfresco.org/
model/content/1.0}title={en_US=metatest.xml}, {http://www.alfresco.org/model/content/1.0}modified=Mo
n Sep 29 16:29:17 BST 2008, {http://www.alfresco.org/model/system/1.0}node-uuid=UNKNOWN, {http://www
.alfresco.org/model/wcmappmodel/1.0}parentformname=web-article, {http://www.alfresco.org/model/conte
nt/1.0}created=Mon Sep 29 16:29:17 BST 2008, {http://www.alfresco.org/model/system/1.0}store-protoco
l=avm, {http://www.alfresco.org/model/content/1.0}creator=admin, {http://www.alfresco.org/model/cont
ent/1.0}modifier=admin, {http://www.alfresco.org/model/wcmappmodel/1.0}renditions=[/www/avm_webapps/
ROOT/metatest.jsp]}

The 'no working metadata extractor could be found' suggests that it's not actually finding the extractor. I also had the impression that the extraction only happened when the form content was published to the staging sandbox - this debug is appearing when I save the form content in the user's sandbox, and I get no metadata debug at all when the form content is pushed to staging.
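One way to narrow this down is to rule out the XPath itself by evaluating it outside Alfresco with the JDK's own XPath API. The sketch below is only a sanity check under assumptions: the sample document is hypothetical (the real WCM form output may look different), and it does not reproduce how Alfresco's selector resolves namespace prefixes. It does show that the expression only matches when the `art` prefix is bound to the same namespace URI the document uses.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathCheck {
    // Namespace bound to the "art" prefix in xpathMappingProperties.
    static final String ART_NS = "http://www.centrom.com/localgov/wcm/article";

    // Hypothetical form instance; the real generated XML may differ.
    static final String SAMPLE =
        "<art:article xmlns:art=\"" + ART_NS + "\">"
      + "<art:header><art:atoz>Bins</art:atoz></art:header>"
      + "</art:article>";

    /** Evaluates the expression against the XML text and returns the string result. */
    static String evaluate(String expression, String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // prefixes only resolve on a namespace-aware DOM
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "art".equals(prefix) ? ART_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return ART_NS.equals(uri) ? "art" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    String p = getPrefix(uri);
                    return p == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singletonList(p).iterator();
                }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath check failed", e);
        }
    }

    public static void main(String[] args) {
        String expr = "/art:article/art:header/art:atoz/text()";
        System.out.println("[" + evaluate(expr, SAMPLE) + "]");
        // A document without the namespace declaration yields an empty result.
        System.out.println("[" + evaluate(expr,
            "<article><header><atoz>Bins</atoz></header></article>") + "]");
    }
}
```

If the expression matches here but the selector still reports no working extractor, the XPath is probably not the problem and the prefix binding on the selector side would be the next thing to check.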

Looking at any version of the metatest.xml file in the node browser shows that no aspect and no metadata have been added.

The meta data extraction config I'm using is below - could anyone tell me if it looks sensible?


   <bean id="extracter.xml.centrom.ArticleModelMetadataExtracter"
         class="org.alfresco.repo.content.metadata.xml.XPathMetadataExtracter"
         parent="baseMetadataExtracter"
         init-method="init" >
      <property name="mappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="properties">
               <props>
                  <prop key="namespace.prefix.lg">http://www.centrom.com/alfresco/localgov/model</prop>
                  <prop key="atoz">lg:atoz</prop>
               </props>
            </property>
         </bean>
      </property>
     
      <property name="xpathMappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="properties">
               <props>
                  <prop key="namespace.prefix.art">http://www.centrom.com/localgov/wcm/article</prop>
                  <prop key="atoz">/art:article/art:header/art:atoz/text()</prop>
               </props>
            </property>
         </bean>
      </property>
   </bean>
  
  
   <!--
      This selector examines the XML documents, executing the given XPath statements until a
      match is made.
   -->
   <bean id="extracter.xml.centrom.selector.XPathSelector"
         class="org.alfresco.repo.content.selector.XPathContentWorkerSelector"
         init-method="init">
      <property name="workers">
         <map>
            <entry key="/art:article">
               <ref bean="extracter.xml.centrom.ArticleModelMetadataExtracter" />
            </entry>
         </map>
      </property>
   </bean>
  
   <bean id="extracter.xml.centrom.XMLMetadataExtracter"
         class="org.alfresco.repo.content.metadata.xml.XmlMetadataExtracter"
         parent="baseMetadataExtracter">

      <property name="registry">
         <ref bean="avmMetadataExtracterRegistry" />
      </property>

      <property name="overwritePolicy">
         <value>EAGER</value>
      </property>
      <property name="selectors">
         <list>
            <ref bean="extracter.xml.centrom.selector.XPathSelector" />
         </list>
      </property>
   </bean>

My aspect is defined as follows:


   <namespaces>
      <namespace uri="http://www.centrom.com/alfresco/localgov/model" prefix="lg"/>
   </namespaces>
  
    <aspects>
        <aspect name="lg:article">
            <title>Article Aspect</title>
            <properties>
                <property name="lg:atoz">
                    <type>d:text</type>
                </property>
            </properties>
        </aspect>
    </aspects>


Thanks,
Sam.
17 REPLIES

pmonks
Star Contributor
What specifically doesn't work about datetimes?  Are you using the "xs:datetime" data type in your XML Schema?

The list of supported XML Schema structures and data types is at http://wiki.alfresco.com/wiki/Forms_Authoring_Guide#Overview_of_supported_XML_Schema_structures_and_....

Cheers,
Peter

samuel_penn
Champ in-the-making
What specifically doesn't work about datetimes?  Are you using the "xs:datetime" data type in your XML Schema?

Using a datetime causes an exception to be thrown when opening a form. If I use:


<xs:element name="expire" type="xs:datetime"/>

Then I get the following error when creating a form:


org.alfresco.web.forms.FormProcessor$ProcessingException: org.alfresco.web.forms.xforms.FormBuilderException: error parsing schema: at line 27 column 59: src-resolve.4.2: Error resolving component 'xs:datetime'. It was detected that 'xs:datetime' is in namespace 'http://www.w3.org/2001/XMLSchema', but components from this namespace are not referenceable from schema document 'null'. If this is the incorrect namespace, perhaps the prefix of 'xs:datetime' needs to be changed. If this is the correct namespace, then an appropriate 'import' tag should be added to 'null'.

The xs namespace is being used by the other elements, and they work fine.

That page you reference is why I assumed it was not supported, since under "Supported XML Schema Data Types in Alfresco 2.0 and higher" it is listed as unsupported. It is listed as supported earlier on the page, but since it's never worked for me, I always assumed the unsupported listing was the correct one.

Sam.

pmonks
Star Contributor
hmm…….strange!  I just noticed that xs:datetime is listed as supported in the top table, and not supported in the bottom one.

Could I ask a big favour and have you raise a ticket for the lack of xs:datetime support in JIRA (http://issues.alfresco.com/), including the full exception stack trace from alfresco.log, then post the number back here?

Cheers,
Peter

pmonks
Star Contributor
Also, the enhancement request for the XSD driven indexing configuration is https://issues.alfresco.com/jira/browse/DW-5 - it'd be great if you could vote for and/or comment on it if you feel it's a good idea.

Cheers,
Peter

samuel_penn
Champ in-the-making
hmm…….strange!  I just noticed that xs:datetime is listed as supported in the top table, and not supported in the bottom one.

Could I ask a big favour and have you raise a ticket for the lack of xs:datetime support in JIRA (http://issues.alfresco.com/), including the full exception stack trace from alfresco.log, then post the number back here?

Actually, the problem is in the wiki page (which I tried to edit, but was defeated by the captchas). The clue was in another JIRA [1], which mentioned xs:dateTime not working in 2.9B. Stick in a capital 'T' and it works (at least in 2.2.1). Given that other elements are listed with the correct case, someone who has better luck with captchas than me should probably fix the wiki.
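The case sensitivity can be reproduced without Alfresco by compiling a one-element schema with the JDK's schema validator. This is a minimal sketch: the `expire` element comes from the earlier post, while the class and method names are made up for illustration, and the exact error text from the JDK may differ from the one Alfresco logs.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaTypeCaseCheck {

    /** Builds a tiny schema declaring one element of the given built-in type. */
    static String schemaWithType(String type) {
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
             + "<xs:element name=\"expire\" type=\"" + type + "\"/>"
             + "</xs:schema>";
    }

    /** Returns true if the schema compiles, false if the type cannot be resolved. */
    static boolean compiles(String type) {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.newSchema(new StreamSource(new StringReader(schemaWithType(type))));
            return true;
        } catch (SAXException e) {
            // Unresolvable type names such as xs:datetime end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("xs:dateTime compiles: " + compiles("xs:dateTime"));
        System.out.println("xs:datetime compiles: " + compiles("xs:datetime"));
    }
}
```

XML Schema built-in type names are case-sensitive, so only the `xs:dateTime` spelling resolves.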

Sam.

[1] https://issues.alfresco.com/jira/browse/ALFCOM-1798

pmonks
Star Contributor
Ah, so it's a typo in the wiki - I'll fix that now.

Are you seeing the issue described in https://issues.alfresco.com/jira/browse/ALFCOM-1798 on 2.2SP1?

Cheers,
Peter

samuel_penn
Champ in-the-making
Ah, so it's a typo in the wiki - I'll fix that now.
Are you seeing the issue described in https://issues.alfresco.com/jira/browse/ALFCOM-1798 on 2.2SP1?

No. Update and editing of the form works just fine. However, there does seem to be an issue with metadata extraction of dates. I am using the following configuration:


      <property name="supportedDateFormats">
          <list>
              <value>yyyy-MM-dd'T'HH:mm:ss</value>
              <value>yyyy-MM-dd</value>
          </list>
      </property>

I have an xs:dateTime field which is being converted to a datetime property, and an xs:date field which is being converted to a date property. According to the metadata extraction page on the wiki, the above date formats should be tried in order until one succeeds. Trying the form with just one field, and just one date format, works fine.

However, it seems that regardless of the order of the two formats above, if both formats are listed then the xs:dateTime gets converted as a date - i.e. the time is always extracted as 0:00:00 as if the shorter format is always being matched. It seems I need both in there to be able to extract both fields, otherwise I get a date format error.
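One possible mechanism, shown here in plain Java rather than Alfresco code: `SimpleDateFormat.parse(String)` is allowed to stop before the end of the input, so the short `yyyy-MM-dd` pattern happily accepts a full dateTime string and returns midnight. Whether the extractor walks the `supportedDateFormats` list this way is an assumption, and in this sketch the order of the patterns does matter, which is not quite what is described above. The class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatOrderDemo {
    static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    static SimpleDateFormat format(String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(UTC);
        return fmt;
    }

    /** Returns the first pattern's result; parse(String) may ignore trailing text. */
    static Date parseFirstMatch(String text, List<String> patterns) {
        for (String pattern : patterns) {
            try {
                return format(pattern).parse(text);
            } catch (ParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        return null;
    }

    /** Like parseFirstMatch, but only accepts a pattern that consumes the whole input. */
    static Date parseWholeInput(String text, List<String> patterns) {
        for (String pattern : patterns) {
            SimpleDateFormat fmt = format(pattern);
            fmt.setLenient(false);
            ParsePosition pos = new ParsePosition(0);
            Date parsed = fmt.parse(text, pos);
            if (parsed != null && pos.getIndex() == text.length()) {
                return parsed;
            }
        }
        return null;
    }

    static String show(Date date) {
        return format("yyyy-MM-dd HH:mm:ss").format(date);
    }

    public static void main(String[] args) {
        List<String> shortFirst = Arrays.asList("yyyy-MM-dd", "yyyy-MM-dd'T'HH:mm:ss");
        String input = "2008-09-29T16:29:17";
        System.out.println(show(parseFirstMatch(input, shortFirst))); // time part is lost
        System.out.println(show(parseWholeInput(input, shortFirst))); // time part is kept
    }
}
```

With the short pattern listed first, the first call prints a midnight time and the second keeps 16:29:17, because it rejects any pattern that leaves input unconsumed.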

Sam.

mcrocker
Champ in-the-making
Hi, I'm most definitely having the same issue.

I am trying to use your workaround of capturing the time with xs:time, but I haven't been able to get it to display as anything other than publishTime:   1 January 1970 17:17…

I added the following to wcm-xml-metadata-extracter-context.xml, but it doesn't seem to get applied at all: <value>hh:mm:ss</value>
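For what it's worth, the 1 January 1970 date is what plain Java produces when a time-only pattern is parsed into a `Date`: the missing date fields default to the epoch. The sketch below assumes the value ends up in a date or datetime property via `SimpleDateFormat`, which the thread does not confirm. It uses `HH` (24-hour clock) rather than `hh` (12-hour clock), since a value like 17:17 is outside the 12-hour range. The class name is hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class TimeOnlyParseDemo {
    static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    /** Parses a time-only string; the date part falls back to 1970-01-01. */
    static Date parseTime(String text) {
        SimpleDateFormat fmt = new SimpleDateFormat("HH:mm:ss", Locale.ROOT);
        fmt.setTimeZone(UTC);
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a HH:mm:ss time: " + text, e);
        }
    }

    static String show(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        fmt.setTimeZone(UTC);
        return fmt.format(date);
    }

    public static void main(String[] args) {
        // Prints 1970-01-01 17:17:00
        System.out.println(show(parseTime("17:17:00")));
    }
}
```

So a display of "1 January 1970 17:17" suggests the time is being extracted but stored in a property that also carries a date.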

Any advice?

Thanks,
Mat Crocker