Hyland Connect

lnfalandino · ‎04-22-2008

Hello,

Edit: Also note that I'm using 2.9B.

First let me begin by saying that I'm still quite new to Alfresco, so my inexperience will probably show. I have what I feel is a fairly basic question, but the information I have found hasn't answered it directly (though it has offered some clues), and I've become a bit confused as to what the best approach should be. Given that I have a very short timeframe to work this out, I'm hoping someone here can steer me in the right direction.

The simplest description of what I want to do is add custom metadata fields to content in an AVM store (WCM) that I can then search for using Lucene. From the searching I've done so far I am getting the impression that adding an aspect to my WCM content would be a good way to go. I'm not sure how to do this however, or how this fits with creating a custom form within WCM.

In my ideal world I imagine that I could go into my Web Project, select Add Content to my sandbox and for my form type I could pick my custom form which has additional metadata fields associated with it that could be entered/modified in the properties of that content item.

Huge thanks in advance to anyone who can give me a simple example of what the best way to accomplish this would be.

Thanks,
Naim

lnfalandino · ‎04-22-2008

Perhaps I'm thinking about this too hard.

Let's say I create a custom web form and add several additional field elements to it (e.g. authorPhoneNumber or dateApproved) that the user can then enter when creating content using that web form type. Will those pieces of information that are entered be valid query parameters when doing a luceneSearch() on the AVM store?

pmonks · ‎04-22-2008

Elements in XML files are not, by default, queryable in either the DM or AVM repositories (except via full text search, but to me that's something different from querying). To make XML elements queryable, Alfresco provides an "XML Metadata Extractor" (see http://wiki.alfresco.com/wiki/Metadata_Extraction#XML_Meta-data_Extractor_Configuration_for_WCM) that can be used to extract element values from XML files.

Once extracted, the aspect properties that those values get inserted into can be configured to be indexable, and hence available for querying via Lucene.
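As a rough sketch of what an indexable aspect property might look like in a custom content model file: the aspect name mla:articleMetadata, the property names, and the mla prefix below are all hypothetical, and the fragment assumes the prefix is declared in the model's namespaces section. Only the general shape of the data dictionary XML is intended.

```xml
<!-- Fragment of a custom content model; names are illustrative only -->
<aspects>
   <aspect name="mla:articleMetadata">
      <title>Article Metadata</title>
      <properties>
         <property name="mla:publishDate">
            <type>d:date</type>
            <!-- Explicit index settings for this property -->
            <index enabled="true">
               <atomic>true</atomic>
               <stored>false</stored>
               <tokenised>false</tokenised>
            </index>
         </property>
         <property name="mla:language">
            <type>d:text</type>
            <index enabled="true">
               <atomic>true</atomic>
               <stored>false</stored>
               <tokenised>true</tokenised>
            </index>
         </property>
      </properties>
   </aspect>
</aspects>
```

The extractor's mappingProperties would then point at these property names rather than at properties that do not exist in the target model.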

Cheers,
Peter

lnfalandino · ‎04-22-2008

Thank you for the post, I actually just discovered that about 30 minutes ago and have started to play with it.

I have a couple follow-up questions about XML Extractors:

1. What if you have several types of web forms, you only care about extracting the metadata from a couple of them, and each has a substantially different XSD?

2. How do you configure the aspect properties to allow indexing over those props? The wiki article doesn't seem to indicate if that's done at the extractor level.

Thanks a lot,
Naim

kvc · ‎04-22-2008

Anything extracted from the XML and set as a property is automatically indexed and queryable via Lucene.

NOTE: Lucene indexing is triggered upon Submit (more correctly, upon the auto-snapshot taken upon a successful Submit) to Staging. Currently, only Staging repos are therefore indexed (we don't do delta indexes for user stores).

One additional thing to explore is a useful Community contribution that makes specifying the elements to index in an XSD much simpler. It can be found here:

http://forge.alfresco.com/projects/wcm-metadata/

We are likely to move to a model more similar to this post-3.0.

Kevin

lnfalandino · ‎04-23-2008

Thanks Kevin.

I found that wcm-metadata extension yesterday morning but was unable to get it to install. (The MMT appeared to complain about a version number, I assumed it was because I'm using 2.9B.) It looks cool and like what I'd want, though I don't completely understand how I'd implement it.

I'm going to continue down the path with the extractor. If anyone can answer my first question about how you can configure the XPathSelector for different web forms, please let me know.

Edit: I may actually play with that user extension a bit more. Would it be possible for me to just manually install the amp file by unzipping it or does the MMT installer do anything special that necessitates its use?

Thanks,
Naim

pmonks · ‎04-23-2008

XML Metadata Extraction happens in two steps:

1. Selection of a document, based on the root element (performed by one or more XPathSelectors)

2. Extraction of elements / attributes from that document (performed by one or more XPathMetadataExtractors)

The first step determines which XML files (whether produced via a Web Form or not) are eligible for metadata extraction, while the second step identifies which parts (elements/attributes) of the selected XML files to extract.

So to answer your question, you'd configure an XPathSelector that only selects the Web Form XML files that you wish to extract metadata from. The assumption here is that your Web Forms each produce XML files with a unique root element, but that's a good practice anyway.
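A minimal sketch of such a selector, assuming two hypothetical Web Forms whose root elements are press_release and article. The selector bean id, the class name, and the workers property are recalled from the sample context file shipped with Alfresco and should be checked against your version; the two extractor bean ids are made up for illustration.

```xml
<!-- Sketch only: verify class and property names against your Alfresco version -->
<bean id="extracter.xml.sample.selector.XPathSelector"
      class="org.alfresco.repo.content.selector.XPathContentWorkerSelector"
      init-method="init">
   <property name="workers">
      <map>
         <!-- One entry per root element that should be processed -->
         <entry key="/press_release">
            <ref bean="extracter.xml.example.PressReleaseExtracter" />
         </entry>
         <entry key="/article">
            <ref bean="extracter.xml.example.ArticleExtracter" />
         </entry>
         <!-- Web Forms with no matching entry are simply not selected -->
      </map>
   </property>
</bean>
```

Each referenced extractor bean would carry its own XPath mappings, so forms with very different XSDs do not need to share a configuration.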

Cheers,
Peter

lnfalandino · ‎04-23-2008

I think I understand that part. The bit I'm struggling with right now is that I can't seem to get it to select my document for processing. For example, I have a web form that generates the following XML:


<mla:ml_article xmlns:alf="http://www.alfresco.org" xmlns:chiba="http://chiba.sourceforge.net/xforms" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:mla="http://www.alfresco.org/alfresco/mla" xmlns:xf="http://www.w3.org/2002/xforms" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <mla:title>MLA</mla:title> 
  <mla:author>Naim</mla:author> 
  <mla:description>MLA test</mla:description> 
  <mla:body>Body of content.</mla:body> 
  <mla:publish_date>2008-04-23</mla:publish_date> 
  <mla:language>English</mla:language> 
</mla:ml_article>

My wcm-xml-metadata-extracter-context.xml file then has this entry under the XPathSelector bean:


…
            <entry key="/mla_article">
               <ref bean="extracter.xml.sample.AlfrescoModelMetadataExtracter" />
            </entry>
…

At first I had "/mla:mla_article", but then at startup Alfresco complained about the mla namespace. Regardless, my model metadata extracter looks like this:


   <bean id="extracter.xml.sample.AlfrescoModelMetadataExtracter"
         class="org.alfresco.repo.content.metadata.xml.XPathMetadataExtracter"
         parent="baseMetadataExtracter"
         init-method="init" >
      <property name="mappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="properties">
               <props>
                  <prop key="namespace.prefix.cm">http://www.alfresco.org/model/content/1.0</prop>
                  <prop key="title">cm:title</prop>
                  <prop key="author">cm:author</prop>
                  <prop key="description">cm:description</prop>
                  <prop key="publish_date">cm:publish_date</prop>
                  <prop key="language">cm:language</prop>
               </props>
            </property>
         </bean>
      </property>
      <property name="xpathMappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="properties">
               <props>
                  <prop key="namespace.prefix.fm">http://www.alfresco.org/model/forum/1.0</prop>
                  <prop key="title">/mla_article/@name</prop>
                  <prop key="author">/mla_article/author/text()</prop>
                  <prop key="description">/mla_article/description/text()</prop>
                  <prop key="publish_date">/mla_article/publish_date/text()</prop>
                  <prop key="language">/mla_article/language/text()</prop>
               </props>
            </property>
         </bean>
      </property>
   </bean>

I feel as though I'm quite close, but something isn't quite in line, as my properties don't seem to be getting extracted (at least as far as I can tell from using the node browser).

Cheers everyone, thanks so much for helping me get over the hump.

Naim

lnfalandino · ‎04-23-2008

Well, I think my woes are definitely because of the extractor configuration. I realized that the namespace error I was getting when I had the "mla" XML namespace was because I had it also in the AlfrescoModelMetadataExtracter's xpath properties, e.g.


…
  <prop key="title">/mla:mla_article/@name</prop>
…

and it definitely wasn't happy about that.

lnfalandino · ‎04-23-2008

I managed to get it working. As I thought, I had some pathing issues. If anyone is curious about it, let me know and I'll post the updated versions of my files.

Thanks again for the nudges in the right direction.

Adding custom metadata to WCM content