XML Metadata Extractor configuration problem

rmills
Champ in-the-making
Hi, I've run through the wiki article on this, but can't seem to get it to run.  I've imported a number of XML files with the following structure:

<Recipe>
    <Title>Value1</Title>
    <Author>Value2</Author>
    <Instruction>Value3</Instruction>
    <… />
</Recipe>

And I want to take the Title, Author and Instruction values and insert them into the Title, Author and Description fields on a basic Alfresco document.

I've configured my extractor with a selector to target just the Recipe elements and children:


<bean
         id="extracter.xml.sample.selector.XPathSelector"
         class="org.alfresco.repo.content.selector.XPathContentWorkerSelector"
         init-method="init">
      <property name="workers">
         <map>
            <entry key="/my:test">
               <null />
            </entry>
            <entry key="/Recipe">
               <ref bean="extracter.xml.sample.AlfrescoModelMetadataExtracter" />
            </entry>
         </map>
      </property>
   </bean>


And the mappings:

<bean id="extracter.xml.sample.AlfrescoModelMetadataExtracter"
         class="org.alfresco.repo.content.metadata.xml.XPathMetadataExtracter"
         parent="baseMetadataExtracter"
         init-method="init" >
      <property name="mappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="properties">
               <props>
                  <prop key="author">author</prop>
                  <prop key="title">title</prop>
                  <prop key="description">description</prop>
               </props>
            </property>
         </bean>
      </property>
      <property name="xpathMappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="properties">
               <props>
                  <prop key="author">/Recipe/Author/text()</prop>
                  <prop key="title">/Recipe/Title/text()</prop>
                  <prop key="description">/Recipe/Instructions/text()</prop>
               </props>
            </property>
         </bean>
      </property>
   </bean>

I'm not sure what I'm missing here.  It seems pretty straightforward, and I've worked from the example that Alfresco provided, which has the same functionality (except for the actual mappings and XPath queries), so it shouldn't be that different, right?  The wiki was unclear on how to "activate" these extractors other than dropping them in /classes/alfresco/extension and removing the ".sample" from the end of the filename.  Any help is much appreciated.  Thanks.

derek
Star Contributor
Hi,

I am assuming that you took the .xml.sample file, dropped the .sample part and put it into the alfresco/extension location.  The mechanism by which this is picked up is discussed here: http://wiki.alfresco.com/wiki/Repository_Configuration

Now, your configuration looks correct except for the mappings such as
<prop key="author">author</prop>
You'll notice that the sample uses cm:author, but you chose to use just author.  cm:author is equivalent (assuming you put in the missing namespace mapping as well) to {http://www.alfresco.org/model/content/1.0}author, while author is equivalent to {}author.  They are completely different properties.
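
As a minimal sketch of the prefixed form described above (assuming the cm prefix is declared with a namespace.prefix.cm entry, and that the three target properties are cm:author, cm:title and cm:description), the mapping properties might look like this:

```xml
<property name="mappingProperties">
   <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
      <property name="properties">
         <props>
            <!-- Declares the cm prefix used by the values below -->
            <prop key="namespace.prefix.cm">http://www.alfresco.org/model/content/1.0</prop>
            <!-- key = name of the extracted value, value = prefixed Alfresco property -->
            <prop key="author">cm:author</prop>
            <prop key="title">cm:title</prop>
            <prop key="description">cm:description</prop>
         </props>
      </property>
   </bean>
</property>
```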

Run this query against your database (you could also use the Node Browser to dig down to the nodes):
select * from alf_node_properties where qname like '{}%';
and see whether there are any values there.

Regards

rmills
Champ in-the-making
*To try to narrow down the problem and limit complicating factors, I've slimmed down the metadata extraction to focus on just the title field.  I figure if I can get one of them working first, the rest will fall into place.

Ok, I had followed the activation instructions like I was supposed to.  I just wanted to make sure I wasn't supposed to register the xml anywhere else (i.e. adding a line to another context xml). 

Querying for '{}%' returns nothing, but there is a '{http://www.alfresco.org/model/content/1.0}title'.

Changing the namespaces back to the way they were in the sample file hasn't changed the output.  Title remains null.  Here's my latest configuration:



<bean id="extracter.xml.sample.AlfrescoModelMetadataExtracter"
         class="org.alfresco.repo.content.metadata.xml.XPathMetadataExtracter"
         parent="baseMetadataExtracter"
         init-method="init" >
      <property name="mappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="properties">
               <props>
                  <prop key="namespace.prefix.cm">http://www.alfresco.org/model/content/1.0</prop>
                  <prop key="title">cm:title</prop>
               </props>
            </property>
         </bean>
      </property>
      <property name="xpathMappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="properties">
               <props>
                  <prop key="namespace.prefix.fm">http://www.alfresco.org/model/forum/1.0</prop>
                  <prop key="title">/Title/text()</prop>
               </props>
            </property>
         </bean>
      </property>
   </bean>

No combination of the namespaces seems to be working.
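
One thing that may be worth checking, independent of the namespaces: an XPath that starts with / is evaluated from the document root, so /Title/text() only matches a document whose root element is Title. For the Recipe sample at the top of the thread, the expressions would presumably need to start at /Recipe, and the third element there is named Instruction, not Instructions. A sketch under those assumptions:

```xml
<property name="xpathMappingProperties">
   <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
      <property name="properties">
         <props>
            <!-- Paths are absolute from the document root, which is <Recipe> in the sample -->
            <prop key="author">/Recipe/Author/text()</prop>
            <prop key="title">/Recipe/Title/text()</prop>
            <!-- The sample document uses <Instruction> (singular) -->
            <prop key="description">/Recipe/Instruction/text()</prop>
         </props>
      </property>
   </bean>
</property>
```

The sample elements carry no namespace, so no namespace.prefix entry is assumed here.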

derek
Star Contributor
How are you testing the extraction?
Where did you put the extension file?
Post the startup logs with the following on DEBUG: org.alfresco.repo.content.metadata

rmills
Champ in-the-making
I actually put the extension in 2 places (one at a time, then together) because I was unsure about which one it should really go in:
-C:\Alfresco\tomcat\shared\classes\alfresco\extension
-C:\Alfresco\tomcat\webapps\alfresco\WEB-INF\classes\alfresco\extension

Here's the log from my most recent startup:

15:12:33,958 WARN  [org.springframework.remoting.rmi.RmiRegistryFactoryBean] Could not detect RMI registry - creating new one
15:12:39,877 WARN  [org.alfresco.util.OpenOfficeConnectionTester] A connection to OpenOffice could not be established.
15:12:39,897 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from subject to [{http://www.alfresco.org/model/content/1.0}description]
15:12:39,897 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from author to [{http://www.alfresco.org/model/content/1.0}author]
15:12:39,897 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from title to [{http://www.alfresco.org/model/content/1.0}title]
15:12:39,897 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from created to [{http://www.alfresco.org/model/content/1.0}created]
15:12:39,897 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Loaded mapping properties from resource: org/alfresco/repo/content/metadata/PdfBoxMetadataExtracter.properties
15:12:39,897 DEBUG [org.alfresco.repo.content.metadata.MetadataExtracterRegistry] Registering metadata extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@1219b8c
15:12:39,927 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from subject to [{http://www.alfresco.org/model/content/1.0}description]
15:12:39,927 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from author to [{http://www.alfresco.org/model/content/1.0}author]
15:12:39,927 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from lastSaveDateTime to [{http://www.alfresco.org/model/content/1.0}modified]
15:12:39,927 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from createDateTime to [{http://www.alfresco.org/model/content/1.0}created]
15:12:39,927 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from title to [{http://www.alfresco.org/model/content/1.0}title]
15:12:39,927 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Loaded mapping properties from resource: org/alfresco/repo/content/metadata/OfficeMetadataExtracter.properties
15:12:39,927 DEBUG [org.alfresco.repo.content.metadata.MetadataExtracterRegistry] Registering metadata extracter: org.alfresco.repo.content.metadata.OfficeMetadataExtracter@939d40
15:12:39,937 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from addressee to [{http://www.alfresco.org/model/content/1.0}addressee]
15:12:39,937 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from addressees to [{http://www.alfresco.org/model/content/1.0}addressees]
15:12:39,937 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from sentDate to [{http://www.alfresco.org/model/content/1.0}sentdate]
15:12:39,937 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from originator to [{http://www.alfresco.org/model/content/1.0}originator, {http://www.alfresco.org/model/content/1.0}author]
15:12:39,937 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from subjectLine to [{http://www.alfresco.org/model/content/1.0}description, {http://www.alfresco.org/model/content/1.0}subjectline]
15:12:39,937 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Loaded mapping properties from resource: org/alfresco/repo/content/metadata/MailMetadataExtracter.properties
15:12:39,937 DEBUG [org.alfresco.repo.content.metadata.MetadataExtracterRegistry] Registering metadata extracter: org.alfresco.repo.content.metadata.MailMetadataExtracter@14cee08
15:12:39,977 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from author to [{http://www.alfresco.org/model/content/1.0}author]
15:12:39,977 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from description to [{http://www.alfresco.org/model/content/1.0}description]
15:12:39,977 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from title to [{http://www.alfresco.org/model/content/1.0}title]
15:12:39,977 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Loaded mapping properties from resource: org/alfresco/repo/content/metadata/HtmlMetadataExtracter.properties
15:12:39,977 DEBUG [org.alfresco.repo.content.metadata.MetadataExtracterRegistry] Registering metadata extracter: org.alfresco.repo.content.metadata.HtmlMetadataExtracter@1be91c8
15:12:39,997 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from composer to [{mu_sic}composer]
15:12:39,997 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from trackNumber to [{mu_sic}trackNumber]
15:12:39,997 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from lyrics to [{mu_sic}lyrics]
15:12:39,997 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from artist to [{http://www.alfresco.org/model/content/1.0}author, {mu_sic}artist]
15:12:39,997 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from albumTitle to [{mu_sic}albumTitle]
15:12:39,997 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from yearReleased to [{mu_sic}yearReleased]
15:12:39,997 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from comment to [{mu_sic}comment]
15:12:39,997 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from genre to [{mu_sic}genre]
15:12:39,997 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from songTitle to [{mu_sic}songTitle, {http://www.alfresco.org/model/content/1.0}title]
15:12:39,997 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from description to [{http://www.alfresco.org/model/content/1.0}description]
15:12:39,997 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Loaded mapping properties from resource: org/alfresco/repo/content/metadata/MP_3MetadataExtracter.properties
15:12:39,997 DEBUG [org.alfresco.repo.content.metadata.MetadataExtracterRegistry] Registering metadata extracter: org.alfresco.repo.content.metadata.MP_3MetadataExtracter@a594e1
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from user1 to [{http://www.alfresco.org/model/content/1.0}description]
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from creator to [{http://www.alfresco.org/model/content/1.0}author]
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from date to []
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from printDate to []
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from language to []
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from creationDate to [{http://www.alfresco.org/model/content/1.0}created]
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from keyword to []
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from generator to []
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from subject to [{http://www.alfresco.org/model/content/1.0}description]
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from initialCreator to []
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from title to [{http://www.alfresco.org/model/content/1.0}title]
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from printedBy to []
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from description to []
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Loaded mapping properties from resource: org/alfresco/repo/content/metadata/OpenDocumentMetadataExtracter.properties
15:12:40,027 DEBUG [org.alfresco.repo.content.metadata.MetadataExtracterRegistry] Registering metadata extracter: org.alfresco.repo.content.metadata.OpenDocumentMetadataExtracter@5866c1
15:12:40,037 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from author to [{http://www.alfresco.org/model/content/1.0}author]
15:12:40,037 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from description to [{http://www.alfresco.org/model/content/1.0}description]
15:12:40,037 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from title to [{http://www.alfresco.org/model/content/1.0}title]
15:12:40,037 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Loaded mapping properties from resource: org/alfresco/repo/content/metadata/OpenOfficeMetadataExtracter.properties
15:12:40,988 WARN  [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] connection failed: socket,host=localhost,port=8100,tcpNoDelay=1: java.net.ConnectException: Connection refused: connect
15:12:45,525 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] Added mapping from title to [{http://www.alfresco.org/model/content/1.0}title]
15:12:45,675 DEBUG [org.alfresco.repo.content.metadata.xml.XPathMetadataExtracter] Added mapping from title to com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl@11c537f
15:12:45,695 DEBUG [org.alfresco.repo.content.metadata.MetadataExtracterRegistry] Registering metadata extracter: org.alfresco.repo.content.metadata.xml.XmlMetadataExtracter@1dc818d
15:12:47,097 INFO  [org.alfresco.repo.domain.schema.SchemaBootstrap] Schema managed by database dialect org.hibernate.dialect.HSQLDialect.
15:12:47,117 INFO  [org.alfresco.repo.domain.schema.SchemaBootstrap] Alfresco is using the HSQL default database. Please only use this while evaluating Alfresco, it is NOT recommended for production or deployment!
15:12:49,721 INFO  [org.alfresco.repo.domain.schema.SchemaBootstrap] No changes were made to the schema.
15:12:52,825 WARN  [org.alfresco.repo.admin.ConfigurationChecker] The Alfresco 'dir.root' property is set to a relative path './alf_data'.  'dir.root' should be overridden to point to a specific folder.
15:12:52,825 INFO  [org.alfresco.repo.admin.ConfigurationChecker] The Alfresco root data directory ('dir.root') is: .\alf_data
15:12:54,278 INFO  [org.alfresco.repo.admin.patch.PatchExecuter] Checking for patches to apply …
15:12:54,428 INFO  [org.alfresco.repo.module.ModuleServiceImpl] Found 0 module(s).
15:12:55,019 INFO  [org.alfresco.service.descriptor.DescriptorService] Alfresco JVM - v1.5.0_09-b01; maximum heap size 506.313MB
15:12:55,019 INFO  [org.alfresco.service.descriptor.DescriptorService] Alfresco started (Community Network): Current version 2.1.0 (482) schema 64 - Installed version 2.1.0 (482) schema 64
*There were a couple words that your forum's spam filter picked up (MP_3 and mu_sic).  I put an underscore to beat the filter.


And to test, every time I change something in the configuration, I go to my test space that contains a few existing documents, clear out any old rules, create a new Inbound metadata extraction rule, reapply the rule to the space, then Add content and upload a new xml of the same structure.

derek
Star Contributor
From the Wiki:
The first thing to decide is if the set of registered extractors for WCM must be the same as that available to the Alfresco Document Management framework. In our sample, this has not been done;
To test the sample, add one of the Alfresco model files to a web project.
So, the extractor is not available outside of WCM.  To make it available outside of WCM, remove the metadataExtracterRegistry property.  The extractor will use the registry defined on the base bean, i.e. the metadataExtracterRegistry and will be available for use in the normal document management side of things.
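
A rough sketch of what that change could look like, assuming the wrapper bean in the sample file resembles the following. The bean id and the selectors property are assumptions and should be checked against the actual sample file; the class name is the one that appears in the startup log earlier in the thread.

```xml
<!-- Hypothetical wrapper bean: id and property layout are assumed, not copied from the sample -->
<bean id="extracter.xml.sample.XMLMetadataExtracter"
      class="org.alfresco.repo.content.metadata.xml.XmlMetadataExtracter"
      parent="baseMetadataExtracter">
   <!-- No registry override here: the bean then falls back to the registry
        configured on baseMetadataExtracter -->
   <property name="selectors">
      <list>
         <ref bean="extracter.xml.sample.selector.XPathSelector" />
      </list>
   </property>
</bean>
```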

And C:\Alfresco\tomcat\shared\classes\alfresco\extension is the correct place.

Regards

rmills
Champ in-the-making
Just wanted to close this issue out.  Derek's info has been helpful in troubleshooting, but the root problem seemed to be in the 2.1.0 beta 1 release.  After updating to the 2.1.0 beta 2 release, all the problems were solved.