<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: XML metadata extraction in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213443#M166573</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;The problem must lie with the &lt;/SPAN&gt;&lt;STRONG&gt;wcm-xml-metadata-extracter-context.xml&lt;/STRONG&gt;&lt;SPAN&gt; file, which is specifically tailored to work with WCM.&amp;nbsp; I'm guessing that the registry used there is not the correct one.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 23 Apr 2010 11:25:02 GMT</pubDate>
    <dc:creator>derek</dc:creator>
    <dc:date>2010-04-23T11:25:02Z</dc:date>
    <item>
      <title>XML metadata extraction</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213440#M166570</link>
      <description>I know this question gets asked a lot, and I've read the other threads, but without success. Most other people seemed to be trying to get XML metadata extraction working within the web content management (WCM) system. I don't know if my situation is significantly different. I would like to be able to import multip…</description>
      <pubDate>Thu, 22 Apr 2010 14:20:38 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213440#M166570</guid>
      <dc:creator>swithun</dc:creator>
      <dc:date>2010-04-22T14:20:38Z</dc:date>
    </item>
    <item>
      <title>Re: XML metadata extraction</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213441#M166571</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;At first glance, your config looks correct.&amp;nbsp; Where have you placed your *-context.xml files?&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Do you see anything useful in the log when you turn on debug logging?&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp; log4j.logger.org.alfresco.repo.content.metadata.xml=DEBUG&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Regards&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 22 Apr 2010 17:39:13 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213441#M166571</guid>
      <dc:creator>derek</dc:creator>
      <dc:date>2010-04-22T17:39:13Z</dc:date>
    </item>
    <item>
      <title>Re: XML metadata extraction</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213442#M166572</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;My *-context.xml files are in &lt;/SPAN&gt;&lt;SPAN style="text-decoration: underline;"&gt;/opt/alfresco/tomcat/shared/classes/alfresco/extension/&lt;/SPAN&gt;&lt;SPAN&gt;, as is my model document. I think this is the right place for them.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;When I turn on debugging, this is all that I get on startup:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;10:15:06,981&amp;nbsp; DEBUG [metadata.xml.XPathMetadataExtracter] Added mapping from rpsDate to /TEI.2/teiHeader/fileDesc/publicationStmt/date/text()&lt;BR /&gt;10:15:06,982&amp;nbsp; DEBUG [metadata.xml.XPathMetadataExtracter] Added mapping from rpsSession to /TEI.2/teiHeader/fileDesc/editionStmt/edition[@n='session']/text()&lt;BR /&gt;10:15:06,982&amp;nbsp; DEBUG [metadata.xml.XPathMetadataExtracter] Added mapping from rpsReign to /TEI.2/teiHeader/fileDesc/titleStmt/title/text()&lt;BR /&gt;10:15:06,983&amp;nbsp; DEBUG [metadata.xml.XPathMetadataExtracter] Added mapping from rpsID to /TEI.2/@id&lt;BR /&gt;10:15:06,992&amp;nbsp; DEBUG [metadata.xml.XPathMetadataExtracter] Added mapping from version to /model/version/text()&lt;BR /&gt;10:15:06,992&amp;nbsp; DEBUG [metadata.xml.XPathMetadataExtracter] Added mapping from author to /model/author/text()&lt;BR /&gt;10:15:06,993&amp;nbsp; DEBUG [metadata.xml.XPathMetadataExtracter] Added mapping from description to /model/description/text()&lt;BR /&gt;10:15:06,993&amp;nbsp; DEBUG [metadata.xml.XPathMetadataExtracter] Added mapping from title to /model/@name&lt;BR /&gt;10:15:06,999&amp;nbsp; WARN&amp;nbsp; [springframework.beans.GenericTypeAwarePropertyDescriptor] Invalid JavaBean property 'overwritePolicy' being accessed! Ambiguous write methods found next to actually used [public void org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter.setOverwritePolicy(java.lang.String)]: [public void org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter.setOverwritePolicy(org.alfresco.repo.content.metadata.MetadataExtracter$OverwritePolicy)]&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;SPAN&gt;This indicates that my custom configuration is being picked up. There is no further output when I ingest an XML document, add the aspect that defines the custom metadata fields and run the "extract common metadata" action. Could the WARN message be relevant? It has always been there, even before I started customising things, and commenting out the offending bean property makes no difference apart from removing the log entry.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Namespace issues are always complicated. I have a namespace defined for my custom aspect, type and metadata fields, but the documents I want to ingest declare no namespace at all, not even an empty one.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I have read posts saying that you need to comment out the &lt;/SPAN&gt;&lt;EM&gt;metadataExtracterRegistry&lt;/EM&gt;&lt;SPAN&gt; property of the &lt;/SPAN&gt;&lt;EM&gt;avmMetadataExtracter&lt;/EM&gt;&lt;SPAN&gt; bean. I've tried this, with no effect.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks for the quick reply. I'm sure I'm very close.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 23 Apr 2010 09:50:33 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213442#M166572</guid>
      <dc:creator>swithun</dc:creator>
      <dc:date>2010-04-23T09:50:33Z</dc:date>
    </item>
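    <!--
    One way to check whether the unprefixed paths in these mappings match a namespace-free document is to evaluate them directly with the JDK's javax.xml.xpath API, outside Alfresco. This is a minimal sketch, not Alfresco's own extraction code. The class name XPathMappingCheck is hypothetical, and only the four rps* XPath expressions are taken from the debug log in the post.

    ```java
    import java.io.StringReader;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.xml.sax.InputSource;

    // Hypothetical diagnostic class: evaluates the rps* XPath mappings from the
    // debug log against a document that declares no namespace.
    class XPathMappingCheck {

        // Property name -> XPath expression, copied from the DEBUG lines.
        static final Map<String, String> MAPPINGS = new LinkedHashMap<>();
        static {
            MAPPINGS.put("rpsDate", "/TEI.2/teiHeader/fileDesc/publicationStmt/date/text()");
            MAPPINGS.put("rpsSession", "/TEI.2/teiHeader/fileDesc/editionStmt/edition[@n='session']/text()");
            MAPPINGS.put("rpsReign", "/TEI.2/teiHeader/fileDesc/titleStmt/title/text()");
            MAPPINGS.put("rpsID", "/TEI.2/@id");
        }

        // Returns property name -> string value; an empty string usually means the path matched nothing.
        static Map<String, String> evaluate(String xml) {
            try {
                // The factory is left at its default (not namespace-aware); the document
                // has no namespace, so unprefixed steps such as /TEI.2 match its elements.
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                Document doc = factory.newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)));
                XPath xpath = XPathFactory.newInstance().newXPath();
                Map<String, String> values = new LinkedHashMap<>();
                for (Map.Entry<String, String> mapping : MAPPINGS.entrySet()) {
                    // evaluate(String, Object) returns the string value of the result.
                    values.put(mapping.getKey(), xpath.evaluate(mapping.getValue(), doc));
                }
                return values;
            } catch (Exception e) {
                throw new IllegalArgumentException("Could not evaluate the XPath mappings", e);
            }
        }
    }
    ```

    Suppose the paths return the expected values for one of the real documents. That would point away from the XPaths themselves and towards how the extracter is registered, which is where the next reply looks.
    -->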
    <item>
      <title>Re: XML metadata extraction</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213443#M166573</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;The problem must lie with the &lt;/SPAN&gt;&lt;STRONG&gt;wcm-xml-metadata-extracter-context.xml&lt;/STRONG&gt;&lt;SPAN&gt; file, which is specifically tailored to work with WCM.&amp;nbsp; I'm guessing that the registry used there is not the correct one.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 23 Apr 2010 11:25:02 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213443#M166573</guid>
      <dc:creator>derek</dc:creator>
      <dc:date>2010-04-23T11:25:02Z</dc:date>
    </item>
    <item>
      <title>Re: XML metadata extraction</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213444#M166574</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;In the end, I got what I wanted by writing a pair of Java classes. One extends &lt;/SPAN&gt;&lt;EM&gt;org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter&lt;/EM&gt;&lt;SPAN&gt;, and is registered as a bean in &lt;/SPAN&gt;&lt;EM&gt;content-services-context.xml&lt;/EM&gt;&lt;SPAN&gt;. The other extends &lt;/SPAN&gt;&lt;EM&gt;org.xml.sax.helpers.DefaultHandler&lt;/EM&gt;&lt;SPAN&gt;, and is called by the first to parse the XML files and return a HashMap of metadata properties which can then be put into &lt;/SPAN&gt;&lt;STRONG&gt;rawProperties&lt;/STRONG&gt;&lt;SPAN&gt; using &lt;/SPAN&gt;&lt;STRONG&gt;putRawValue&lt;/STRONG&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Probably not the most elegant way, but it works.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 12 May 2010 09:02:41 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213444#M166574</guid>
      <dc:creator>swithun</dc:creator>
      <dc:date>2010-05-12T09:02:41Z</dc:date>
    </item>
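    <!--
    The SAX half of the approach described in this post can be sketched with the JDK alone. This is a minimal sketch under several assumptions. The class name TeiHeaderHandler and the path-to-property map are hypothetical. Paths are plain element paths: a "/@name" suffix selects an attribute, and predicates are not supported. Mapped elements are assumed not to be nested inside one another, and the first match for each property wins. The Alfresco side, meaning the AbstractMappingMetadataExtracter subclass that calls putRawValue, is left out.

    ```java
    import java.io.StringReader;
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.Map;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.helpers.DefaultHandler;

    // Hypothetical handler: collects the text (or attribute value) found at
    // configured element paths into a HashMap keyed by property name.
    class TeiHeaderHandler extends DefaultHandler {
        private final Map<String, String> pathToProperty;
        private final Map<String, String> properties = new HashMap<>();
        private final Deque<String> parents = new ArrayDeque<>();
        private String currentPath = "";
        private StringBuilder text; // non-null only inside a mapped element

        TeiHeaderHandler(Map<String, String> pathToProperty) {
            this.pathToProperty = pathToProperty;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default factory is not namespace-aware, so qName holds the raw element name.
            parents.push(currentPath);
            currentPath = currentPath + "/" + qName;
            if (pathToProperty.containsKey(currentPath)) {
                text = new StringBuilder();
            }
            for (int i = 0; i < atts.getLength(); i++) {
                String property = pathToProperty.get(currentPath + "/@" + atts.getQName(i));
                if (property != null) {
                    properties.putIfAbsent(property, atts.getValue(i));
                }
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several chunks, so append rather than assign.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String property = pathToProperty.get(currentPath);
            if (property != null && text != null) {
                properties.putIfAbsent(property, text.toString().trim()); // first match wins
                text = null;
            }
            currentPath = parents.pop();
        }

        static Map<String, String> parse(String xml, Map<String, String> pathToProperty) {
            TeiHeaderHandler handler = new TeiHeaderHandler(pathToProperty);
            try {
                SAXParserFactory.newInstance().newSAXParser()
                        .parse(new InputSource(new StringReader(xml)), handler);
            } catch (Exception e) {
                throw new IllegalArgumentException("Could not parse the XML document", e);
            }
            return handler.properties;
        }
    }
    ```

    Inside the extracter subclass, each entry of the returned map would then be copied into rawProperties with putRawValue, as the post describes. Matching on exact element paths is what keeps this deterministic for a known document layout.
    -->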
    <item>
      <title>Re: XML metadata extraction</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213445#M166575</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;@swithun&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;It seems elegant to me!&amp;nbsp; The generalized XML metadata extraction is quite nasty; I imagine that extracting specific values based on your expected XML is much neater, and deterministic.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Regards&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 12 May 2010 10:26:39 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/xml-metadata-extraction/m-p/213445#M166575</guid>
      <dc:creator>derek</dc:creator>
      <dc:date>2010-05-12T10:26:39Z</dc:date>
    </item>
  </channel>
</rss>

