<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic [Alf 7 Community] Metadata not extracted from office documents on webdav upload in Alfresco Forum</title>
    <link>https://connect.hyland.com/t5/alfresco-forum/alf-7-community-metadata-not-extracted-from-office-documents-on/m-p/145009#M38465</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I'm running Alfresco 7 on a CentOS 7 VM, and I noticed that since the upgrade from Alfresco 6 to Alfresco 7, metadata extraction from office documents doesn't seem to work at all.&lt;/P&gt;&lt;P&gt;In Alfresco 6, on a WebDAV upload, I had logs like this:&lt;/P&gt;&lt;PRE&gt;Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Get supported:   extracter.TikaAuto
Get supported:   extracter.Poi
Get returning:   extracter.Poi
Starting metadata extraction: 
   extracter: org.alfresco.repo.content.metadata.PoiMetadataExtracter@b772bfe
Concurrent extractions : 0
New extraction accepted. Concurrent extractions : 1&lt;/PRE&gt;&lt;P&gt;Now I get this:&lt;/P&gt;&lt;PRE&gt;Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Finding extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Find unsupported: extracter.RFC822
Find returning:   []
Get returning:   null
Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Get returning:   null&lt;/PRE&gt;&lt;P&gt;All the classes are in the lib folder, which is on the classpath:&lt;/P&gt;&lt;P&gt;&lt;img src="https://connect.hyland.com/t5/image/serverpage/image-id/1650i34AEF1A241C172E2/image-size/large?v=v2&amp;amp;px=999" alt="index.png" /&gt;&lt;/P&gt;&lt;P&gt;These are not the same libraries as in Alfresco 6 (tika-parsers-alfresco-patched and tika-core-alfresco-patched), but I guess that is related to the upgrade to Tika 2.x.&lt;/P&gt;&lt;P&gt;To rule out a classpath issue, I tried copying the JARs into the Tomcat folder. I also tried creating a content-services-context.xml to override the configuration and, just in case, added lines like extracter.TikaAuto.enabled = true to alfresco-global.properties, but so far I haven't found the cause of the issue.&lt;/P&gt;&lt;P&gt;Does anyone have a clue about this?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Raphaël.&lt;/P&gt;</description>
    <pubDate>Wed, 29 Mar 2023 08:41:14 GMT</pubDate>
    <dc:creator>badawiraphael</dc:creator>
    <dc:date>2023-03-29T08:41:14Z</dc:date>
    <item>
      <title>[Alf 7 Community] Metadata not extracted from office documents on webdav upload</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/alf-7-community-metadata-not-extracted-from-office-documents-on/m-p/145009#M38465</link>
      <pubDate>Wed, 29 Mar 2023 08:41:14 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/alf-7-community-metadata-not-extracted-from-office-documents-on/m-p/145009#M38465</guid>
      <dc:creator>badawiraphael</dc:creator>
      <dc:date>2023-03-29T08:41:14Z</dc:date>
    </item>
  </channel>
</rss>

