[Alf 7 Community] Metadata not extracted from office documents on webdav upload

badawiraphael — Wed, 29 Mar 2023 08:41:14 GMT

Hello,

I'm running Alfresco 7 on a CentOS 7 VM, and I noticed that since the upgrade from Alfresco 6 to Alfresco 7, metadata extraction from Office documents doesn't seem to work at all.

In Alfresco 6, on a webdav upload, I had logs like this:

Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Get supported:   extracter.TikaAuto
Get supported:   extracter.Poi
Get returning:   extracter.Poi
Starting metadata extraction: 
   extracter: org.alfresco.repo.content.metadata.PoiMetadataExtracter@b772bfe
Concurrent extractions : 0
New extraction accepted. Concurrent extractions : 1

Now I get this:

Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Finding extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Find unsupported: extracter.RFC822
Find returning:   []
Get returning:   null
Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Get returning:   null
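
To compare the two excerpts mechanically, here is a minimal Python sketch that pulls out which extractor the registry returned for each lookup. It assumes the debug lines keep exactly the format shown above; the function name `selected_extractors` is only illustrative and is not part of Alfresco.

```python
import re

# Patterns for the two registry debug lines of interest, as they appear
# in the excerpts above. Any other log format is not handled (assumption).
REQUEST = re.compile(r"^Get extractors for\s+(\S+)\s*$")
RETURNING = re.compile(r"^Get returning:\s*(\S+)\s*$")


def selected_extractors(log_text):
    """Return one (mimetype, extractor or None) pair per registry lookup."""
    results = []
    mimetype = None
    for raw in log_text.splitlines():
        line = raw.strip()
        request = REQUEST.match(line)
        if request:
            # Start of a new lookup: remember the requested mimetype.
            mimetype = request.group(1)
            continue
        returning = RETURNING.match(line)
        if returning and mimetype is not None:
            name = returning.group(1)
            # The registry prints the literal "null" when nothing matched.
            results.append((mimetype, None if name == "null" else name))
            mimetype = None
    return results
```

Fed with the Alfresco 6 excerpt this yields extracter.Poi for the docx mimetype, while the Alfresco 7 excerpt yields None for both lookups, which matches what I see by eye.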

All the classes are present in the lib folder, which is on the classpath:

They are not the same classes as in Alfresco 6 (tika-parsers-alfresco-patched and tika-core-alfresco-patched), but I guess that is related to the upgrade to Tika 2.x.

I tried copying the jars into the Tomcat folder to rule out a classpath issue, I tried creating a content-services-context.xml to override the configuration, and, just in case, I added lines like extracter.TikaAuto.enabled = true to alfresco-global.properties. So far I haven't found the cause of the issue.

Does anyone have a clue about this?

Thanks,

Raphaël.
