03-29-2023 04:41 AM
Hello,
I'm running Alfresco 7 on a CentOS 7 VM. Since the upgrade from Alfresco 6 to Alfresco 7, metadata extraction from office documents doesn't seem to work at all.
In Alfresco 6, on a webdav upload, I had logs like this:
Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Get supported: extracter.TikaAuto
Get supported: extracter.Poi
Get returning: extracter.Poi
Starting metadata extraction: extracter: org.alfresco.repo.content.metadata.PoiMetadataExtracter@b772bfe
Concurrent extractions : 0
New extraction accepted. Concurrent extractions : 1
Now I get this:
Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Finding extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Find unsupported: extracter.RFC822
Find returning: []
Get returning: null
Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Get returning: null
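To make the difference between the two logs easier to see, here is a small Python sketch that groups the extractor names by the verdict printed before them. It assumes the debug messages keep the "Get/Find <verdict>: <name>" shape shown above. The names `summarize`, `ALFRESCO_6_LOG` and `ALFRESCO_7_LOG` are only illustrative and not part of Alfresco.

```python
import re

# Debug output copied from the two logs above (hypothetical variable names).
MIMETYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

ALFRESCO_6_LOG = (
    f"Get extractors for {MIMETYPE} "
    "Get supported: extracter.TikaAuto Get supported: extracter.Poi "
    "Get returning: extracter.Poi"
)
ALFRESCO_7_LOG = (
    f"Get extractors for {MIMETYPE} "
    f"Finding extractors for {MIMETYPE} "
    "Find unsupported: extracter.RFC822 Find returning: [] "
    "Get returning: null "
    f"Get extractors for {MIMETYPE} Get returning: null"
)

# Assumption: every verdict line looks like "Get supported: <name>" or
# "Find unsupported: <name>", as in the logs quoted in this post.
VERDICT_PATTERN = re.compile(r"(?:Get|Find) (supported|unsupported|returning): (\S+)")


def summarize(log_text):
    """Group the logged extractor names by the verdict that precedes them."""
    summary = {"supported": [], "unsupported": [], "returning": []}
    for verdict, name in VERDICT_PATTERN.findall(log_text):
        summary[verdict].append(name)
    return summary


if __name__ == "__main__":
    print("Alfresco 6:", summarize(ALFRESCO_6_LOG))
    print("Alfresco 7:", summarize(ALFRESCO_7_LOG))
```

In the Alfresco 7 log the "supported" list comes out empty: the only extractor mentioned at all is extracter.RFC822, and extracter.TikaAuto and extracter.Poi never appear.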
All the classes are present in the lib folder, which is on the classpath:
They are not the same classes as in Alfresco 6 (tika-parsers-alfresco-patched and tika-core-alfresco-patched), but I guess that's related to the upgrade to Tika 2.x.
I tried copying the jars into the Tomcat folder to rule out a classpath issue, and I tried creating a content-services-context.xml to override the configuration. Just in case, I also added lines like extracter.TikaAuto.enabled = true to alfresco-global.properties, but so far I haven't found the cause of the issue.
Does anyone have a clue about this?
Thanks,
Raphaël.