[Alf 7 Community] Metadata not extracted from office documents on webdav upload

03-29-2023 04:41 AM
Hello,
I'm running Alfresco 7 on a CentOS 7 VM. Since upgrading from Alfresco 6 to Alfresco 7, metadata extraction from office documents doesn't seem to work at all.
In Alfresco 6, on a webdav upload, I had logs like this:
Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Get supported: extracter.TikaAuto
Get supported: extracter.Poi
Get returning: extracter.Poi
Starting metadata extraction: extracter: org.alfresco.repo.content.metadata.PoiMetadataExtracter@b772bfe
Concurrent extractions : 0
New extraction accepted. Concurrent extractions : 1
In Alfresco 7, I get this instead:
Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Finding extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Find unsupported: extracter.RFC822
Find returning: []
Get returning: null
Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Get returning: null
All the classes are present in the lib folder, which is on the classpath:
They are not the same classes as in Alfresco 6 (tika-parsers-alfresco-patched and tika-core-alfresco-patched), but I guess that is related to the upgrade to Tika 2.x.
So far I have tried the following, without finding the cause of the issue: copying the jars into the Tomcat folder to rule out a classpath problem, creating a content-services-context.xml to override the configuration, and, just in case, adding lines like extracter.TikaAuto.enabled = true to alfresco-global.properties.
Does anyone have any clue about this?
Thanks,
Raphaël.
- Labels: Alfresco Content Services
