General metadata extraction from MS Office (primarily MS Word)

Benedikt_Naesse
Confirmed Champ

What steps does a developer need to go through to display custom metadata from an MS Word document (the standard Nuxeo extraction is rather poor)?

Let me give an example. Suppose most documents in our company have a custom MS Word property called "Document Description" and when I navigate to this document in my workspace, I would like to see the Document Description field in the "Metadata" part in the "Summary" tab page of the document.

I assume there are multiple steps to be taken here to achieve this behaviour ...

Would Studio help with this (automatic metadata extraction) or not?

1 REPLY

Olivier_Grisel
Star Contributor

Studio alone won't be enough, but it can help you define new metadata fields in the Nuxeo document types to hold your custom metadata, along with the matching layout to display them (or edit them manually from the web browser).

If you are a Java developer, you could write an Automation Operation in Java using the Nuxeo IDE that embeds the Apache Tika library for the extraction itself, and then plug it into a user action or an event listener to trigger the extraction whenever a document is modified.
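
To make the extraction step more concrete, here is a minimal sketch of reading custom properties such as "Document Description" from a Word file. It is a stand-in for the Tika-based approach, not the Tika or Nuxeo API: it uses only the JDK and handles only .docx files, where custom properties are stored in the docProps/custom.xml entry of the zip package. Tika would also cover legacy .doc files and many other formats. The class name DocxCustomProperties and the method read are hypothetical, chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: JDK-only stand-in for a Tika-based extractor (.docx only).
final class DocxCustomProperties {

    /** Returns custom property name -> value; empty if the file has none. */
    static Map<String, String> read(InputStream docx) throws Exception {
        Map<String, String> props = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Custom properties live in this part of the OOXML package.
                if (!"docProps/custom.xml".equals(entry.getName())) {
                    continue;
                }
                // Copy the entry first so the XML parser cannot close the zip stream.
                byte[] xml = zip.readAllBytes();

                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations to avoid XXE on untrusted uploads.
                factory.setFeature(
                        "http://apache.org/xml/features/disallow-doctype-decl", true);
                org.w3c.dom.Document dom =
                        factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

                // Each <property name="..."> wraps one typed value element.
                NodeList nodes = dom.getElementsByTagNameNS("*", "property");
                for (int i = 0; i < nodes.getLength(); i++) {
                    Element property = (Element) nodes.item(i);
                    props.put(property.getAttribute("name"),
                            property.getTextContent().trim());
                }
                break;
            }
        }
        return props;
    }
}
```

In a Nuxeo setup, the operation or event listener would call something like read(...) on the document's main file and copy the value for "Document Description" into the custom metadata field defined in Studio. That wiring depends on your schema and is not shown here.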
