parsing .doc word content?

targa2000 — Thu, 18 Feb 2010 13:06:08 GMT

I need to parse the content of a Word .doc file for metadata after the file is added to the Alfresco repository. What is the best way of doing this?

Re: parsing .doc word content?

openpj — Fri, 12 Mar 2010 11:29:45 GMT

You can implement your own metadata extractor in Alfresco:
http://wiki.alfresco.com/wiki/Metadata_Extraction

Apache POI may be useful for parsing the Word .doc file:
http://poi.apache.org/

Alfresco includes Apache POI 3.1, so you can start implementing your extractor without adding any other libraries :wink:
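
As a rough sketch of the POI side only (nothing Alfresco-specific), something like the following could read the summary properties of a binary .doc file. It assumes the POI HWPF (scratchpad) classes are on the classpath. The class name DocMetadataSketch, the helper putIfPresent, and the map keys are made up for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hwpf.HWPFDocument;

// Hypothetical helper class: not part of Alfresco or POI.
public class DocMetadataSketch {

    /** Reads a few summary properties from a binary .doc stream. */
    public static Map<String, String> readMetadata(InputStream in) throws IOException {
        HWPFDocument doc = new HWPFDocument(in);
        // May be null if the file has no summary information stream.
        SummaryInformation info = doc.getSummaryInformation();

        Map<String, String> props = new LinkedHashMap<String, String>();
        if (info != null) {
            // The keys are illustrative; map them to your own model properties.
            putIfPresent(props, "title", info.getTitle());
            putIfPresent(props, "author", info.getAuthor());
            putIfPresent(props, "subject", info.getSubject());
            putIfPresent(props, "keywords", info.getKeywords());
        }
        return props;
    }

    /** Stores the trimmed value only if it is non-null and non-blank. */
    public static void putIfPresent(Map<String, String> props, String key, String value) {
        if (value != null && value.trim().length() > 0) {
            props.put(key, value.trim());
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new FileInputStream(args[0]);
        try {
            System.out.println(readMetadata(in));
        } finally {
            in.close();
        }
    }
}
```

Inside a custom extractor you would presumably call something like readMetadata on the content's input stream and then map the resulting keys to your content model properties. That mapping depends on your model and is left out here.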

Hope this helps.
