Parsing .doc Word content?

targa2000
Champ in-the-making
I need to parse the content of a Word .doc file for metadata after the file is added to the Alfresco repository. What is the best way of doing this?
1 REPLY

openpj
Elite Collaborator
You can implement your own metadata extractor in Alfresco:
http://wiki.alfresco.com/wiki/Metadata_Extraction
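
Roughly speaking, an extractor does two things: it reads raw values out of the document, then maps them onto content model properties. Here is a minimal sketch of the mapping step in plain Java. It uses no Alfresco or POI classes; the class name `DocMetadataMapper`, the raw keys, and the choice of target properties are assumptions for illustration only, so check the wiki page for how a real extractor declares its mappings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Alfresco or POI: maps raw document
// properties (as an extractor might read them) to content model names.
class DocMetadataMapper {

    // Assumed mapping from raw keys to content model property names.
    private static final Map<String, String> MAPPING = new LinkedHashMap<>();
    static {
        MAPPING.put("title", "cm:title");
        MAPPING.put("author", "cm:author");
        MAPPING.put("subject", "cm:description");
    }

    // Returns only the mapped properties that have a non-blank value,
    // with surrounding whitespace trimmed.
    static Map<String, String> map(Map<String, String> raw) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : MAPPING.entrySet()) {
            String value = raw.get(entry.getKey());
            if (value != null && !value.trim().isEmpty()) {
                result.put(entry.getValue(), value.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("title", " Quarterly Report ");
        raw.put("author", "targa2000");
        raw.put("subject", ""); // blank values are skipped
        System.out.println(map(raw));
    }
}
```

Keeping the mapping separate from the parsing means you can change which properties get populated without touching the code that reads the file.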

To parse the Word .doc file itself, Apache POI may be useful:
http://poi.apache.org/

Alfresco already includes Apache POI 3.1, so you can start implementing your extractor without adding any other libraries.
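
One thing worth checking before handing a file to POI's binary Word support: the .doc format is an OLE2 compound file, which starts with the eight-byte signature D0 CF 11 E0 A1 B1 1A E1, whereas .docx is a ZIP archive. A small sketch of that check in plain Java follows; the class name `WordFormatSniffer` is hypothetical, and the method only inspects the header, it does not parse any metadata.

```java
// Hypothetical helper, not part of Alfresco or POI: checks whether a file
// header carries the OLE2 compound document signature used by binary .doc.
class WordFormatSniffer {

    // The eight-byte OLE2 compound file signature.
    private static final int[] OLE2_MAGIC =
            {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Returns true if the first eight bytes match the OLE2 signature.
    static boolean looksLikeOle2(byte[] header) {
        if (header == null || header.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            // Mask to compare the byte as an unsigned value.
            if ((header[i] & 0xFF) != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04, 0, 0, 0, 0}; // "PK.." as in .docx
        System.out.println(looksLikeOle2(zipHeader));
    }
}
```

An extractor could run a check like this on the first bytes of the content and skip files that only carry a .doc extension.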

Hope this helps.