Quality of Filters for MSOffice

jochen
Champ in-the-making
Hello

Assuming someone stores and indexes MS Office documents in Alfresco, I'd like to know how good the resulting index is, i.e. how reliably text is extracted from those files. Some DMSs are far from perfect in this respect.

Thanks for your help!

Regards,
Jochen

kevinr
Star Contributor
Text is extracted from MS Office documents using an OpenOffice server, which successfully extracts text from Word, PowerPoint and Excel files. PDFBox is used to extract text from PDF documents. Text is extracted from HTML documents using the built-in HTML->text support in the Java Swing library.

So the "quality" of extraction is directly related to the quality of those third-party libraries and services.
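
As a rough illustration of the HTML case only, here is a minimal sketch of pulling plain text out of HTML with the Swing parser that ships with the JDK. This is not Alfresco's actual code; the class name `HtmlTextSketch` and the method `extractText` are made up for the example, and joining text runs with a single space is a simplification.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of Alfresco: collects the text nodes
// reported by the JDK's built-in Swing HTML parser.
public class HtmlTextSketch {

    public static String extractText(String html) {
        StringBuilder text = new StringBuilder();

        // The parser calls handleText once per run of character data;
        // tags themselves are reported through other callbacks and ignored here.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the markup,
            // since the input is already a decoded String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(extractText("<html><body><p>Hello <b>world</b></p></body></html>"));
    }
}
```

Whatever string such an extractor returns is what ends up in the index, so anything the parser drops or mangles is simply not searchable.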

Thanks,

Kevin