Quality of Filters for MSOffice

jochen — Mon, 14 Aug 2006 20:08:06 GMT

Hello,

Assuming that someone stores and indexes MS Office documents in Alfresco, I'd like to know how good the quality of the index is. Some DMSs are not really perfect in this respect. Thanks for your help!

Regards,

Jochen

Re: Quality of Filters for MSOffice

kevinr — Tue, 15 Aug 2006 08:52:14 GMT

Text is extracted from MS Office documents using an OpenOffice server, which successfully extracts text from Word, PowerPoint and Excel files. PDFBox is used to extract text from PDF documents. Text is extracted from HTML documents using the built-in HTML-to-text support in the Java Swing library.

So the "quality" of extraction is directly related to the quality of those third-party libraries and services.
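
To give a rough idea of what the HTML case looks like, here is a minimal sketch of HTML-to-text extraction with the Swing parser. It is not Alfresco's actual code: the class name `HtmlTextSketch` and the method `extractText` are made up for illustration, and only the `javax.swing.text.html` classes are standard JDK API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not taken from the Alfresco source.
public class HtmlTextSketch {

    // Collects the text runs reported by the Swing HTML parser and
    // joins them with single spaces; tags themselves are never appended.
    static String extractText(String html) {
        final StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(data);
            }
        };
        try (Reader reader = new StringReader(html)) {
            // The third argument tells the parser to ignore any charset
            // directive in the document, since we already have characters.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>";
        System.out.println(extractText(html));
    }
}
```

Whatever the parser drops or mangles here never reaches the index, which is why the extraction quality depends on the library rather than on the repository itself.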

Thanks,

Kevin
