Custom text extractors

gscheibel — Tue, 19 Jun 2007 13:16:15 GMT

Hi all,

I need to write some custom text extractors for Lucene in Alfresco because my company has some file types that Alfresco doesn't index. I have already looked at the config files and didn't find any tag for assigning a new text extractor class. Is there a way to do this?

Thanks in advance.

Re: Custom text extractors

kevinr — Tue, 19 Jun 2007 14:41:41 GMT

Alfresco already has a comprehensive framework for configuring and developing new transformation classes. Basically, it's a matter of coding a bean to a specific interface, then using Spring to wire in the new bean with configuration specifying that it can transform one mimetype to the text/plain mimetype.
http://wiki.alfresco.com/wiki/Content_Transformations#Development

There are several examples in the SDK, including the PDFBox (PDF to text) transformer and the OpenOffice transformers. Once you have written and configured your transformer, it will automatically be used by our Lucene integration to convert that file type to text during indexing.
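
To give a rough idea of the shape such a class takes, here is a minimal sketch in plain Java. It is not Alfresco's actual API: the SketchTransformer interface, the AcmeNoteToTextTransformer class and the application/x-acme-note mimetype are all made up for illustration, and the real interface and base classes are the ones described on the wiki page above and in the SDK examples. The sketch only shows the two responsibilities mentioned: declaring which mimetype pair is supported, and writing plain text out from the source content. Spring registration is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a transformer contract; NOT Alfresco's real interface.
interface SketchTransformer {
    // Says whether this transformer handles the given source -> target pair.
    boolean canTransform(String sourceMimetype, String targetMimetype);

    // Reads the source content and writes the converted content.
    void transform(InputStream source, OutputStream target) throws IOException;
}

// Converts a made-up note format to plain text by dropping '#' metadata lines.
class AcmeNoteToTextTransformer implements SketchTransformer {
    static final String SOURCE_MIMETYPE = "application/x-acme-note"; // hypothetical
    static final String TARGET_MIMETYPE = "text/plain";

    @Override
    public boolean canTransform(String sourceMimetype, String targetMimetype) {
        return SOURCE_MIMETYPE.equals(sourceMimetype)
                && TARGET_MIMETYPE.equals(targetMimetype);
    }

    @Override
    public void transform(InputStream source, OutputStream target) throws IOException {
        String raw = new String(source.readAllBytes(), StandardCharsets.UTF_8);
        StringBuilder text = new StringBuilder();
        for (String line : raw.split("\n")) {
            // In this made-up format, lines starting with '#' carry metadata only.
            if (!line.startsWith("#")) {
                text.append(line.trim()).append('\n');
            }
        }
        target.write(text.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Convenience helper for trying the transformer on an in-memory string.
    static String extract(String raw) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            new AcmeNoteToTextTransformer().transform(
                    new ByteArrayInputStream(raw.getBytes(StandardCharsets.UTF_8)), out);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real deployment the class would implement Alfresco's own transformer interface instead, and the supported mimetype pair would be declared through the Spring bean configuration rather than hard-coded constants.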

Hope this helps,

Kevin
