Hyland Connect

gscheibel · 06-19-2007

Hi all,

I need to write some custom text extractors for Lucene in Alfresco because I have some files in my company that Alfresco doesn't index.
I have already looked through the config files and didn't find any tag for registering a new text extractor class.

Is there a way to do this?

Thanks in advance.

kevinr · 06-19-2007

Alfresco already has a comprehensive framework for developing and configuring new transformation classes. Basically, you code a bean against a specific interface, then register it with Spring, with configuration declaring that it can transform your source mimetype to the text/plain mimetype.
http://wiki.alfresco.com/wiki/Content_Transformations#Development

There are several examples in the SDK, including the PDFBox (PDF to text) transformer and the OpenOffice transformers. Once you have written and configured your transformer, it will automatically be used by our Lucene integration to convert that file type to text during indexing.
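
To give a rough idea of the shape of such a transformer, here is a minimal, self-contained sketch. It does not use the real Alfresco interface: `TextTransformer` is a hypothetical stand-in defined in the snippet itself, and `AcmeNoteToTextTransformer` and the `application/x-acme-note` mimetype are made up for illustration. The actual interface to implement and the Spring bean definition are the ones described on the wiki page linked above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a transformer contract. This is NOT the Alfresco
// interface; it only mirrors the idea of "declare what you can convert, then convert".
interface TextTransformer {
    boolean canTransform(String sourceMimetype, String targetMimetype);

    String transform(byte[] source);
}

// Hypothetical extractor for an imaginary format: UTF-8 text whose first line
// is a header that should not end up in the index.
class AcmeNoteToTextTransformer implements TextTransformer {
    static final String SOURCE_MIMETYPE = "application/x-acme-note"; // assumed, made up
    static final String TARGET_MIMETYPE = "text/plain";

    @Override
    public boolean canTransform(String sourceMimetype, String targetMimetype) {
        // Only claim the one conversion this class really supports.
        return SOURCE_MIMETYPE.equals(sourceMimetype)
                && TARGET_MIMETYPE.equals(targetMimetype);
    }

    @Override
    public String transform(byte[] source) {
        String raw = new String(source, StandardCharsets.UTF_8);
        int firstNewline = raw.indexOf('\n');
        // No body after the header line means there is nothing to index.
        return firstNewline < 0 ? "" : raw.substring(firstNewline + 1).trim();
    }
}
```

In a real deployment the class would be declared as a Spring bean so that the indexing process can find it whenever it needs text/plain for that mimetype.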

Hope this helps,

Kevin

Custom text extractors