Custom text extractors

gscheibel
Champ in-the-making
Hi all,

I need to write some custom text extractors for Lucene in Alfresco because my company has some file types that Alfresco doesn't index.
I have already looked at the config files but didn't find any tag for registering a new text extractor class.

Is there a way to do this?

Thanks in advance.
1 REPLY

kevinr
Star Contributor
Alfresco already has a comprehensive framework for configuring and developing new transformation classes. Basically, you code a bean against a specific interface, then use Spring to wire in the new bean with configuration declaring that it can transform a given mimetype to the text/plain mimetype.
http://wiki.alfresco.com/wiki/Content_Transformations#Development

There are several examples in the SDK, including the PDFBox (PDF to text) transformer and the OpenOffice transformers. Once you have written and configured your transformer, our Lucene integration will automatically use it to convert that file type to text during indexing.
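
To give a rough idea of the shape of such a transformer, here is a minimal, self-contained sketch in plain Java. It does not use Alfresco's real classes: the `TextTransformer` interface, the `AcmeRecordToTextTransformer` class, and the `application/x-acme-record` mimetype are all hypothetical stand-ins made up for illustration. Take the actual interface and the Spring bean definition from the wiki page above and the SDK examples.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class CustomTextExtractorSketch {

    /** Hypothetical stand-in for a transformer interface; NOT Alfresco's real API. */
    interface TextTransformer {
        /** Declares which source-to-target mimetype conversions are supported. */
        boolean isTransformable(String sourceMimetype, String targetMimetype);

        /** Reads the source content and writes the extracted text to the target. */
        void transform(InputStream source, Writer target) throws IOException;
    }

    /** Extracts the values from an invented "key=value per line" file format. */
    static class AcmeRecordToTextTransformer implements TextTransformer {
        static final String SOURCE_MIMETYPE = "application/x-acme-record"; // made up
        static final String TARGET_MIMETYPE = "text/plain";

        @Override
        public boolean isTransformable(String sourceMimetype, String targetMimetype) {
            return SOURCE_MIMETYPE.equals(sourceMimetype)
                    && TARGET_MIMETYPE.equals(targetMimetype);
        }

        @Override
        public void transform(InputStream source, Writer target) throws IOException {
            String raw = new String(source.readAllBytes(), StandardCharsets.UTF_8);
            for (String line : raw.split("\n")) {
                int eq = line.indexOf('=');
                if (eq >= 0) {
                    // Keep only the value part; that is the text worth indexing.
                    target.write(line.substring(eq + 1).trim());
                    target.write('\n');
                }
            }
        }
    }

    /** Plays the role of the indexer: asks for text/plain and collects the result. */
    static String extract(TextTransformer transformer, String sourceMimetype, byte[] content) {
        if (!transformer.isTransformable(sourceMimetype, "text/plain")) {
            throw new IllegalArgumentException("Unsupported mimetype: " + sourceMimetype);
        }
        StringWriter out = new StringWriter();
        try {
            transformer.transform(new ByteArrayInputStream(content), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] doc = "title=Quarterly report\nauthor=gscheibel\n"
                .getBytes(StandardCharsets.UTF_8);
        System.out.print(extract(new AcmeRecordToTextTransformer(),
                "application/x-acme-record", doc));
    }
}
```

In a real deployment the mimetype check and the conversion would live in a class implementing Alfresco's own transformer interface, and the `extract` step would be done by the repository itself once the bean is registered in Spring.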

Hope this helps,

Kevin