
Link content to TIFF files with Lucene

fguillaume
Champ in-the-making
Hello,

We need to import TIFF files into the repository. We would like to store these images in Alfresco and run full-text searches on them. The text of each TIFF file is extracted by OCR in an earlier step, so we need a way to tell the Lucene engine that this text is the content of the image file.

Is it possible to achieve that, or does anyone have a better solution for handling this?

Best Regards,

Fabien
1 REPLY

kevinr
Star Contributor
Our full-text Lucene integration uses the to-text transformers in the repository to convert various mimetypes to text. So it would be a matter of writing a to-text transformer class and registering it for the "image/tiff" mimetype. It would then be called whenever a TIFF image is added to the repository or its content is updated. Your transformer class could do whatever work is required to get the OCR text (such as reading an association or a custom property where you have saved the data) and return it as the result of the transformation.

There are various examples of this in the Alfresco SDK, such as the PdfBoxContentTransformer class (org.alfresco.repo.content.transform.PdfBoxContentTransformer), which converts PDF to text for indexing.
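
To make the idea concrete, here is a minimal, self-contained sketch of the lookup logic in plain Java. It deliberately does not use the Alfresco classes: `ToTextTransformer`, `OcrTextLookup` and `TiffOcrTextTransformer` are hypothetical names invented for illustration, and the in-memory map stands in for the association or custom property that holds the OCR text. A real transformer would instead follow the pattern of PdfBoxContentTransformer in the SDK and be registered for the "image/tiff" mimetype.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a to-text transformer contract.
// This is NOT the Alfresco interface; it only models the idea.
interface ToTextTransformer {
    boolean isTransformable(String sourceMimetype, String targetMimetype);
    String transform(String nodeId);
}

// Hypothetical stand-in for wherever the earlier OCR step saved its output
// (for example a custom property or an associated node).
interface OcrTextLookup {
    String findOcrText(String nodeId); // returns null if nothing was stored
}

class TiffOcrTextTransformer implements ToTextTransformer {
    static final String MIMETYPE_TIFF = "image/tiff";
    static final String MIMETYPE_TEXT = "text/plain";

    private final OcrTextLookup lookup;

    TiffOcrTextTransformer(OcrTextLookup lookup) {
        this.lookup = lookup;
    }

    // Only claim the TIFF-to-plain-text conversion.
    @Override
    public boolean isTransformable(String sourceMimetype, String targetMimetype) {
        return MIMETYPE_TIFF.equals(sourceMimetype) && MIMETYPE_TEXT.equals(targetMimetype);
    }

    // Return the previously stored OCR text instead of decoding the image.
    // An empty string means "nothing to index" for this node.
    @Override
    public String transform(String nodeId) {
        String text = lookup.findOcrText(nodeId);
        return text == null ? "" : text;
    }
}

class TiffOcrSketch {
    public static void main(String[] args) {
        // In-memory map standing in for the stored OCR results.
        Map<String, String> ocrStore = new HashMap<>();
        ocrStore.put("node-1", "Invoice 2006 total due");

        ToTextTransformer transformer = new TiffOcrTextTransformer(ocrStore::get);

        System.out.println(transformer.isTransformable("image/tiff", "text/plain")); // true
        System.out.println(transformer.transform("node-1")); // Invoice 2006 total due
    }
}
```

The point of the design is that the transformer never runs OCR itself. It only hands back text that already exists, so indexing stays cheap and the OCR step can be rerun independently.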

http://wiki.alfresco.com/wiki/Content_Transformations

Hope this helps,

Kevin