We are doing some high-level analysis on integrating Alfresco with a custom indexing engine. We want to push documents to the custom indexing engine and have them indexed there.
Any thoughts on how this can be implemented? Since Alfresco plugged in Solr with 4.0 and has the ability to switch between Solr and Lucene, we are hoping that the architecture can already support a custom indexing engine.
In Alfresco, search is implemented as a subsystem. A subsystem is a configurable module responsible for a sub-part of Alfresco functionality, so technically we can extend the Alfresco search subsystem and add our own implementation. The search subsystem currently includes 3 implementations:
noindex: classpath*:alfresco/extension/subsystems/Search/noindex/noindex/*-context.xml
solr: classpath*:alfresco/extension/subsystems/Search/solr/solr/*-context.xml
lucene: classpath*:alfresco/extension/subsystems/Search/lucene/lucene/*-context.xml
If you are looking to completely replace Lucene or Solr, I would strongly recommend that you reconsider. Alfresco is tightly integrated with both, and they are used to provide more than just full-text search services to Alfresco.
My suggestion is to leave those subsystems in place and create your own system that can index content from Alfresco. Then customise Alfresco's user interface to use your custom index solution where appropriate.
Agreed. If you just want to build an (integrated) data retrieval system, you can grab the data from Alfresco using its API and push the documents to your indexing engine. Then implement a custom search interface on top of your indexing engine.
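To make the "grab from Alfresco, push to your engine" idea a bit more concrete, here is a rough sketch of the sync loop in Python. Everything in it is a placeholder I made up for illustration: Document, InMemoryIndex, fetch_changed and FAKE_REPOSITORY are hypothetical names, not Alfresco or indexing-engine APIs. In a real setup, fetch_changed would call whichever Alfresco API you choose, and InMemoryIndex would be replaced by a client for your indexing engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Document:
    node_id: str   # identifier of the node in the repository
    name: str      # file name
    modified: int  # last-modified marker (any increasing number)
    text: str      # extracted plain text to be indexed


class InMemoryIndex:
    """Hypothetical stand-in for the custom indexing engine."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}

    def upsert(self, doc: Document) -> None:
        # Insert a new document or overwrite an older version of it.
        self.docs[doc.node_id] = doc

    def search(self, term: str) -> List[str]:
        # Naive case-insensitive substring match, returning node ids.
        needle = term.lower()
        return sorted(
            node_id for node_id, doc in self.docs.items()
            if needle in doc.text.lower()
        )


def sync(fetch: Callable[[int], Iterable[Document]],
         index: InMemoryIndex, since: int) -> int:
    """Push every document modified after `since` to the index.

    Returns the new high-water mark to pass in on the next run.
    """
    latest = since
    for doc in fetch(since):
        index.upsert(doc)
        latest = max(latest, doc.modified)
    return latest


# Fake repository contents, standing in for data pulled from Alfresco.
FAKE_REPOSITORY = [
    Document("n1", "a.txt", 10, "Quarterly report draft"),
    Document("n2", "b.txt", 20, "Custom indexing engine notes"),
]


def fetch_changed(since: int) -> List[Document]:
    # Assumption: the real version would query Alfresco for changed nodes.
    return [d for d in FAKE_REPOSITORY if d.modified > since]
```

You would run sync on a schedule, storing the returned marker between runs, and point your custom search UI at the engine's own query interface. Note that this sketch only handles additions and updates; deletions would need separate handling.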