Could somebody explain how Solr indexing works in terms of a large queue of documents?
For example, if Solr is stopped for some reason, does it go through all newly uploaded or created documents in sequence when it restarts, or does it go through modified documents? In your answer, could you also explain how transactions that affect millions of nodes are handled when indexing has to catch up?
I'm really looking for a diagram of how it works so I can get it clear in my mind.
By default, Solr polls Alfresco for changes every 15 seconds. It asks for changes to content and newly created documents, changes to the content models, and changes to the ACLs on documents, and then indexes those changes into its cores.
SOLR updates its indexes by looking at the number of transactions that have been committed since it last talked to Alfresco.
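To make the catch-up behaviour concrete, here is a rough sketch of that polling idea in Python. It is only a toy model, not Alfresco's or Solr's actual code: `Repository`, `Tracker`, `transactions_after` and `batch_size` are all names I made up for illustration. The assumption it illustrates is that the tracker remembers the last transaction it indexed and, after a stop, works through everything committed since then in commit order, a batch at a time.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    tx_id: int
    node_ids: list  # nodes created, updated, or deleted in this transaction


@dataclass
class Repository:
    """Hypothetical stand-in for the Alfresco side: an ordered log of commits."""
    transactions: list = field(default_factory=list)

    def commit(self, node_ids):
        tx = Transaction(tx_id=len(self.transactions) + 1, node_ids=list(node_ids))
        self.transactions.append(tx)
        return tx

    def transactions_after(self, last_tx_id, max_results):
        # Return the oldest not-yet-seen transactions first, capped per request.
        newer = [tx for tx in self.transactions if tx.tx_id > last_tx_id]
        return newer[:max_results]


@dataclass
class Tracker:
    """Hypothetical stand-in for the Solr side: remembers where it got to."""
    repo: Repository
    last_indexed_tx_id: int = 0
    index: dict = field(default_factory=dict)  # node_id -> tx_id that last touched it

    def poll(self, batch_size=2):
        """One tracking cycle: index every transaction committed since the last one."""
        processed = 0
        while True:
            batch = self.repo.transactions_after(self.last_indexed_tx_id, batch_size)
            if not batch:
                return processed
            for tx in batch:
                for node_id in tx.node_ids:
                    self.index[node_id] = tx.tx_id  # (re)index the node
                self.last_indexed_tx_id = tx.tx_id  # advance only after the tx is done
                processed += 1
```

In this model a new document and a modified document look the same to the tracker: both are just nodes touched by a transaction it has not seen yet. A transaction touching millions of nodes would simply be one very long inner loop here, which is why a single large transaction can hold up everything queued behind it.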
@Sujay thanks for the info. Is it possible to identify and remove any large transactions from the queue before they are indexed or during indexing? That is, transactions above a certain size should not be indexed. What effect would this have if the transaction were an ACL update, for example?