I'm setting up a simple proof-of-concept clustered configuration (a single datastore and database shared by multiple Alfresco instances) to try to convince management to switch from a legacy system.
The documentation I can find on the wiki, forums and blogs seems to talk mostly about Lucene. For Solr, however, there is a mention of "no need to duplicate indexes on every machine in a cluster". So what is the best practice here:
1) Run only one Solr instance, say on the first Alfresco instance, and don't configure the /opt/alfresco-4.0d/tomcat/conf/Catalina/localhost/solr.xml file on the second instance.
2) Run a Solr instance on each Alfresco instance, i.e. move the Solr indexes (alf_data/solr/…) to their own local directories on each instance.
I tried (1), but the second instance never showed any documents in the "recently modified" list. I'm pretty sure this is because it couldn't connect to a Solr instance: following the default documentation, solr.host=localhost is used, and on instance 2 that obviously isn't correct, since there is no Solr running locally there.
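The localhost problem in option (1) can be illustrated with a small sketch that resolves the Solr URL each repository would contact. It assumes the Alfresco 4.x properties solr.host, solr.port, solr.port.ssl and solr.secureComms, and an assumed core path of /solr/alfresco; the hostnames alf-node1 and alf-node2 are hypothetical. It only models how "localhost" is interpreted per machine; it does not talk to Alfresco or Solr.

```python
# Hypothetical sketch: which Solr URL does each repository instance contact?
# Assumes the Alfresco 4.x properties solr.host, solr.port, solr.port.ssl and
# solr.secureComms, plus an assumed core path of /solr/alfresco.


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, ignoring blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def solr_url(props: dict[str, str], node_hostname: str) -> str:
    """Return the Solr core URL that the node named node_hostname would use.

    A loopback host always means "this machine", so with the default
    solr.host=localhost, instance 2 looks for Solr on itself.
    """
    host = props.get("solr.host", "localhost")
    if host in ("localhost", "127.0.0.1"):
        host = node_hostname
    secure = props.get("solr.secureComms", "https") == "https"
    if secure:
        port = props.get("solr.port.ssl", "8443")
    else:
        port = props.get("solr.port", "8080")
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}:{port}/solr/alfresco"


if __name__ == "__main__":
    default = parse_properties(
        "solr.host=localhost\nsolr.port=8080\nsolr.port.ssl=8443\nsolr.secureComms=https\n"
    )
    for node in ("alf-node1", "alf-node2"):
        # prints "alf-node1 -> https://alf-node1:8443/solr/alfresco", then the same for alf-node2
        print(node, "->", solr_url(default, node))
```

With the default properties each instance resolves to itself, so for option (1) the second instance would need solr.host set to the first instance's hostname (and the ports and SSL settings reachable across the network) rather than localhost.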
After some thought, I decided that I should be running separate instances, since the documentation mentions that "search support can be scaled separately from the Alfresco Repository (for example 2 SOLR master instances for a 4 cluster node)". To set this up, I configured the solr.xml file on each instance and moved the associated Solr data.dir.root, dir.keystore, etc. to each instance's local disk.
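A quick sanity check for option (2) is that no Solr index directory ends up on the shared datastore, since two Lucene-based Solr instances writing to the same index directory will contend for its write lock. The sketch below is hypothetical: NodeConfig, check_layout, the hostnames and the paths are illustrative and not part of Alfresco, and it assumes you collect each instance's data.dir.root by hand and list the shared mount points in shared_roots.

```python
# Hypothetical pre-flight check for the "one Solr per Alfresco instance" layout.
# Assumptions: data_dir_root is the data.dir.root of that instance's Solr cores,
# and shared_roots lists paths mounted from shared storage (e.g. the datastore).
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class NodeConfig:
    name: str            # hypothetical instance hostname
    data_dir_root: str   # Solr index root configured on that instance


def check_layout(nodes: list[NodeConfig], shared_roots: list[str]) -> list[str]:
    """Return a list of problems; an empty list means no index is on shared storage."""
    problems = []
    shared = [PurePosixPath(p) for p in shared_roots]
    owners: dict[PurePosixPath, str] = {}
    for node in nodes:
        index_dir = PurePosixPath(node.data_dir_root)
        on_shared = any(index_dir == root or root in index_dir.parents for root in shared)
        if not on_shared:
            # A local path is private to its machine, even if the same
            # path string is reused on another instance.
            continue
        problems.append(f"{node.name}: {index_dir} is on shared storage")
        # The same shared directory used by two instances means two Solr
        # instances writing one index.
        if index_dir in owners:
            problems.append(f"{node.name}: {index_dir} is also used by {owners[index_dir]}")
        else:
            owners[index_dir] = node.name
    return problems
```

Note that the check compares paths lexically (PurePosixPath does not resolve symlinks or ".."), so a symlink from a local-looking path into the shared mount would not be caught.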
Is this correct? Have I set up two Solr master instances as intended? Everything seems to be working, but I don't know whether there will be side effects once I start populating hundreds of sites and thousands of documents and running test cases.