Alfresco Community 4.0d with Solr in cluster configuration

xarope
Champ in-the-making
I'm setting up a proof-of-concept simple clustered configuration (single datastore and database, multiple alfresco instances) to try to convince management to switch from a legacy system.

The documentation I can find in the wiki and on the forums and blogs seems to talk about Lucene a lot. However, for Solr, there's mention of "no need to duplicate indexes on every machine in a cluster". So what is the best practice here:

1) only run one instance of solr, say on the 1st instance only, and not configure the /opt/alfresco-4.0d/tomcat/conf/Catalina/localhost/solr.xml file on the 2nd instance

2) run an instance of solr on each instance, i.e. split the solr indexes (alf_data/solr/…) off to their own local directories on each instance

I tried (1), but the 2nd instance would never show any documents in the list of "recently modified". I'm pretty sure that's because it couldn't connect to a local Solr instance: following the default documentation, solr.host=localhost is set (and obviously on instance 2 this isn't correct).

After some thinking, I decided that I should be running separate instances, since there's mention of "search support can be scaled separately from the Alfresco Repository (for example 2 SOLR master instances for a 4 cluster node)". To set this up, I basically configured the solr.xml file and moved the associated Solr data.dir.root, dir.keystore, etc. to each instance's local disk.

Is this correct? Have I set up two SOLR master instances correctly as intended? Everything seems to be working, but I don't know if there are any side effects once I start populating 100s of sites and 1000s of documents and running test cases.

8 REPLIES

mrogers
Star Contributor
Each instance of Alfresco needs to be able to access a search subsystem; however, that does not need to be a local Solr instance.

You could have more than one instance of Solr (for redundancy) behind a load balancer. You can then scale the Alfresco nodes independently of the Solr nodes. So you could take one Solr node offline for maintenance, or add more Solr nodes if the load requires it.

However for evaluation I suggest that you stick to a single box solution (solr and the repo and share on one box).

xarope
Champ in-the-making
Thanks, appreciate the feedback. It seems to be working fine with an Alfresco instance that has its own Solr instance as well, and I'll try some horizontal scaling with an extra Alfresco instance using a pre-existing Solr instance.

nubian
Champ in-the-making
I hope I am posting this question in the correct place. I read the previous post and have been experimenting with using SOLR in a distributed fashion with Alfresco. Specifically, I have set up two Alfresco instances and changed the solr.xml file to point the data.dir to a NAS share. I have also changed alfresco-global.properties to point the dir.root to a NAS share. In my solrcore files, I have set "enable.alfresco.tracking=true", and in the repository file I have set solr.host=localhost.

When I add content on either instance, I see the repository and the indexes update on the NAS; however, I receive a NativeFileLocking error from Solr in my log files and the searches seem inconsistent. When I change to enable.alfresco.tracking=false in one of the instances, the locking messages disappear, but I do not see any updates to the document properties on that instance.

What is the recommended approach to configure a two-node Alfresco cluster with shared MySQL, shared indexes and a shared contentstore?

mrogers
Star Contributor
All the advice I have seen is to keep your indexes on a local fast disk.  

I don't think you can share your index files between two indexing servers (that is, two instances of Solr accessing the same files). What you can have is one or more indexing servers, each accessing its own index.
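
For anyone wondering why the shared index directory produces locking errors: Lucene protects each index directory with a write lock (a write.lock file), so a second indexer pointed at the same directory is refused. The sketch below is only an analogy in Python, not Lucene's or Alfresco's actual code. It assumes a Unix-like system (it uses flock), and the function names and the temporary directory are made up for illustration.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on the file at path.

    Returns the open file object on success (keep it open to hold the lock),
    or None if some other holder already has the lock.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def demo():
    """Simulate two indexers pointed at the same (hypothetical) index directory."""
    with tempfile.TemporaryDirectory() as index_dir:
        lock_path = os.path.join(index_dir, "write.lock")
        first = try_lock(lock_path)   # first indexer takes the lock
        second = try_lock(lock_path)  # second indexer asks for the same lock
        result = (first is not None, second is not None)
        for handle in (first, second):
            if handle is not None:
                handle.close()
        return result


if __name__ == "__main__":
    print(demo())
```

With one index directory per Solr instance, each indexer only ever asks for its own lock, so this contention does not arise.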

nubian
Champ in-the-making
Thanks for your prompt response. So if I understand correctly, the best approach is to leave the indexes on each Alfresco instance, share the MySQL database and contentstore, and set enable.alfresco.tracking=true on both instances. Will each Solr instance then track the shared contentstore without creating locking contention on the updates/writes when content changes or is added?

tommorris
Champ in-the-making
Yes, that's certainly one approach. I'm not sure that it's the best approach however.

Yes, sharing the file-content binaries (contentstore) is a frequently used approach, via NAS (e.g. NFS). Faster shared file-system configurations are also possible, but this one is common.

Yes, an Alfresco cluster sharing a single instance of the MySQL DB is also very common, although there are dozens of ways that you can cluster/failover the DB configuration (all of which are DB brand dependent).

Having a local SOLR configuration on each Alfresco server is not so common, although not unworkable. I normally see Alfresco clusters sharing a single instance of a SOLR server that lives on its own box.

I prefer to at least host the SOLR service in a webapp in another tomcat to avoid sharing the same JVM heap space. The churn of short-lived objects can lead to an increased amount of garbage collection that can potentially stifle the performance of core Alfresco services.

Another advantage to hosting SOLR on a separate box (assuming you're not fully virtualised) is the opportunity to place the indexes on a very fast local disk (e.g. 15k rpm or SSD). This is because Lucene (which SOLR is based upon) is IO heavy, so faster hardware or reduced access contention (from other processes or VMs) can improve overall search performance.

Incidentally, it's not impossible to get reasonable performance on a well specified SAN for the indexes…

Sometimes you'll also see more than one remote SOLR instance polling one of the Alfresco servers, which is also acceptable, but since this setup is not a true SOLR cluster, it does double the SQL-query workload imposed by indexing operations. With a load balancer in front of these two SOLR servers, I think there is a potential for inconsistent results between two consecutively issued searches (which can probably be managed through a load-balancer configuration of some kind).

Tom
http://www.ixxus.com
Alfresco Platinum Partner

deepak1987
Star Contributor
Hi,

I have to configure the SOLR web app in an Alfresco clustered environment.

I have 2 Alfresco nodes in a cluster behind a load balancer.
Say,
Node1 with alfresco.host=node1.alfresco.com & alfresco.port=8080,
Node2 with alfresco.host=node2.alfresco.com & alfresco.port=8080 

Main URL to access from client browser is: api.alfresco.com:80

The SOLR web app is deployed on a separate Tomcat instance.
Say, with hostname solr1.alfresco.com

Now, each Alfresco node has the following SOLR properties in common.

/var/lib/tomcat6/shared/classes/alfresco-global.properties file


### Solr indexing ###

index.subsystem.name=solr
dir.keystore=/var/alfsolr/keystore

solr.host=solr1.alfresco.com
solr.port=8080
solr.port.ssl=8443

And the SOLR web app is configured with the following properties in the
SOLR_HOME/workspace-SpacesStore/conf/solrcore.properties &
SOLR_HOME/archive-SpacesStore/conf/solrcore.properties files:


### Alfresco HostName & Port ###
## Main URL
alfresco.host=api.alfresco.com
alfresco.port=80
alfresco.port.ssl=8443

## OR ##


## or node2.alfresco.com. One of the Alfresco node URLs in the cluster.
alfresco.host=node1.alfresco.com
alfresco.port=8080
alfresco.port.ssl=8443

If I specify alfresco.host=node1.alfresco.com & alfresco.port=8080, it works fine.


I don't want to specify the URL of a single Alfresco node, because if that node goes down, the other nodes won't get search results.

But if I specify alfresco.host=api.alfresco.com & alfresco.port=80, it does not work and gives the following error:

java.net.ConnectException: Connection refused
   at java.net.PlainSocketImpl.socketConnect(Native Method)
   at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
   at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
   at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
   at java.net.Socket.connect(Socket.java:546)
   at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:584)
   at sun.security.ssl.SSLSocketImpl.<init>(SSLSocketImpl.java:426)
   at sun.security.ssl.SSLSocketFactoryImpl.createSocket(SSLSocketFactoryImpl.java:142)
   at org.alfresco.encryption.ssl.AuthSSLProtocolSocketFactory.createSocket(AuthSSLProtocolSocketFactory.java:168)
   at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
   at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
   at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
   at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
   at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
   at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
   at org.alfresco.httpclient.AbstractHttpClient.executeMethod(AbstractHttpClient.java:110)
   at org.alfresco.httpclient.AbstractHttpClient.sendRemoteRequest(AbstractHttpClient.java:86)
   at org.alfresco.httpclient.HttpClientFactory$HttpsClient.sendRequest(HttpClientFactory.java:307)
   at org.alfresco.solr.client.SOLRAPIClient.getModelsDiff(SOLRAPIClient.java:1007)
   at org.alfresco.solr.tracker.CoreTracker.trackModels(CoreTracker.java:1630)
   at org.alfresco.solr.tracker.CoreTracker.trackRepository(CoreTracker.java:1134)
   at org.alfresco.solr.tracker.CoreTracker.updateIndex(CoreTracker.java:491)
   at org.alfresco.solr.tracker.CoreTrackerJob.execute(CoreTrackerJob.java:45)
   at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
   at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:563)
   
   
Please help on this.
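
One possible reading of the stack trace above: it goes through HttpClientFactory$HttpsClient and SSLSocketImpl.connect, which suggests the tracker is opening an HTTPS connection, so it may be using alfresco.port.ssl (8443) rather than alfresco.port. If nothing at api.alfresco.com listens on 8443, "Connection refused" is what you would expect to see. A minimal sketch for checking this from the SOLR box, assuming Python 3 is available there; the function names are made up for illustration, and the hostnames and ports are simply the ones quoted in the post.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}


if __name__ == "__main__":
    # Hostnames and ports as quoted in the post; adjust to your environment.
    candidates = [
        ("api.alfresco.com", 80),
        ("api.alfresco.com", 8443),
        ("node1.alfresco.com", 8080),
        ("node1.alfresco.com", 8443),
    ]
    for (host, port), ok in check_endpoints(candidates).items():
        print(f"{host}:{port} -> {'open' if ok else 'refused or unreachable'}")
```

This only tests that the TCP port is reachable, not that the SSL handshake or the Alfresco web app behind it works, but it should show quickly whether the load balancer exposes the port the tracker is trying to reach.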

reva_12
Champ in-the-making
I am trying a similar thing. Did you have any luck with that?
I would like to know how to make 2 Alfresco instances use the same remote SOLR.

Please help if you have any idea.
Thanks.