In a previous post I showed how to use an Apache reverse proxy with a Hazelcast-enabled Alfresco Share cluster to load-balance between multiple Share instances in Tomcat. Since moving to Linux as a development platform, I thought I would revisit the set-up using the latest version of Apache and also add transparent failover, so users aren't interrupted when a node goes down.
At least two instances of Share are needed, each with its tomcat/server.xml modified to use different ports and a unique AJP route name. For each server, enable the AJP Connector and set a jvmRoute on the Engine:
<!-- Define an AJP 1.3 Connector -->
<Connector port="8010" protocol="AJP/1.3" redirectPort="8444"
connectionTimeout="20000" URIEncoding="UTF-8" />
<!-- You should set jvmRoute to support load-balancing via AJP i.e. -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="tomcat1">
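The second node's server.xml just needs different values; for example, something like this (the 8011 port and tomcat2 route match the proxy configuration further down - the redirectPort simply needs to not clash with the first node, and the HTTP and shutdown ports need bumping too):
<!-- Define an AJP 1.3 Connector (second node) -->
<Connector port="8011" protocol="AJP/1.3" redirectPort="8445"
connectionTimeout="20000" URIEncoding="UTF-8" />
<!-- jvmRoute must be unique on each node -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="tomcat2">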
See my earlier blog post for more details, but it's really just a case of duplicating a working Tomcat+Share instance and changing the port numbers. Of course you can use instances on separate machines or VMs to avoid some of the port twiddling. On each node, also enable Hazelcast Share clustering via tomcat/shared/classes/web-extension/custom-slingshot-application-context.xml as described in the previous post; this hasn't changed since Alfresco 4.0. There is an example custom-slingshot-application-context.xml.sample provided in the Alfresco distribution which includes this config.
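For reference, the relevant Hazelcast beans in that file look roughly like this - a sketch based on the .sample file, so check the one shipped with your distribution for the exact bean names and values (the 5801 port and 'slingshot' group name correspond to the Hazelcast log output shown further down):
<hz:topic id="topic" instance-ref="webframework.cluster.slingshot" name="slingshot-topic"/>
<hz:hazelcast id="webframework.cluster.slingshot">
   <hz:config>
      <hz:group name="slingshot" password="alfresco"/>
      <hz:network port="5801" port-auto-increment="true">
         <hz:join>
            <!-- multicast is the default join mechanism; switch to tcp-ip for fixed member lists -->
            <hz:multicast enabled="true" multicast-group="224.2.2.5" multicast-port="54327"/>
            <hz:tcp-ip enabled="false">
               <hz:members></hz:members>
            </hz:tcp-ip>
         </hz:join>
      </hz:network>
   </hz:config>
</hz:hazelcast>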
Now install Apache 2.4 - for Ubuntu I used:
sudo apt-get install apache2
This ends up in /etc/apache2. Enable the various Apache2 modules we need to use the reverse proxy via AJP:
sudo a2enmod proxy_balancer
sudo a2enmod proxy_ajp
sudo a2enmod lbmethod_byrequests
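On a recent Ubuntu, a2enmod will pull in the underlying proxy module as a dependency, but it is worth double-checking that everything is loaded (apache2ctl -M lists the active modules):
sudo apache2ctl -M | grep -E 'proxy|lbmethod'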
Now edit the Apache default site config to add the proxy configuration. Open the /etc/apache2/sites-available/000-default.conf file. Inside the root <VirtualHost *:80> section add the following:
######################
# ALFRESCO SHARE PROXY
<Proxy balancer://app>
BalancerMember ajp://localhost:8010/share route=tomcat1
BalancerMember ajp://localhost:8011/share route=tomcat2
</Proxy>
ProxyRequests Off
ProxyPassReverse /share balancer://app
ProxyPass /share balancer://app stickysession=JSESSIONID|jsessionid
######################
You may need to change the port and 'route' values if you aren't using exactly the same values as me. Of course you can add more nodes here too if you wish. I also set:
ServerName localhost
to silence the warnings Apache otherwise prints on startup. Now start Apache:
sudo service apache2 restart
The service should start cleanly; if there is an error, check cat /var/log/apache2/error.log for details, as it may just be a missing module dependency. Start all the Share Tomcat instances and you will see them connect to each other in the Hazelcast INFO log, e.g.
Nov 07, 2014 12:43:44 PM com.hazelcast.cluster.ClusterManager
INFO: [192.168.221.84]:5802 [slingshot]
Members [2] {
Member [192.168.221.84]:5801
Member [192.168.221.84]:5802 this
}
Now you can point your browser(s) at localhost/share directly. Behind the scenes Apache will automatically load-balance out to one of the Share instances, and the Share clustering magic will keep things in sync - try creating a site and modifying the dashboard configuration. Another user can immediately visit that dashboard and will see the same configuration, lovely.
So what happens if a node goes down? Any users attached to that node are logged out as they are bounced onto another node by Apache. It's great that the users can still access Share, but not so great that they get interrupted and have to log in again. We want to add something called transparent failover, so the user is not aware of a server crash at all! With clustering, all the servers are the same, so the loss of one server should not interrupt the service.
Just two steps are needed to enable session replication between our two Tomcat servers. For all nodes, edit tomcat/webapps/share/web.xml and add the following element into the web-app section:
<distributable/>
Then for all nodes, enable the following section in the tomcat/conf/server.xml config:
<!--For clustering, please take a look at documentation at:
/docs/cluster-howto.html (simple how to)
/docs/config/cluster.html (reference documentation) -->
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"/>
Restart the Share nodes and you will now see something like this:
Nov 10, 2014 11:21:45 AM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{127, 0, 1, 1}:4000,{127, 0, 1, 1},4000, alive=1009, securePort=-1, UDP Port=-1, id={-122 10 86 -116 -50 35 77 111 -70 -58 -34 49 -128 -95 -29 -111 }, payload={}, command={}, domain={}, ]
NOTE: You may need to clear the browser cache, delete cookies and restart browser instances to ensure a clean start the first time after making these changes - you may see odd behaviour if you don't. Log in to Share with a couple of different browsers and examine the cookies (using Chrome Developer Tools or Firebug etc.) to see which node each session is currently attached to - you will see something like:
Name: JSESSIONID
Value: 5ACD598FD19B6C04FE7EECC1664B69C8.tomcat1
Host: localhost
Path: /share/
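If you prefer the command line, the same information is visible in the Set-Cookie header returned via the proxy - a quick check, assuming the Share login page sets the session cookie on the first request:
curl -s -L -D - -o /dev/null http://localhost/share/ | grep -i 'Set-Cookie'
The JSESSIONID value ends with the jvmRoute of the node that served the request.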
Then you can terminate tomcat1 and continue to use Share in that browser - the user experience continues without interruption. If you examine the cookies again you will see something like this:
Name: JSESSIONID
Value: DB984B4A51FB30B5E14B5ED71B65CFD4.tomcat2
Host: localhost
Path: /share/
So Apache has switched the session over to tomcat2 and, thanks to the session replication, no logout occurs, nice!
This is a basic set-up and there are a lot of options to improve Tomcat replication: http://tomcat.apache.org/tomcat-7.0-doc/cluster-howto.html
For high-performance production systems there are better choices than raw Tomcat replication - for our Alfresco Cloud offering we use haproxy and memcache: https://www.alfresco.com/blogs/devops/2014/07/16/haproxy-for-alfresco-updated-fror-haproxy-1-5/
Finally here is another memcache and Tomcat example: https://wiki.alfresco.com/wiki/Tomcat_Session_Replication_with_Memcached