This post assumes reasonable Alfresco sys-admin knowledge: you should already be familiar with setting up the Alfresco Repository in a cluster configuration and with configuring an Apache web-server instance. You should read the previous post first.
Since my last blog on this subject, there has been quite a bit of interest in load balancing Alfresco Share. This is good news, but it also means that customers and the community have found some issues that needed looking at - good, because more testing of Alfresco means more stability for everyone once we fix the issues - that's part of the fun of having a very active user community!
Three main points were raised:
I'll address all three issues here, and share some exciting performance news related to item 3 as well!
1. A bug that manifested when a user changed the template layout selection for a dashboard - for example, from a 2 to a 3 column layout. The problem was that Spring Surf PageView objects were internally caching the Page object rather than just the PageId - easily and quickly fixed.
2. Config for load balancing Share is very similar to that for load balancing Alfresco.
Set up two Tomcat instances containing the 'share.war' webapp, with the 'share-config-custom.xml' and 'custom-slingshot-application-context.xml' config as detailed in the previous post. Remember that the ports exposed by Tomcat need tweaking if you have both instances on the same physical machine - increment the HTTP, AJP and redirect ports in the tomcat/conf/server.xml config. Also ensure you have set the 'jvmRoute' attributes to different values ready for the load balancing config; I used 'tomcat3' and 'tomcat4' as I now have a lot of servers running on a single box!
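For reference, here is a minimal sketch of the relevant server.xml parts for the second Share instance. The port values are illustrative (only the AJP port, 8024, is taken from the Apache config below) - adjust them to whatever scheme you used for your own instances:

# tomcat/conf/server.xml (second Share instance) - illustrative values
<Server port="8021" shutdown="SHUTDOWN">
   <Service name="Catalina">
      <!-- HTTP connector - bumped to avoid clashing with the first instance -->
      <Connector port="8180" protocol="HTTP/1.1" redirectPort="8543"/>
      <!-- AJP connector - must match the BalancerMember entry in httpd.conf -->
      <Connector port="8024" protocol="AJP/1.3" redirectPort="8543"/>
      <!-- jvmRoute must match the 'route' value in the Apache balancer config -->
      <Engine name="Catalina" defaultHost="localhost" jvmRoute="tomcat4">
         <Host name="localhost" appBase="webapps" unpackWARs="true"/>
      </Engine>
   </Service>
</Server>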
Create another Apache instance - I just copied and pasted the one I used to load balance the Alfresco cluster. Again, bump the Apache listener port value if it exists on the same physical machine. Finally, configure Apache 'httpd.conf' to load balance against your Share web-tier instances:
# Reverse Proxy Settings (Share multi-instance load balancing)
ProxyRequests Off
ProxyPassReverse /share balancer://app
ProxyPass /share balancer://app stickysession=JSESSIONID|jsessionid nofailover=On
<Proxy balancer://app>
   BalancerMember ajp://localhost:8019/share route=tomcat3
   BalancerMember ajp://localhost:8024/share route=tomcat4
</Proxy>
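Note that the balancer relies on the Apache proxy modules being loaded. If your httpd.conf does not already enable them, a typical set for this configuration looks like the following (module paths vary by platform and Apache version):

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so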
Simply point your client browsers at your new Apache instance. If you have set up your Share instances to themselves use an Apache instance that load balances against an Alfresco cluster, then you now have a full 2x Alfresco Cluster + Apache + 2x Alfresco Share + Apache set up!
This is great - BUT it leads onto point 3...
3. Scalability and fail-over capability have improved by having multiple Share instances - but individual performance per Share node is reduced. Now you may consider (as I did at first) that this is expected; there is, after all, some additional work going on here - in the case of the Alfresco cluster it's inter-node communication overhead, and in the case of the Share nodes it's reduced performance due to the caches that have been disabled. What's apparent is that if you just cluster Alfresco and keep to a single Share instance, that single instance of Share can easily service a 4-node Alfresco cluster - so in practice there is little need to load balance Share for performance reasons, but you certainly might want to for high-availability reasons. Perhaps a price worth paying for the ability to remove and drop in additional nodes without your users knowing or having to update their URLs... But it would be nice if there wasn't such a noticeable performance drop in Share.
The good news is that this has all changed in Alfresco 3.4.8/4.0.1 - in response to the community blog post and our drive to continually improve the performance of Alfresco, a new clustering technique has now been implemented for the web-tier.
For a load balanced environment, Alfresco Share now uses Hazelcast to provide multicast messaging between web-tier nodes. The end result is that all caches are now enabled again for each node, and a very simple cache invalidation message is sent to all nodes when appropriate. So the performance degradation is gone - each node is as fast as a single Share instance.
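To make the pattern concrete, here is a minimal sketch of topic-based cache invalidation - NOT Alfresco's actual classes, just an illustration of the technique against the Hazelcast 1.9 API bundled at the time (the listener signature changed in later Hazelcast versions):

// Illustrative sketch only - the class and method names are hypothetical
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.ITopic;
import com.hazelcast.core.MessageListener;

public class InvalidatingCache implements MessageListener<String>
{
   // The local cache stays fully enabled on every node
   private final Map<String, Object> cache = new ConcurrentHashMap<String, Object>();
   private final ITopic<String> topic;

   public InvalidatingCache(String topicName)
   {
      // Attach to the topic on the default Hazelcast instance
      this.topic = Hazelcast.getTopic(topicName);
      this.topic.addMessageListener(this);
   }

   public Object get(String path)
   {
      return cache.get(path);
   }

   public void put(String path, Object value)
   {
      cache.put(path, value);
   }

   // Called when a user modifies an object - notify the whole cluster
   public void invalidate(String path)
   {
      topic.publish(path); // tiny message: just the invalidated path
   }

   // Hazelcast delivers published messages to every member's listener
   public void onMessage(String path)
   {
      cache.remove(path); // evict so the next read re-fetches fresh data
   }
}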
The only changes required for each node are in 'custom-slingshot-application-context.xml' - generally located in tomcat/shared/classes/alfresco/web-extension and used to override the Spring application context beans for Share. There is an example 'custom-slingshot-application-context.xml.sample' provided in the Alfresco distribution which now includes this config.
Add this section to the config on each Share Tomcat instance to enable the Hazelcast cluster messaging:
<!-- Hazelcast distributed messaging configuration - Share web-tier cluster config (3.4.8 and 4.0.1)
     - see http://www.hazelcast.com/docs.jsp
     - and specifically http://www.hazelcast.com/docs/1.9.4/manual/single_html/#SpringIntegration -->

<!-- The messaging topic - the 'name' is also used by the persister config below -->
<hz:topic id='topic' instance-ref='webframework.cluster.slingshot' name='slingshot-topic'/>

<!-- Configure cluster to use either Multicast or direct TCP-IP messaging - multicast is default -->
<!-- Optionally specify network interfaces - server machines likely to have more than one interface -->
<hz:hazelcast id='webframework.cluster.slingshot'>
   <hz:config>
      <hz:group name='slingshot' password='alfresco'/>
      <hz:network port='5801' port-auto-increment='true'>
         <hz:join>
            <hz:multicast enabled='true' multicast-group='224.2.2.5' multicast-port='54327'/>
            <hz:tcp-ip enabled='false'>
               <hz:members></hz:members>
            </hz:tcp-ip>
         </hz:join>
         <hz:interfaces enabled='false'>
            <hz:interface>192.168.1.*</hz:interface>
         </hz:interfaces>
      </hz:network>
   </hz:config>
</hz:hazelcast>

<bean id='webframework.slingshot.persister.remote'
      class='org.alfresco.web.site.ClusterAwarePathStoreObjectPersister'
      parent='webframework.sitedata.persister.abstract'>
   <property name='store' ref='webframework.webapp.store.remote' />
   <property name='pathPrefix'><value>alfresco/site-data/${objectTypeIds}</value></property>
   <property name='hazelcastInstance' ref='webframework.cluster.slingshot' />
   <property name='hazelcastTopicName'><value>slingshot-topic</value></property>
</bean>

<bean id='webframework.factory.requestcontext.servlet'
      class='org.alfresco.web.site.ClusterAwareRequestContextFactory'
      parent='webframework.factory.base'>
   <property name='linkBuilderFactory' ref='webframework.factory.linkbuilder.servlet' />
   <property name='extensibilityModuleHandler' ref='webscripts.extensibility.handler' />
   <property name='clusterObjectPersister' ref='webframework.slingshot.persister.remote' />
</bean>
The config enables the Hazelcast Spring integration, which starts the Hazelcast server. It is easily configurable and can use either multicast (the default, and minimal effort) or direct TCP-IP if preferred - see http://www.hazelcast.com/docs.jsp for more info. For the default set up, identical config can be applied to each Share node and it will 'just work'.
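If multicast is blocked on your network, the join section of the config above can be switched to direct TCP-IP. A sketch, with two example member addresses - substitute the real addresses of your Share nodes:

<hz:join>
   <hz:multicast enabled='false' multicast-group='224.2.2.5' multicast-port='54327'/>
   <hz:tcp-ip enabled='true'>
      <!-- list every Share node - these addresses are examples only -->
      <hz:members>192.168.1.101:5801,192.168.1.102:5801</hz:members>
   </hz:tcp-ip>
</hz:join>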
When you start Share you'll see something like this:
INFO: /192.168.2.8:5801 [slingshot] Hazelcast 1.9.4.6 (20120105) starting at Address[192.168.2.8:5801]
19-Jan-2012 13:58:57 com.hazelcast.system
INFO: /192.168.2.8:5801 [slingshot] Copyright (C) 2008-2011 Hazelcast.com
19-Jan-2012 13:58:57 com.hazelcast.impl.LifecycleServiceImpl
INFO: /192.168.2.8:5801 [slingshot] Address[192.168.2.8:5801] is STARTING
19-Jan-2012 13:58:59 com.hazelcast.impl.MulticastJoiner
INFO: /192.168.2.8:5801 [slingshot]
Members [1] {
Member [192.168.2.8:5801] this
}
19-Jan-2012 13:58:59 com.hazelcast.impl.management.ManagementCenterService
INFO: /192.168.2.8:5801 [slingshot] Hazelcast Management Center started at port 5901.
19-Jan-2012 13:58:59 com.hazelcast.impl.LifecycleServiceImpl
INFO: /192.168.2.8:5801 [slingshot] Address[192.168.2.8:5801] is STARTED
This means the config has driven the initialisation of Hazelcast successfully. That's all there is to creating a Share instance in the cluster: if the config is present it will become a cluster node; if the config is not present (such as for a default install) then Hazelcast never starts. Once each node is started, they will find each other automatically. Then, as your users interact with Share, cache invalidation messages will be sent from the affected node to the others in the cluster only when the following operations occur:
This keeps chatter to a minimum and performance up!