Since this post was published there has been a HAProxy 1.5(.x) release, so this post is now out of date. An updated post with the changes relevant to HAProxy 1.5 can be found here: https://www.alfresco.com/blogs/devops/?p=8

---

For the cloud service we (Alfresco DevOps) used to use Apache for all our load balancing and reverse proxy needs, but more recently we switched to HAProxy for this task. In this article I'll list some of the settings we use, and give a final example that could be used (with some environment-specific modifications) for a general Alfresco deployment.

The main website for HAProxy is: http://haproxy.1wt.eu/
The docs can be found here: http://cbonte.github.io/haproxy-dconv/configuration-1.5.html

I suggest that for any of the settings covered in the rest of this article, you consult the HAProxy docs to gain a deeper understanding of what they do.

The 'global' section:

global
pidfile /var/run/haproxy.pid
log 127.0.0.1 local2 info
stats socket /var/run/haproxy.stat user nagios group nagios mode 600 level admin
A quick breakdown of these:
- global - defines global settings.
- pidfile - writes the PIDs of all daemons into the named file.
- log - adds a global syslog server (optional).
- stats socket - sets up a statistics output socket (optional).
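With the admin-level stats socket configured, you can query a running HAProxy at any time with a tool such as socat (assuming socat is installed and the socket path above is used):

```shell
# Dump current frontend/backend/server statistics in CSV form
echo "show stat" | socat stdio /var/run/haproxy.stat

# Show general process information (version, uptime, connection counts)
echo "show info" | socat stdio /var/run/haproxy.stat
```

This is very handy for scripting checks against HAProxy (note the `user nagios` ownership above, which lets a monitoring system read it).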
The 'defaults' section:

defaults
mode http
log global
A quick breakdown of these:
- defaults - defines the default settings
- mode - sets the working mode to http (rather than tcp)
- log - sets the log context
Now we configure some options that specify how HAProxy works. These options are very important to get your service working properly:

option httplog
option dontlognull
option forwardfor
option http-server-close
option redispatch
option tcp-smart-accept
option tcp-smart-connect
These options do the following:
- option httplog - this enables logging of HTTP requests, session state and timers.
- option dontlognull - disable logging of null connections as these can pollute the logs.
- option forwardfor - enables the insertion of the X-Forwarded-For header to requests sent to servers.
- option http-server-close - enable HTTP connection closing on the server side. See the HAProxy docs for more info on this setting.
- option redispatch - enable session redistribution in case of connection failure, which is important in a HA environment.
- option tcp-smart-accept - a performance tweak that saves one ACK packet during the accept sequence.
- option tcp-smart-connect - a performance tweak that saves one ACK packet during the connect sequence.
Next we define the timeouts - these are fairly self-explanatory:

timeout http-request 10s
timeout queue 1m
timeout connect 5s
timeout client 2m
timeout server 2m
timeout http-keep-alive 10s
timeout check 5s
retries 3
We then configure gzip compression to reduce the amount of data sent across the wire - an easy performance optimisation that no configuration should miss:

compression algo gzip
compression type text/html text/html;charset=utf-8 text/plain text/css text/javascript application/x-javascript application/javascript application/ecmascript application/rss+xml application/atomsvc+xml application/atom+xml application/atom+xml;type=entry application/atom+xml;type=feed application/cmisquery+xml application/cmisallowableactions+xml application/cmisatom+xml application/cmistree+xml application/cmisacl+xml application/msword application/vnd.ms-excel application/vnd.ms-powerpoint
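To get a feel for what this buys you, here is a quick illustration of how well repetitive text/html content (like most rendered pages) compresses; the sample markup is made up:

```python
import gzip

# Repetitive HTML, roughly like a long document listing page.
# The markup here is purely illustrative.
html = ("<div class='doc-row'><span>Some document name</span></div>\n" * 200).encode("utf-8")
compressed = gzip.compress(html)

print(f"raw: {len(html)} bytes, gzipped: {len(compressed)} bytes")
```

For content like this the compressed payload is a small fraction of the original, which is why the `compression type` list above covers all the text-based MIME types Alfresco serves.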
The next section is some error message housekeeping. Change these paths to wherever you want to put your error messages:

errorfile 400 /var/www/html/errors/400.http
errorfile 403 /var/www/html/errors/403.http
errorfile 408 /var/www/html/errors/408.http
errorfile 500 /var/www/html/errors/500.http
errorfile 502 /var/www/html/errors/502.http
errorfile 503 /var/www/html/errors/503.http
errorfile 504 /var/www/html/errors/504.http
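Note that these `.http` files are raw HTTP responses, not plain HTML pages: each must begin with a status line and headers, followed by a blank line and the body. A minimal 503.http might look like this (the body text is just an example):

```http
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html><body><h1>503 Service Unavailable</h1>
<p>The service is temporarily unavailable. Please try again shortly.</p>
</body></html>
```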
Now that we have finished setting up our defaults, we can start to define our frontends (listening ports). We first define our frontend on port 80. This just redirects to the https frontend:

# Front end for http to https redirect
frontend http
bind *:80
redirect location https://my.yourcompany.com/share/
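A `redirect location` like this sends every plain-http request to the Share landing page, discarding the original path. If your HAProxy build supports it (it was added in the 1.5 development series), `redirect scheme` preserves the requested host and path instead:

```
frontend http
    bind *:80
    redirect scheme https code 301
```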
Next we define our https frontend, which is where all traffic to Alfresco is handled:

# Main front end for all services
frontend https
bind *:443 ssl crt /path/to/yourcert/yourcert.pem
capture request header X-Forwarded-For len 64
capture request header User-agent len 256
capture request header Cookie len 64
capture request header Accept-Language len 64
We now get into the more 'fun' part of configuring HAProxy - setting up the acls. These acls are the mechanism used to match requests to the appropriate backend to fulfil them, or to block unwanted traffic from the service. If you are unfamiliar with HAProxy, I suggest you have a good read of the docs for acls and what they can achieve (section 7 in the docs).

We separate out the different endpoints for Alfresco into their own sub-domain names, e.g. my.alfresco.com for Share access, webdav.alfresco.com for WebDAV, sp.alfresco.com for SharePoint access. I'll use these three endpoints in the examples below, using the following mapping:
- Share - my.yourcompany.com
- Webdav - webdav.yourcompany.com
- Sharepoint - sp.yourcompany.com
We first set up some acls that check the host name being accessed and match on those. Anything coming in that doesn't match these won't get an acl associated (and therefore won't be forwarded to any service).

# ACL for backend mapping based on host header
acl is_my hdr_beg(host) -i my.yourcompany.com
acl is_webdav hdr_beg(host) -i webdav.yourcompany.com
acl is_sp hdr_beg(host) -i sp.yourcompany.com
These are in the syntax:

acl acl_name match_expression case_insensitive(-i) what_to_match

So, acl is_my hdr_beg(host) -i my.yourcompany.com states:
- acl - define this as an acl.
- is_my - give the acl the name 'is_my'.
- hdr_beg(host) - set the match expression to use the host HTTP header, checking the beginning of the value.
- -i - set the check to be case insensitive
- my.yourcompany.com - the value to check for.
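In other words, hdr_beg(host) -i is a case-insensitive prefix match against the Host header. A rough Python equivalent (the sample hostnames are illustrative):

```python
# Roughly what "acl is_my hdr_beg(host) -i my.yourcompany.com" evaluates:
# does the Host header begin with the value, ignoring case?
def is_my(host):
    return host.lower().startswith("my.yourcompany.com")

print(is_my("MY.YourCompany.com"))       # True - case is ignored
print(is_my("my.yourcompany.com:443"))   # True - prefix match, port ignored
print(is_my("webdav.yourcompany.com"))   # False
```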
We then do some further mapping based on url paths in the request using some standard regex patterns:

# ACL for backend mapping based on url paths
acl robots path_reg ^/robots.txt$
acl alfresco_path path_reg ^/alfresco/.*
acl share_path path_reg ^/share/.*/proxy/alfresco/api/solr/.*
acl share_redirect path_reg ^$|^/$
These do the following:
- acl robots - checks for a web bot harvesting the robots.txt file
- acl alfresco_path - checks whether the request is trying to access the alfresco webapp. We block direct access to the Alfresco Explorer webapp so you can remove this check if you want that webapp available for use.
- acl share_path - We use this to block direct access to the Solr API.
- acl share_redirect - this checks whether the request is missing a context path (i.e. just '/' or nothing, rather than e.g. /share)
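These patterns can be sanity-checked outside HAProxy with any regex engine. A quick Python sketch (the sample request paths are hypothetical):

```python
import re

# The path_reg patterns as written in the HAProxy config above.
acls = {
    "robots":         r"^/robots.txt$",
    "alfresco_path":  r"^/alfresco/.*",
    "share_path":     r"^/share/.*/proxy/alfresco/api/solr/.*",
    "share_redirect": r"^$|^/$",
}

def matches(acl, path):
    """Return True if the given request path matches the named acl."""
    return re.search(acls[acl], path) is not None

print(matches("robots", "/robots.txt"))                                # True
print(matches("alfresco_path", "/alfresco/faces/jsp/login.jsp"))       # True
print(matches("share_path", "/share/page/proxy/alfresco/api/solr/q"))  # True
print(matches("share_redirect", "/"))                                  # True
print(matches("share_redirect", "/share/"))                            # False
```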
We next add in some 'good practice' - a HSTS header. You can find out more about HSTS here: https://www.owasp.org/index.php/HTTP_Strict_Transport_Security

Note: my.alfresco.com is in the built-in HSTS list in both Chrome and Firefox, so neither of these browsers will ever try to access the service over plain http (see http://www.chromium.org/sts).

# Changes to header responses
rspadd Strict-Transport-Security:\ max-age=15768000
We next set up some blocks; you can ignore these if you don't want to limit access to any service. The example below blocks public access to the Alfresco Explorer app via the 'my.yourcompany.com' route. These use the acls matched earlier, and can include multiple acls that must all be true.

# Blocked paths
block if alfresco_path is_my
Now we redirect to /share/ if this wasn't in the url path used to access the service.

# Redirects
redirect location /share/ if share_redirect is_my
Next we set up the list of backends to use, matched against the already defined acls.

# List of backends
use_backend share if is_my
use_backend webdav if is_webdav
use_backend sharepoint if is_sp
Then we set up the default backend to use as a catch-all:

default_backend share
Now we define the backends, the first being for Share:

backend share
On this backend, enable the stats page:

# Enable the stats page on share backend
stats enable
stats hide-version
stats auth <user>:<password>
stats uri /monitor
stats refresh 2s
The stats page gives you a visual view of the health of your backends and is a very powerful monitoring tool.

option httpchk GET /share
balance leastconn
cookie JSESSIONID prefix
server tomcat1 server1:8080 cookie share1 check inter 5000
server tomcat2 server2:8080 cookie share2 check inter 5000
These define the following:
- backend share - this defines a backend called share, which is used by the use_backend config from above.
- option httpchk GET /share - this enables http health checks, using an http GET, on the /share path. Server health checks are one of the most powerful features of HAProxy and work hand in hand with tomcat session replication to move an active session to another server if the server your active session is on fails its health checks.
- balance leastconn - this sets up the balancing algorithm. leastconn selects the server with the lowest number of connections to receive the connection.
- cookie JSESSIONID prefix - this enables cookie-based persistence in a backend. Share requires a sticky session and this also is used in session replication.
- server tomcat1 server1:8080 cookie share1 check inter 5000 - this breaks down into:
- server - this declares a server and its parameters
- tomcat1 - this is the server name and appears in the logs
- server1:8080 - this is the server address (and port)
- cookie share1 - this checks the cookie defined above and, if matched, routes the user to the relevant server. The 'share1' value has to match the jvmRoute set on the appserver for Share/Alfresco (for Tomcat see http://tomcat.apache.org/tomcat-7.0-doc/cluster-howto.html)
- check inter 5000 - this sets the health check, with an inter(val) of 5000 ms
Define the webdav backend. Here we hide the need to enter /alfresco/webdav on the url path, which gives a neater, shorter url for accessing webdav, and again we enable server health checking:

backend webdav
option httpchk GET /alfresco
reqrep ^([^\ ]*)\ /(.*) \1\ /alfresco/webdav/\2
server tomcat1 server1:8080 check inter 5000
server tomcat2 server2:8080 check inter 5000
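The reqrep line is just a regex substitution on the raw HTTP request line: group 1 captures the method, group 2 everything after the leading "/", and the replacement splices /alfresco/webdav/ in between. A Python sketch of the same rewrite (the sample request line is hypothetical; Python doesn't need HAProxy's escaped spaces):

```python
import re

# The reqrep pattern from the config, translated to Python syntax.
# Group 1 = HTTP method, group 2 = path and protocol after the leading "/".
pattern = r"^([^ ]*) /(.*)"
replacement = r"\1 /alfresco/webdav/\2"

def rewrite(request_line):
    """Rewrite a request line the way the webdav backend's reqrep does."""
    return re.sub(pattern, replacement, request_line)

print(rewrite("GET /docs/report.doc HTTP/1.1"))
# -> GET /alfresco/webdav/docs/report.doc HTTP/1.1
```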
Define the SPP backend. Here we define the backend for the SharePoint protocol, again with health checks:

backend sharepoint
balance url_param VTISESSIONID check_post
cookie VTISESSIONID prefix
server tomcat1 server1:7070 cookie share1 check inter 5000
server tomcat2 server2:7070 cookie share2 check inter 5000
Once this is all in place you should be able to start HAProxy. If you get any errors, you will be told which lines of the config they are on. Alternatively, if you have HAProxy set up as a service, you should be able to run 'service haproxy check' to check the config without starting HAProxy.

There are many more cool things you can do with HAProxy, so give it a go and don't forget to have a good read of the docs!
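You can also run the config check directly against the binary (assuming haproxy is on your PATH and your config lives in the usual place):

```shell
# -c validates the configuration and exits without starting the proxy
haproxy -c -f /etc/haproxy/haproxy.cfg
```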