Lucene Indexes - High Availability & DR

unknown-user
Champ on-the-rise
Hi,

We are planning to set up a high availability and disaster recovery environment for our Alfresco instance in Production. For HA, we are going to have 2 Alfresco instances (load balanced) talking to a shared database and content store (Primary site). The DR site will be a similar environment running in parallel with the Primary site. Users won't have access to it unless and until there is a disaster.

We are also planning to keep a "copy" of the shared database and content store in our primary site, just to guard against any failure of the shared database and content store. The data and content will be replicated to this "copy" using database replication and "rsync" respectively. Replication will be synchronous, but we are not sure how to handle the Lucene indexes. Will it be fine if we also copy the backup-lucene-indexes and store them along with the "copy" of the content store in the Primary site, so that we can avoid a full re-index in case of a failover? The backup-lucene-index will be copied once a day, after the out-of-the-box daily Lucene index backup is done.

Does anyone see any issues with this kind of a setup?

Thanks in advance
6 REPLIES

mrogers
Star Contributor
That sounds fine.

unknown-user
Champ on-the-rise
The plan has changed and we are now having only the DR environment. We plan to implement the DR environment as follows.

a) The database will be replicated instantaneously from the Primary database to the DR database through MySQL replication. The DR server's DB will always be up to date.
b) The content store + backup-lucene-index directory will be rsync'ed to the DR server's content store every hour. backup-lucene-index will be renamed to lucene-index on the DR server.
c) The Lucene index backup will be scheduled for 3 AM every day.
d) The DR server will be kept in a stopped state so that the indexes can be rebuilt from the last backup-lucene-index when it is started.
e) During a failover, the DR server will be brought up and the indexes will be rebuilt. The delta content store (binary content that was added after the last rsync) will be manually rsync'ed to the DR server's content store.
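
The hourly sync in step b) could be sketched as follows. This is only a minimal illustration in Python, under stated assumptions: the paths, the `dr-server` host name and the `build_sync_commands` helper are all hypothetical, and the directory names are those of a default install (adjust if yours differ). It only builds the rsync command lines; nothing is executed.

```python
from pathlib import PurePosixPath

# Hypothetical locations (assumptions): substitute your own data root and DR host.
PRIMARY_ROOT = PurePosixPath("/opt/alfresco/alf_data")
DR_TARGET = "dr-server:/opt/alfresco/alf_data"


def build_sync_commands(primary_root=PRIMARY_ROOT, dr_target=DR_TARGET):
    """Return the rsync commands for one sync pass, in the order to run them."""
    commands = []
    # (directory name, mirror deletions?) -- names assume a default install.
    for name, mirror in (("contentstore", False), ("backup-lucene-indexes", True)):
        cmd = ["rsync", "-a"]
        if mirror:
            # Remove files on the DR side that no longer exist on the primary,
            # so stale index files do not mix with the latest backup.
            cmd.append("--delete")
        source = f"{primary_root / name}/"  # trailing slash: copy directory contents
        dest = f"{dr_target}/{name}/"
        cmd += [source, dest]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_sync_commands():
        print(" ".join(cmd))
```

In this sketch `--delete` is applied only to the index backup copy, so the DR copy of the index backup is an exact mirror, while the content store copy is left additive. That is a design choice for the example, not a requirement.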

We tested a similar setup and found that it works. The only issue we noticed is that content added after the last rsync was not accessible or downloadable, though we were able to see/search it through the Web Client. After we copied the delta content store from the primary server, we could access/download it as well.

Can you please let us know if this kind of setup would work when we have to fail over during a disaster?

Thanks in advance

mrogers
Star Contributor
Again - should be O.K. 

However, the sequence should be database first, then the content store. As you have seen, when the database is ahead of the content store you have "missing" content. But at least you only have one hour's missing content.
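
To make the "missing" content case concrete, here is a rough sketch in Python of how one could list it after a failover. It assumes you already have the content URLs from the replicated database (how you extract them is up to you), and that a URL of the form `store://<relative path>` maps to a file at the same relative path under the content store directory. The `missing_content` helper is hypothetical, not part of Alfresco.

```python
from pathlib import Path

STORE_PREFIX = "store://"  # assumed URL scheme for file-based content stores


def missing_content(content_urls, contentstore_root):
    """Return the content URLs whose binary file is absent under contentstore_root."""
    root = Path(contentstore_root)
    missing = []
    for url in content_urls:
        if not url.startswith(STORE_PREFIX):
            continue  # not a file-store URL; ignored in this sketch
        relative = url[len(STORE_PREFIX):]
        if not (root / relative).is_file():
            missing.append(url)
    return missing
```

Anything this returns is content the database knows about but the DR content store has not received yet, i.e. the delta that still has to be copied across.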

I presume your content store is resilient against failure even though there's a DR backup?

unknown-user
Champ on-the-rise
Hi,

Thanks for your prompt replies, mrogers

We have one more question on the DB replication. Our DBAs have set up a replication loop between the primary server DB and the DR server DB, i.e. primary DB => DR DB => primary DB.

Any changes on the DR DB are also replicated to the primary DB. Earlier, when we had this kind of setup, we noticed issues related to the Alfresco license at server start-up. The secondary server would start properly, but if we then tried to start the primary server, it would fail, complaining about an invalid license.

Has anybody faced this kind of issue? Does Alfresco update any tables during start-up that would then be replicated to the primary server's DB?

Regards,
jjacob

unknown-user
Champ on-the-rise
For anyone who might be interested in setting up an Alfresco DR environment, here are a few things that we did.

a) Content store replication through lsyncd (this provides a near-real-time sync)
b) Circular MySQL replication for metadata
c) The backup Lucene index gets rsync'ed once a day. This ensures that, during a failover, the indexes only have to be rebuilt from the last backup Lucene index.

Thanks mrogers for sharing your thoughts on this

mrwilkinson
Champ in-the-making
Hi mrogers,

Apologies for opening an old thread if this is not forum etiquette. Our queries relate to earlier posts in this thread.

We have an application, developed by a 3rd-party software house, which uses Alfresco Community Edition, and we are currently planning its backup and DR strategy.

We intend to have a DR server (both primary and DR would be virtual servers), to which we would:

1) replicate the Lucene indexes (by refreshing the Lucene index backup every 24 hours, replicating it across using Superflexible or a similar tool, and renaming it back);
2) mirror the MySQL database changes in real time through Master-Slave mirroring; and
3) replicate the contentstore in real time (again using Superflexible or a similar tool). The threads above appear to match this process.

The documentation clearly states that for a Hot Backup Procedure, backups should be performed in the order Lucene indexes - MySQL DB - ContentStore, but the setup outlined would effectively result in the database and the contentstore being replicated at the same time.
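
For reference, the ordering described for the hot backup can be sketched as a small Python routine. This is only an illustration: `run_hot_backup` and the placeholder actions are hypothetical, and each action would need to call whatever copy or dump tooling is actually in use.

```python
def run_hot_backup(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns the names of the steps that completed.
    """
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            print(f"step {name!r} failed: {exc}; skipping the remaining steps")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder actions (assumptions); replace with real backup calls.
    steps = [
        ("lucene index backup", lambda: None),
        ("MySQL database", lambda: None),
        ("contentstore", lambda: None),
    ]
    print(run_hot_backup(steps))
```

Stopping at the first failure keeps a later step from running against an earlier one that did not finish, which is the point of having a fixed order at all.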

Is there an issue with this or is this only relevant to the creation of backup files for restore purposes? Also, can you envisage any issue with the process we have outlined?

Finally, will using MySQL in a Master/Slave setup cause any issues with database keys etc.? In the event of a DR situation, we would then need to schedule a maintenance window to restore the slave database back to live and reconfigure it as the Master.

N.B. We have the option of using VEEAM to keep a few recent backups of the primary server instead of using a cold standby DR server but these would clearly be x number of hours out of date whereas the above setup would be up-to-date as of the fault. Would you agree or would VEEAM be a better option from your perspective?

Many thanks in advance for any help/guidance.