Cluster with single solr instance: solr backup howto?
01-14-2013 08:07 AM
Hello,
We have a 2-node Alfresco cluster with Solr on a 3rd node. At the moment both Alfresco nodes have the same solr-backup.properties config:
solr.backup.alfresco.cronExpression=0 0 2 * * ?
solr.backup.archive.cronExpression=0 0 4 * * ?
solr.backup.alfresco.remoteBackupLocation=${dir.root}/solrBackup/alfresco
solr.backup.archive.remoteBackupLocation=${dir.root}/solrBackup/archive
solr.backup.alfresco.numberToKeep=3
solr.backup.archive.numberToKeep=3
So both Alfresco nodes try to make a Solr backup at the same moment, and if this happens in exactly the same second it results in the following exception on one of the nodes:
02:00:06,294 ERROR [org.quartz.core.JobRunShell] Job DEFAULT.search.alfrescoCoreBackupJobDetail threw an unhandled Exception:
java.lang.NullPointerException
at org.alfresco.repo.domain.locks.AbstractLockDAOImpl.updateLocks(AbstractLockDAOImpl.java:190)
at org.alfresco.repo.domain.locks.AbstractLockDAOImpl.releaseLock(AbstractLockDAOImpl.java:172)
at org.alfresco.repo.lock.JobLockServiceImpl$3.execute(JobLockServiceImpl.java:428)
at org.alfresco.repo.lock.JobLockServiceImpl$3.execute(JobLockServiceImpl.java:425)
at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:388)
at org.alfresco.repo.lock.JobLockServiceImpl.releaseLock(JobLockServiceImpl.java:432)
at org.alfresco.repo.search.impl.solr.SolrBackupClient.execute(SolrBackupClient.java:122)
at org.alfresco.repo.search.impl.solr.SolrBackupJob.execute(SolrBackupJob.java:58)
at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:563)
What would be the best practice for this setup?
- make 1 alfresco node responsible for the solr backup?
- have 2 backups at a different time?
- ?
cheers,
dirk
Labels: Archive
2 Replies
03-28-2013 05:06 AM
We have encountered similar issues with the same kind of clustered setup.
I could not get a clear "best practice" recommendation from Alfresco enterprise support, since there are many different aspects to consider…
If you enable the backup on only a single node, this creates a single point of failure for the cluster, i.e. the backups will not be generated if this node is down.
If you use the same backup schedule on both nodes, there is a possibility of both nodes trying to create backups at the same time. I have not seen the error you mentioned, but instead there might be an error on the SOLR side because both nodes are trying to create the "same" backup directory.
My recommendation is to configure different backup schedules and locations for each node. When the cluster is working properly, you will have index backups that may be redundant, but at least you are not risking getting no backups at all.
One more thing to remember: if you configure different SOLR backup settings for each node in the cluster (in alfresco-global.properties), do not edit these settings from Share or the JMX console. Editing the values will propagate the same settings to all the nodes in the cluster (and the settings are persisted to the DB). This means the custom settings in alfresco-global.properties will not be used anymore. "Reverting" the settings via the JMX console will revert the value back to the default value, and your custom setting will still not be used. The only way to recover from this is to manually hack the DB.
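For example, the per-node configuration could look like this (the property names are the ones from the original post; the nodeA/nodeB paths and the staggered times are just illustrative):

```properties
# Node A (alfresco-global.properties) - backups at 02:00 / 04:00
solr.backup.alfresco.cronExpression=0 0 2 * * ?
solr.backup.archive.cronExpression=0 0 4 * * ?
solr.backup.alfresco.remoteBackupLocation=${dir.root}/solrBackup/nodeA/alfresco
solr.backup.archive.remoteBackupLocation=${dir.root}/solrBackup/nodeA/archive

# Node B (alfresco-global.properties) - backups at 03:00 / 05:00
solr.backup.alfresco.cronExpression=0 0 3 * * ?
solr.backup.archive.cronExpression=0 0 5 * * ?
solr.backup.alfresco.remoteBackupLocation=${dir.root}/solrBackup/nodeB/alfresco
solr.backup.archive.remoteBackupLocation=${dir.root}/solrBackup/nodeB/archive
```

Separate target directories also avoid the "same backup directory" clash on the SOLR side if both schedules ever do overlap.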
Cheers,
-Heikki-
07-31-2013 11:40 AM
My understanding is that the Solr backup job uses the JobLockService, which should guarantee that only one node in the cluster runs the job:
http://svn.alfresco.com/repos/alfresco-open-mirror/alfresco/HEAD/root/projects/repository/source/jav...
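The idea, as a plain-Java sketch (this is not Alfresco's actual JobLockService, just an illustration of the first-acquirer-wins pattern it implements): two "nodes" fire the backup job at the same instant, but only the one that grabs the shared lock performs the backup.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class BackupLockDemo {
    // Stand-in for the cluster-wide job lock: first caller to flip it wins.
    static final AtomicBoolean lock = new AtomicBoolean(false);
    static final AtomicInteger backupsRun = new AtomicInteger();

    static void backupJob(CountDownLatch start, CountDownLatch done) {
        try {
            start.await();                          // both "nodes" fire at the same instant
            if (lock.compareAndSet(false, true)) {  // only one node acquires the lock
                backupsRun.incrementAndGet();       // ...and performs the backup
                done.await();                       // hold the lock until the loser has given up
                lock.set(false);                    // release the lock afterwards
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        done.countDown();                           // the losing node just skips this run
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);
        Thread nodeA = new Thread(() -> backupJob(start, done));
        Thread nodeB = new Thread(() -> backupJob(start, done));
        nodeA.start();
        nodeB.start();
        start.countDown();                          // "02:00:00" on both nodes at once
        nodeA.join();
        nodeB.join();
        System.out.println("backups run: " + backupsRun.get()); // prints "backups run: 1"
    }
}
```

If the lock works, the NPE above looks like a problem in releasing the lock rather than in acquiring it, which is why only one node logged the error.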
Regards,
