
Alfresco 6.0 Replication - Best Practices required.

murthyhotha
Champ in-the-making

Team,

Could someone please help me find the best practices for handling Alfresco 6 replication, keeping the points below in mind?

  • We need to handle content replication between two geographically separated Alfresco servers on AWS.
  • Document Library replication through replication jobs: this part is clear to me and is working fine.
  • We need a strategy for DB (PostgreSQL) replication that covers users/groups and document revisions.
  • Can we run content replication and DB replication at the same time? I noticed that content replication inserts nodes into the DB, so I am concerned that running both could create duplicates.

Any pointers are much appreciated.

Thanks a lot in advance.

-Hotha.

8 REPLIES 8

kintu_barot
Star Collaborator

I don't know the purpose of your content replication. Replication jobs have certain limitations, since they keep the content read-only at the target repository.

But since you mentioned database transfer, I guess you want to set up a new Alfresco instance from the existing one.

If that is the purpose then the best way to do it is to back up and restore.

Take a backup of the indexes, content stores, database, and any customizations you have made, then restore all of these to the newly installed instance.

The standard backup and restore steps are in the documentation: http://docs.alfresco.com/6.1/concepts/ch-backup-restore.html

Thanks,

Kintu

ContCentric


murthyhotha
Champ in-the-making

Thanks a lot, Kintu Barot, for the reply.

Yes, we want to create a new (read-only) Alfresco instance in a different geographic region, but the challenge is that the data should be replicated immediately (within a permissible latency), including the revision history and the users/groups created.

Thanks,

-Hotha.

Restoring the backup of the old instance will create the instance with all the data, users and groups.

Regards,

Kintu

ContCentric


heiko_robert
Star Collaborator

Hotha,

Are you expecting to run / log in to the second, replicated Alfresco system? If not, make sure to replicate your alf_data/contentstore server directory (rsync jobs, block-based replication, ...) and set up DB replication. You should not start the second Alfresco system if you don't know what you're doing, since you may corrupt your repository DB.
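
As a rough illustration of the rsync option mentioned above, here is a minimal Python sketch that only builds the command for a one-way copy of the content store. The install paths and the host name `prod2.example.com` are assumptions made up for the example, and DB replication still has to be set up separately.

```python
import shlex

def build_contentstore_rsync(src_dir, dest_host, dest_dir):
    """Return an rsync argument list for a one-way content store copy."""
    # -a preserves permissions and timestamps, -z compresses over the WAN link.
    # The trailing slash on the source copies the directory's contents.
    source = src_dir.rstrip("/") + "/"
    target = f"{dest_host}:{dest_dir.rstrip('/')}/"
    return ["rsync", "-az", source, target]

# Hypothetical paths and host, for illustration only.
cmd = build_contentstore_rsync(
    "/opt/alfresco/alf_data/contentstore",
    "prod2.example.com",
    "/opt/alfresco/alf_data/contentstore",
)
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` from a scheduled job; how often to run it depends on the latency you can tolerate.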

If you intend to use both Alfresco installations at the same time (even in "read-only" mode), you need something like Alfresco cluster technology, which is not part of the Community Edition (but there is a commercial cluster offer for the CE and a specific geo repo replication solution for EE).

Alfresco Replication is not a real replication but a copy job which creates a new, independent node in the second system (the copied node has no metadata and a different node id). Alfresco Replication is not designed as a disaster recovery solution. We implemented a more sophisticated solution, which is more or less a one-way async Server2Server sync that also replicates defined metadata, but maybe this is also not what you expect?

If you expect to run more than one Alfresco system in active mode, clustering is not the perfect solution, because it will slow your whole system down dramatically unless it runs in a low-latency network. Instead you would need an async service that supports a locking mechanism respected by all involved systems, to avoid conflicts which are very hard to handle. Unfortunately I know of no such solution, but maybe someone will correct me?

Hotha,

could you please describe the solution and requirements in more detail: what do your customer / your users expect?

murthyhotha
Champ in-the-making

Thanks a lot for the advice, Heiko Robert.

Here is my problem statement. 

We have an Alfresco 6.0 setup that is complete and functioning well in production.

Now we need to build a kind of replication system for it in a different geographical location.

The system should function like this:

  • Users from the new geo location will upload documents and perform all other write operations on Prod 1, which is already working fine.
  • All read operations from those same users in the new geo location should go to the newly replicated system, which is the part we need to work out how to build.
  • More precisely: upload the document in the Prod 1 environment, replicate the content + DB to Prod 2, and read the data from Prod 2.
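
To make the intended split concrete, here is a minimal sketch of the routing rule described in the list above: reads go to Prod 2, writes go to Prod 1. The base URLs and the function name `pick_backend` are hypothetical, and treating GET/HEAD/OPTIONS as the read operations is an assumption.

```python
# HTTP methods treated as read-only for routing purposes (an assumption).
READ_METHODS = {"GET", "HEAD", "OPTIONS"}

# Hypothetical base URLs for the two sites.
PROD1_URL = "https://prod1.example.com/alfresco"
PROD2_URL = "https://prod2.example.com/alfresco"

def pick_backend(http_method):
    """Send reads to the replica (Prod 2) and everything else to Prod 1."""
    if http_method.upper() in READ_METHODS:
        return PROD2_URL
    return PROD1_URL

for method in ("GET", "POST", "PUT", "DELETE"):
    print(method, "->", pick_backend(method))
```

Note that a read issued right after a write may reach Prod 2 before the replication has caught up, so the permissible latency still has to be agreed on.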

I hope I am clear. Any suggestions will help me a lot.

Thanks,

-Hotha.

Hi Hotha,

the scenario you describe is not officially supported by Alfresco.

The Alfresco Partner IT-Novum has an add-on for the Enterprise Edition which should cover your use case: Alfresco Geo-Caching by it-novum | Alfresco

If you need this functionality for the Community Edition, you may:

  • set up bidirectional DB replication, or connect to Prod1's DB remotely on Prod
  • ask a consultant to configure your read-only Alfresco server with caching disabled (to see changes made directly in the DB) and to force a pseudo read-only configuration (Alfresco always requires read-write access to the DB)
  • set up a distributed filesystem such as OpenAFS that supports a local cache on your remote site

Another, easier setup would be to use our (ecm4u's) Alfresco Server2Server Sync module, which implements a unidirectional, query-based sync to copy/modify/delete nodes from one site to another. The logic is similar to rsync but on top of the Alfresco API. We support types, aspects and metadata, but not versions or comments. The documents in the target will get new nodeRefs. The second system doesn't need to run in read-only mode, but the synced directories shouldn't have write permissions for the users on site B.
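
The rsync-like logic can be pictured with a small sketch. This is not the ecm4u module's code, only a generic illustration of a one-way sync plan; the function name `plan_sync`, the path keys and the integer timestamps are all made up for the example.

```python
def plan_sync(source, target):
    """Compare two {path: modified_timestamp} listings and plan a one-way sync."""
    actions = []
    for path, modified in sorted(source.items()):
        if path not in target:
            actions.append(("copy", path))      # new on the source site
        elif modified > target[path]:
            actions.append(("modify", path))    # changed on the source site
    for path in sorted(target):
        if path not in source:
            actions.append(("delete", path))    # removed on the source site
    return actions

# Hypothetical listings for site A (source) and site B (target).
site_a = {"/docs/a.pdf": 200, "/docs/b.pdf": 150, "/docs/c.pdf": 100}
site_b = {"/docs/b.pdf": 100, "/docs/c.pdf": 100, "/docs/old.pdf": 50}
print(plan_sync(site_a, site_b))
```

Because changes only ever flow from source to target, anything edited directly on site B would be overwritten or deleted on the next run, which is why the synced directories should not be writable there.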

murthyhotha
Champ in-the-making

Also, please clarify: if Alfresco clustering can be a solution at all, can I have two clusters in different geographic locations on the Amazon cloud?

Do we anticipate any architectural issues here?

Thanks,

-Hotha.

Alfresco clustering is a bad idea for WAN / high-latency networks, and it also does not take care of distribution/replication. It may work if your network is fast enough to use the repository DB from site Prod1 (no DB replication). If not, you may get into trouble with the time lag of the DB replication, since the Alfresco cluster is more or less a cluster of the cache. So if the cache is newer than the DB, you will get "unexpected" behavior ...
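
A toy model may help to show one way such a mismatch could play out. It is not Alfresco's actual cache implementation; the class `LaggingReplica` and the key `doc-1` are invented for the illustration, and it assumes the cache invalidation reaches the remote node before the replicated DB row does.

```python
class LaggingReplica:
    """Toy replica that only sees primary writes after apply_pending()."""

    def __init__(self):
        self.rows = {}
        self.pending = []

    def receive(self, key, value):
        # The change has been shipped but is not yet visible to readers.
        self.pending.append((key, value))

    def apply_pending(self):
        for key, value in self.pending:
            self.rows[key] = value
        self.pending.clear()

primary = {"doc-1": "v1"}
replica = LaggingReplica()
replica.rows["doc-1"] = "v1"

# A write lands on the primary; the replica has queued it but not applied it.
primary["doc-1"] = "v2"
replica.receive("doc-1", "v2")

# The cache invalidation arrives first, so the remote node reloads the entry
# from its local replica and caches the old value again.
remote_cache = {}
remote_cache["doc-1"] = replica.rows["doc-1"]
stale_read = remote_cache["doc-1"]

replica.apply_pending()
print("cached:", stale_read, "| replica now:", replica.rows["doc-1"])
```

After the replica catches up, the remote node still holds the old value in its cache until something invalidates it again, so the two sites disagree about the same document.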