Hyland Connect

josh_barrett · 07-11-2013

Our plan is to set up Alfresco to take traffic in two separate data centers. Multiple portal products talk to Alfresco via CMIS; Alfresco is accessed almost exclusively via CMIS.

To do this, we need the files and the DB to be replicated across both DCs.

The DB replication between the DCs will need to be bidirectional, using Oracle and GoldenGate. From what I can see, we will need different sequences on each DB, for example setting up the DB in data center 1 to use odd numbers in the sequence and the DB in data center 2 to use even numbers.
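
To check that the odd/even idea really avoids key collisions, it helps to model two sequences that share the same increment but start at different offsets. This is a minimal Python sketch, not Oracle code, and the helper name `interleaved_ids` is hypothetical. In Oracle the corresponding setup would be one DC's sequences created with `START WITH 1 INCREMENT BY 2` and the other DC's with `START WITH 2 INCREMENT BY 2`; which Alfresco sequences actually need this treatment is not covered here.

```python
from itertools import count, islice

def interleaved_ids(dc_index: int, num_dcs: int = 2):
    """Yield the IDs one data center would hand out in an interleaved scheme.

    Hypothetical helper modelling a sequence created with
    START WITH dc_index + 1 INCREMENT BY num_dcs.
    With num_dcs=2: DC 0 gets 1, 3, 5, ... and DC 1 gets 2, 4, 6, ...
    """
    return count(start=dc_index + 1, step=num_dcs)

dc1 = list(islice(interleaved_ids(0), 5))
dc2 = list(islice(interleaved_ids(1), 5))
print(dc1)                       # [1, 3, 5, 7, 9]
print(dc2)                       # [2, 4, 6, 8, 10]
print(set(dc1).isdisjoint(dc2))  # True
```

The same pattern extends to N data centers with an increment of N. Any existing sequence would also have to be restarted above the highest value already present in either DB, otherwise new IDs can still collide with existing rows.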

Our files are stored on NAS storage that is replicated between the DCs. We are also planning to configure replicating file storage on the Alfresco side, with each data center having a locally attached primary store and using the other data center's store as its secondary store. This should hopefully handle replication latency: when a document is requested, Alfresco checks whether it exists in its primary store, and if it doesn't, retrieves the file from its secondary store and copies it into its primary store.
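
The read path described above (primary first, then secondary, then copy locally) is a read-through fallback. The sketch below models it with in-memory dictionaries; the class names and the `store://` URL are hypothetical placeholders, and it does not reproduce the API or configuration of Alfresco's own replicating content store.

```python
class InMemoryStore:
    """Hypothetical stand-in for a content store: maps content URLs to bytes."""

    def __init__(self):
        self._data = {}

    def exists(self, url):
        return url in self._data

    def read(self, url):
        return self._data[url]

    def write(self, url, content):
        self._data[url] = content


class ReadThroughStore:
    """Primary/secondary fallback: read locally, else fetch remotely and cache."""

    def __init__(self, primary, secondary):
        self.primary = primary      # locally attached store in this DC
        self.secondary = secondary  # store belonging to the other DC

    def read(self, url):
        if self.primary.exists(url):
            return self.primary.read(url)
        if not self.secondary.exists(url):
            # Not available locally or remotely (e.g. other DC is down): fail.
            raise FileNotFoundError(url)
        content = self.secondary.read(url)
        self.primary.write(url, content)  # copy so the next read is local
        return content


dc1_local, dc2_local = InMemoryStore(), InMemoryStore()
dc2_local.write("store://example/doc.bin", b"hello")  # written in DC 2 only
dc1_view = ReadThroughStore(primary=dc1_local, secondary=dc2_local)
print(dc1_view.read("store://example/doc.bin"))  # b'hello'
print(dc1_local.exists("store://example/doc.bin"))  # True
```

Note the failure branch: if the other DC is unreachable before a document has been copied locally, the read fails, which is the situation the replies below warn about.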

We are of course setting this up in our sandbox environment first, but before we do, I need to figure out the answers to the following questions.

1. Has anyone out there set Alfresco up in this manner before? If so, am I on the right path?

2. Would it be OK for us to update the Oracle sequences on each DB? Would this in any way void our licensing agreement? We are using Alfresco 4.1.4 Enterprise and Oracle/GoldenGate on SUSE Linux.

3. Also, are the GUIDs generated in the DB guaranteed to be unique? If not, we will have to configure GoldenGate to handle collisions.

4. Can anyone think of anything else I may be missing?

Any feedback would be greatly appreciated.

Thanks,
Josh

afaust · 07-13-2013

Hello,

This is quite an interesting project. Personally, I'd love to have the opportunity to work on a project like this.

As Alfresco is not licensed in the classical sense, you won't be voiding any licensing agreement; basically, you can run Alfresco on anything you want. What may / will be limited is the scope of the support services you are entitled to as an Enterprise customer. The full range of services is only available for the officially supported platform stacks listed on the Alfresco homepage. Support staff may decide to stop investigating / supporting a specific ticket once it becomes clear that a custom, unsupported setup is the cause of the problem. In my experience, they are always very cooperative even in those situations and help you get back on the supported path.
I'd strongly advise discussing your plans with Alfresco Support / your local Alfresco Sales Engineer.

What comes to mind at this moment in terms of feedback on the technical issues is the following:

<ol>
<li>Don't forget about the need for remote cache replication / invalidation. Alfresco keeps a lot of actively used data in-memory and needs to be told when another member of a clustered / replicated environment has made changes to the database. Otherwise you'll end up with stale data issues.</li>
<li>Alfresco UUIDs are generated using a pseudo-random number generator (<a href="http://sourcecodebrowser.com/jug/2.0.0/classorg_1_1safehaus_1_1uuid_1_1_u_u_i_d_generator.html#a2924...">source / javadoc</a>); they cannot be guaranteed to be unique in theory, but may be considered unique for most practical purposes.</li>
<li>You may need to investigate the index tracking of SOLR / Lucene, which uses transaction IDs to track modifications. I'd expect it to handle unreplicated transactions similarly to uncommitted ones in a common cluster scenario…</li>
<li>I would not use a replicating content store on the Alfresco side only, especially with inbound-only replication as your post suggests. When either DC goes down, the Alfresco instance in the other DC will be left with unusable files for any content it has not yet replicated locally. Content-wise, I would rather use DC-independent central storage (with its own backup) or full active-active replication between the two DCs' storage environments. You can still use the Alfresco caching / replicating content store as a fallback / optimization layer, but the main replication should be handled independently (for resilience against potential Alfresco repository / JVM issues or crashes).</li>
</ol>
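
To put "unique for most practical purposes" into numbers, the birthday bound gives the probability of at least one duplicate among n random IDs. This sketch assumes the UUIDs are random-based (version 4), which leaves 122 random bits; it uses Python's `uuid.uuid4()` only as an example of such an ID, not as Alfresco's generator. Because two well-seeded generators in separate DCs are statistically equivalent to one generator producing the combined volume, n here is the total across both DCs.

```python
import math
import uuid

# A version-4 UUID has 128 bits, 6 of which are fixed (version + variant).
RANDOM_BITS_V4 = 122

def collision_probability(n: int, bits: int = RANDOM_BITS_V4) -> float:
    """Birthday-bound approximation of P(at least one duplicate among n IDs)."""
    space = 2 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))

u = uuid.uuid4()
print(u.version)  # 4
for n in (10**6, 10**9, 10**12):
    print(f"{n:>16,d} ids -> p ~ {collision_probability(n):.3e}")
```

The bound only holds if the random source is properly seeded; in practice, a badly seeded generator is a far more likely cause of duplicates than the arithmetic above.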
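
The first point, about remote cache invalidation, follows a common pattern: each node keeps a local read cache, and every write broadcasts the changed key so that peers drop their stale copy. The sketch below is a generic illustration with hypothetical `Bus` and `Node` classes and a shared dictionary standing in for the replicated database; it is not how Alfresco's cluster caching is actually configured.

```python
class Bus:
    """Hypothetical in-process stand-in for a cross-DC messaging channel."""

    def __init__(self):
        self.subscribers = []

    def publish(self, sender, key):
        for node in self.subscribers:
            if node is not sender:
                node.on_invalidate(key)


class Node:
    """One repository instance with a local cache over a (replicated) DB."""

    def __init__(self, name, db, bus):
        self.name, self.db, self.bus = name, db, bus
        self.cache = {}
        bus.subscribers.append(self)

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.db[key]  # cache miss: load from the DB
        return self.cache[key]

    def put(self, key, value):
        self.db[key] = value
        self.cache[key] = value
        self.bus.publish(self, key)  # tell peers their cached copy is stale

    def on_invalidate(self, key):
        self.cache.pop(key, None)


db = {"doc-1": "v1"}
bus = Bus()
dc1, dc2 = Node("dc1", db, bus), Node("dc2", db, bus)
dc2.get("doc-1")         # dc2 now caches "v1"
dc1.put("doc-1", "v2")   # dc1 writes and broadcasts an invalidation
print(dc2.get("doc-1"))  # v2; without the broadcast dc2 would still return v1
```

This sketch assumes the DB write is already visible to the peer when the invalidation arrives. With asynchronous GoldenGate replication that ordering is not guaranteed, so a peer could reload the old row and cache it again.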

Regards
Axel

josh_barrett · 07-24-2013

Axel, thanks for your feedback!

I have it configured in my development environment, and so far, so good.

I added/updated/deleted documents on both Alfresco instances without any issues.

Next I am going to run bulk imports on both Alfresco instances to see if I run into any issues.

-Josh

Alfresco Active-Active Configuration