Clustered Workflow Servers

Michael_Butt1
Star Contributor

I'm not sure if this is the right spot to ask, but I have a customer that runs a large number of business-critical workflow timers. The timers perform various tasks, including running scripts, exporting data, calling web services, and interfacing with external database tables, so keeping these timers up and running is important to the customer. The timers are mostly Core-based, using the Workflow Timer Service, but there are some thick-client-based timers as well.

We would like to use Windows Clustering in an active/passive configuration for the servers running these timers. I believe that means the same timers, and any local resources they interface with, will have to be set up on both nodes. Is this possible?

I thought I remembered that Workflow Timers could only be applied to one timer service or client instance, but my co-worker believes it is possible to cluster them.

12 REPLIES

Joe_Pineda
Star Collaborator

It may be technically possible to cluster them, but I don't think you'd want to. At least for the thick timer, it's just the OnBase client running as a service.

You can definitely run more than one instance of the client as a service on ONE server.

The challenge in clustering is that a mirror of those instances is also running on another server. Basically, whatever your timers are doing will be multiplied by 2:

For action X:

Timer 1 on Server 1: sends notification abc

Timer 1 on Server 2: sends notification abc, presumably at the same time as Server 1.

If you can keep the passive node from sending the request out to the network, then you would just have a timer that's 'spinning' but doing nothing in workflow.

I just don't see this as a viable approach. Maybe Hyland can weigh in here...

 

Seth_Yantiss
Star Collaborator

Michael,

We do exactly what you're talking about using Microsoft Windows Failover Clustering. We set up a cluster with the Hyland Workflow Timer Service in the cluster, set the service to "Manual" on both nodes (servers), and let the cluster determine where the service runs.

The problem is keeping them in sync. You have to make sure that when a timer is added to or deleted from the lead server, the same thing is done on the other server.

What we do is add/remove the timer on the lead server, then fail the timer service over to the other machine, then add/remove the timer on that machine.

It's a little painful to remember to do that, and you should (probably) do it after timer hours, which might be midday... or 2 AM...

In fact, I have some timers that need to be updated on my secondary as I type this... good reminder...
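For anyone who wants a safety net for that bookkeeping, here is a minimal sketch of a drift check between the two nodes. It assumes you keep a plain list of timer names for each node (typed up by hand or exported however you like); nothing here reads from OnBase, and the function name and the timer names are made up for illustration.

```python
def timer_drift(lead_timers, secondary_timers):
    """Return the timer names that exist on one node but not the other."""
    lead, secondary = set(lead_timers), set(secondary_timers)
    return {
        "missing_on_secondary": sorted(lead - secondary),
        "missing_on_lead": sorted(secondary - lead),
    }


# Hypothetical inventories; in practice these would come from your own notes
# or whatever record you keep of the timers configured on each node.
lead_node = ["Export Invoices", "Purge Temp Queue", "Call Billing Web Service"]
secondary_node = ["Export Invoices", "Purge Temp Queue", "Old Nightly Report"]

drift = timer_drift(lead_node, secondary_node)
print(drift)
```

Anything listed under "missing_on_secondary" still needs to be added after the next failover, and anything under "missing_on_lead" is probably a timer that was removed from the lead server but not yet from the other one.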

Cheers,
Seth 

Not applicable
Seth, just looking for clarification on your clustered implementation (I'm very curious how this is working out for you). Can you share with us under what circumstances this deployment has been beneficial? For example, I can understand that in the context of server maintenance, this approach could be valuable in minimizing system downtime. You are correct to point out its value in that context. I struggle, though, to understand use cases where an application-level failure/hang/stop-processing event would properly trigger a failover from the active node to the passive node. Any information and/or clarity you are willing to share would be greatly appreciated.

Seth_Yantiss
Star Collaborator

Mike,

In my PROD environment we have two servers that are MS Failover Clustered.  I have seven SubServer Services, an OB Client running as a service for DIP processing, and the WF Timer Service.  All of these services are configured to run on Server 1 or Server 2.  I have split the services into two groups so that some of them default to Server 1 while others default to Server 2.  

In the event of an outage of one of the servers (patching that causes a reboot, server failure for some other reason, etc.), the services that were running on the failed node will kick over to the server that is still working.
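To make the split-and-failover behavior concrete, here is a rough model of it in Python. This is only a sketch of the placement logic, not how Windows Failover Clustering is implemented or configured, and the assignment of services to a default server below is an assumption for illustration (the actual grouping isn't spelled out above).

```python
def place_services(preferred_node, online_nodes):
    """Map each service to its default node if online, else to a surviving node."""
    placement = {}
    for service, preferred in preferred_node.items():
        if preferred in online_nodes:
            placement[service] = preferred
        elif online_nodes:
            # Default node is down: move the service to a node that is still up.
            placement[service] = sorted(online_nodes)[0]
        else:
            # No node available, so the service cannot run anywhere.
            placement[service] = None
    return placement


# Hypothetical grouping: which service defaults to which server is assumed.
preferred = {
    "WF Timer Service": "Server 1",
    "DIP Client Service": "Server 1",
    "SubServer Services": "Server 2",
}

normal = place_services(preferred, {"Server 1", "Server 2"})
after_failure = place_services(preferred, {"Server 2"})
print(normal)
print(after_failure)
```

With both servers up, each service sits on its default node; with Server 1 down, everything lands on Server 2, which is why both nodes need the same timers and local resources set up ahead of time.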

Hyland doesn't make using Failover Clustering easy, I'll mention. Setting up and maintaining DIP processing and Workflow Timer services is a bit of a pain, and it takes some meticulous effort to keep them in a state of readiness for failover. Subscription Server was easier, but you have to make sure that you set up the subscriptions in the same order on each node, since you cannot set the actual service name.

Cheers,
Seth