Clustered Workflow Servers

Michael_Butt1
Star Contributor

I'm not sure if this is the right spot to ask, but I have a customer that runs a large number of business-critical workflow timers. The timers perform various tasks, including running scripts, exporting data, calling web services and interfacing with external database tables, so keeping these timers up and running is important to the customer. The timers are mostly Core-based, using the Workflow Timer Service, but there are some thick-client-based timers as well.

We would like to use Windows Clustering in an active/passive configuration for the servers running these timers. I believe that means that the same timers, and any local resources they interface with, will have to be set up on both nodes. Is this possible?

I thought I remembered that Workflow Timers could only be applied to one timer service or client instance, but my co-worker believes it is possible to cluster them.

12 REPLIES

Not applicable
Seth,

Thank you for the information, good stuff.

One last question for you. Is the OnBase Workstation subscription persisting while failing from one node to another, or do you have multiple licenses and apply a separate workstation registration for each node?

Thanks again

Seth_Yantiss
Star Collaborator

[quote user="Mike Carter"]Is the OnBase Workstation subscription persisting while failing from one node to another or do you have multiple licenses and apply a separate workstation registration for each node?

Mike,

I am not 100% sure that I followed your question correctly, but I think you're asking about the DIP service. I have two DIP licenses and have assigned one to each server. I would prefer (obviously) to pay for only one license and have that license cover failover from one server to another, since I am only using one server at a time for all my DIPs. **

If you are asking about workflow timers, I am not using a client for this, I am using the Workflow Timer Service.  

When MS Failover Clustering detects a fault with a service or the loss of a node, it takes steps to ensure that the presently hosting node takes down each of the services it is hosting, then brings up the "sister" services on the second node.  Failover Clustering does not run the service on two nodes at the same time.  Clustered services are supposed to be set to "Manual" and Failover Clustering decides when and where to spool the services up.
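The behaviour described above can be pictured with a small toy model. This is only a sketch of the active/passive idea, not how MS Failover Clustering is implemented; the class names, node names and methods (`Node`, `ActivePassiveCluster`, `bring_online`, `fail_over`) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True
    running: set = field(default_factory=set)  # services currently started on this node


class ActivePassiveCluster:
    """Toy model: each clustered service runs on at most one node at a time."""

    def __init__(self, nodes, services):
        self.nodes = nodes
        self.services = services
        self.active = None  # services are "Manual": nothing starts by itself

    def bring_online(self):
        """Start every service on the first healthy node."""
        target = next((n for n in self.nodes if n.healthy), None)
        if target is None:
            raise RuntimeError("no healthy node available")
        target.running.update(self.services)
        self.active = target
        return target

    def fail_over(self):
        """Stop services on the current owner, then start them on another healthy node."""
        old = self.active
        if old is not None:
            old.running.clear()  # take services down first so nothing runs twice
        target = next((n for n in self.nodes if n.healthy and n is not old), None)
        if target is None:
            self.active = None
            raise RuntimeError("no healthy node to fail over to")
        target.running.update(self.services)
        self.active = target
        return target


node_a, node_b = Node("NODE-A"), Node("NODE-B")
cluster = ActivePassiveCluster([node_a, node_b], ["Workflow Timer Service"])
cluster.bring_online()   # NODE-A owns the service
node_a.healthy = False   # simulate loss of the primary node
cluster.fail_over()      # NODE-B takes over; NODE-A runs nothing
```

The point of the model is the ordering in `fail_over`: the old owner is cleared before the new owner starts anything, so the timer service is never active on both nodes at once.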

**As OnBase is an enterprise level application, there should be an allowance for disaster recovery and Hyland should be more proactive with their customers on making the system up-time more of a priority.  We did what we had to do to make OnBase a "Highly Available" application, but that requires us to 1) over purchase licenses and maintenance and 2) be creative and motivated to do what Hyland did not do out of the box.

Cheers,
Seth 

Joe_Pineda
Star Collaborator

Sounds like Seth has put a lot of work into this. But as I read it, I didn't get what benefit is gleaned from an OnBase Timer cluster. Basically, you still have to manually start the service, etc. He does have a good point about paying for extra licenses. But isn't Microsoft Cluster licensing more expensive than buying a couple of DIP licenses or whatever, and then just having a "stand-by" server ready to be put into action, not necessarily as a Windows cluster?

Seth also has a good point about OnBase and its HA architecture. I just don't think OnBase, aside from the SQL DB, is really meant to be a cluster-based app.

Load balancing the web and app servers, for example, worked very well for me in the past. And to improve HA, we just built more VMs and split the timers among them. These were dedicated to just that purpose, and it meant that if one went down for one organization, the others would still be running. Dedicated "timer" VMs don't need to be that beefy.

Seth_Yantiss
Star Collaborator

Jose,

Thank you for the agreement on some of the HA oversights!  I have a couple of responses to some of your comments that might be helpful.

[quote user="Jose Pineda"] But as I read it, I didn't get what benefit is gleaned from an OnBase Timer cluster. Basically, you still have to manually start the service, etc.

MS Failover Clustering automatically starts the service on the failover node when there is a degradation on the primary node.  So if the Primary Server dies the secondary fires up automatically.  This takes out the manual process and makes the outage as small as is possible in this day and age.  End users might not notice the difference.

[quote user="Jose Pineda"]But isn't Microsoft Cluster licensing more expensive than buying a couple of DIP licenses

Probably... but you don't want to have two ACTIVE DIP processors running the SAME DIPs at the same time... That would be bad, in general.

[quote user="Jose Pineda"] And to improve HA, we just built more VMs, and split the timers among them.

I have found that running the Workflow Timer Service on two machines, where they run the same timers, is something of a problem. You can get into document contention issues. You can also get duplicated output in some processes. For example, if your documents drive the generation of a DOC COMP letter, you can end up with two of the same letter. If your document drives an Export script, you can end up with two of the same document at the destination... etc.
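The duplicate-output problem is easy to see in a toy model. This is only an illustration under simplifying assumptions: the function `run_timer`, the document IDs and the node names are invented, and none of OnBase's actual locking or queue handling is modelled.

```python
def run_timer(queue, outbox, node_name):
    """Toy timer pass: export every document currently in the queue."""
    for doc_id in queue:
        outbox.append((doc_id, node_name))


queue = ["DOC-1", "DOC-2", "DOC-3"]

# Two active nodes run the same timer against the same queue.
both_active = []
run_timer(queue, both_active, "NODE-A")
run_timer(queue, both_active, "NODE-B")

# Active/passive: only the node that currently owns the service runs the timer.
one_active = []
run_timer(queue, one_active, "NODE-A")

# Number of exports beyond one per document.
exported_twice = len(both_active) - len({doc for doc, _ in both_active})
```

With both nodes active, every document is exported once per node, so the destination ends up with two copies of each. With a single owner there is one export per document.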

Cheers,
Seth 

Joe_Pineda
Star Collaborator

Thanks Seth, but I think you misunderstood me.

I wasn't talking about having 2 active DIP processors running the same DIP jobs. I was just asking (since you bristled at having to buy extra licenses) if MS Clustering wouldn't be more expensive than buying the extra licenses for some "stand by", non-clustered server. The server would be "activated" manually. Your cluster seems to take care of this without intervention. Great. I would be very interested in knowing if this works when the OnBase service just 'hangs' but the server is still up. In my experience, that's a more common scenario.
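To make the "hung but still up" case concrete, here is a minimal sketch of the distinction. It does not describe what MS Failover Clustering or OnBase actually check; the function `classify`, its inputs and the 60-second threshold are assumptions made purely for illustration.

```python
import time


def classify(process_running, last_heartbeat, now, max_silence=60.0):
    """Return 'down', 'hung', or 'ok' for a monitored service.

    process_running: whether the service process exists (hypothetical input).
    last_heartbeat:  time of the last sign of useful work, in seconds.
    max_silence:     assumed limit on how long the service may stay silent.
    """
    if not process_running:
        return "down"
    if now - last_heartbeat > max_silence:
        return "hung"  # process exists but has stopped making progress
    return "ok"


now = time.time()
print(classify(True, now - 5, now))    # worked 5 seconds ago
print(classify(True, now - 600, now))  # alive but silent for 10 minutes
print(classify(False, now - 5, now))   # process gone
```

A monitor that only asks "is the process running?" cannot tell the first two cases apart, which is why a hang needs some separate sign of progress to be detected.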

I admit that the approach we used for HA wasn't really HA, in the sense that there was no redundancy for the timer servers. We were just trying to minimize impact to the whole organization by separating the timers... different timers. That way, even if one 'office' was down, the rest would still be up and running. We just didn't see a fail-over cluster as the best approach. But I understand that's a debatable point. Bottom line: it worked for us. Would it have been better with clustering... maybe...???