Activiti transaction commit time

lmollea
Champ in-the-making
Hi all.

Has anyone experienced processes becoming stuck due to missed signals caused by Activiti engine write times? If so, any suggestions on how best to handle this situation?

The scenario is this: we have a process that is in a wait state (a user task or a signal catch handler). This process gets woken up, executes a Java Service task, and moves to another wait state (another signal handler) where it waits for an external signal (say, received via a JMS message).

What is happening is that the Java Service task calls a remote service, which then sends back the JMS message that should wake up the process. After the task is executed, Activiti updates the various tables (execution, history, …), but the remote service may be much faster (not the normal case, but an edge case that can happen), so the Process Engine receives the response message before Activiti has committed the first transaction, and the process is never woken up again.
We worked around this by calling signalProcess instead of signalProcessAsync and forcing one message to be read from the JMS queue at a time (see the sketch below). This is a bit dangerous, as it serializes all our operations and adds latency to message processing (not currently an issue, but undesirable).
It also forces us to stay single-instance, as clustering the process engine would simply recreate the issue if JMS messages are round-robined among the instances.
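
For reference, the workaround looks roughly like this (just a sketch, assuming Spring JMS with a listener container limited to one consumer; the signal name and message property are illustrative, not our real ones):

import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;

import org.activiti.engine.RuntimeService;
import org.activiti.engine.runtime.Execution;

// Registered on a listener container configured with a single consumer
// (e.g. concurrentConsumers = 1), which is what serializes our processing.
public class RemoteServiceReplyListener implements MessageListener {

    private final RuntimeService runtimeService;

    public RemoteServiceReplyListener(RuntimeService runtimeService) {
        this.runtimeService = runtimeService;
    }

    @Override
    public void onMessage(Message message) {
        try {
            String processInstanceId = message.getStringProperty("processInstanceId");

            // Find the execution waiting on the catching signal event.
            Execution execution = runtimeService.createExecutionQuery()
                    .processInstanceId(processInstanceId)
                    .signalEventSubscriptionName("remoteServiceReply")
                    .singleResult();

            // Signal synchronously, so the listener blocks until the engine
            // has committed before the next message is consumed.
            runtimeService.signalEventReceived("remoteServiceReply", execution.getId());
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}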

We're talking about times in the range of 100 ms here: Activiti takes about 100 ms to commit the transaction, while the remote service sends the reply message in about 30-50 ms. Since the transaction is not yet committed, the JMS message handler sees the "old" state in the database and misses the process waiting on the new signal.

The first thing that comes to mind is that Activiti seems to issue one query at a time to the database, effectively losing time on all the round trips. On average it issues about 20 queries (or more), losing roughly 3 ms per query in round trips. I can't say whether iBatis can use batched queries (I don't have much experience with it), but it would be interesting if it could, so those times would be reduced. This wouldn't eliminate the issue, but it would greatly shrink the time window (and the chance) in which it can happen.

Besides that, I'd like to know if there is a better way to handle this delicate issue without serializing processing like we did.

Thanks
3 REPLIES

trademak
Star Contributor
Okay. Do you have the history level of Activiti set to full? Lowering the history level would increase the performance of the engine quite a bit. Is the database running on a different server, so that you have network latency as well? Is there a specific query that takes the longest?
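For reference, the history level can be set on the engine configuration, something like this (a sketch for a standalone engine; "audit" is the default, "activity" or "none" write fewer history rows per commit):

import org.activiti.engine.ProcessEngine;
import org.activiti.engine.ProcessEngineConfiguration;

public class LowerHistoryLevel {
    public static void main(String[] args) {
        // Valid history levels are "none", "activity", "audit" (the default) and "full".
        ProcessEngine processEngine = ProcessEngineConfiguration
                .createProcessEngineConfigurationFromResourceDefault() // reads activiti.cfg.xml
                .setHistory("activity")
                .buildProcessEngine();
    }
}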
It would also be an option to add a retry in the process instance signalling logic.
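A retry could be as simple as polling for the signal subscription before giving up, something along these lines (a rough sketch; the retry count, delay and signal name are illustrative):

import org.activiti.engine.ActivitiException;
import org.activiti.engine.RuntimeService;
import org.activiti.engine.runtime.Execution;

public class RetryingSignaller {

    // Retry until the engine has committed the transaction that created the
    // signal event subscription; bounded, so a genuinely missing subscription
    // still surfaces as an error.
    public void signalWithRetry(RuntimeService runtimeService,
                                String processInstanceId,
                                String signalName) throws InterruptedException {
        for (int attempt = 0; attempt < 10; attempt++) {
            Execution execution = runtimeService.createExecutionQuery()
                    .processInstanceId(processInstanceId)
                    .signalEventSubscriptionName(signalName)
                    .singleResult();
            if (execution != null) {
                runtimeService.signalEventReceived(signalName, execution.getId());
                return;
            }
            Thread.sleep(50); // give the first transaction time to commit
        }
        throw new ActivitiException("No execution waiting for signal '" + signalName
                + "' in process instance " + processInstanceId);
    }
}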

Best regards,

lmollea
Champ in-the-making
History was set to the default; I lowered it to "activity", but there was no noticeable change in the number of queries issued in that transaction: it went from 15 inserts, 5 updates and 8 deletes to 14 inserts, 3 updates and 8 deletes.

Disabling history completely may not be feasible, as we still plan on using the Activiti history tables for monitoring purposes.

Database tables are pretty empty; the system is still under development and not released to production. Queries all seem to perform similarly (2-3 ms each).
I think in the end it all comes down to the number of network round trips: Activiti does about 25, while the remote service invoked by the Service Task does (IIRC) 2 queries on that call, and there are then 2 more round trips for the JMS message. So it's 25 vs. 4 round trips, which is hard to beat…

Moving the Activiti database locally (using embedded H2, for example) could probably make the issue highly improbable, but I don't know if it can work, as we may need to cluster the environment in production.

jbarrez
Star Contributor
> Moving the Activiti database locally (using embedded H2, for example) could probably make the issue highly improbable, but I don't know if it can work, as we may need to cluster the environment in production.

I wouldn't do that. H2 is not a production db.