
Signalling process fails intermittently

wslade
Champ in-the-making
We have been using the runtimeService signal method within a JMS queue receiver.

Over time we have found that the execution we are signalling has not always been persisted yet. (Looking at the code, the signal lookup is simply a getDbSqlSession().selectById().)

Initially we put a backoff-and-retry algorithm in place and found that a wait of 1 second and one or two retries was typically sufficient. However, we sometimes reach 10 retries, which is our maximum.
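
For readers following along, the retry described above can be sketched roughly as below. This is only an illustration under assumptions: SignalRetry and retryUntilFound are hypothetical names, and the lookup is a stand-in for whatever check finds the execution; none of this is the poster's actual code or an Activiti API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of Activiti): retries a lookup with a fixed delay.
class SignalRetry {

    /**
     * Calls lookup up to maxAttempts times, sleeping delayMillis between attempts.
     * Returns the 1-based attempt number that succeeded, or -1 if none did
     * (or if the waiting thread was interrupted).
     */
    static int retryUntilFound(BooleanSupplier lookup, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (lookup.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag set
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated execution that only becomes visible on the third lookup.
        int[] calls = {0};
        int attempt = retryUntilFound(() -> ++calls[0] >= 3, 10, 10);
        System.out.println("found on attempt " + attempt); // found on attempt 3
    }
}
```

Returning the attempt number makes it easy to log how long each message had to wait, which helps when the schema itself records no creation time.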

We don't want to increase the retry limit: 10 s seems perfectly reasonable, especially since this is occurring on a single-use test machine, and raising the limit may simply hide a different problem.

When checking the Activiti database we do find the process persisted. However, since there is no "creation date" column, it is not possible to find out when the row was actually written. This would be a great addition to the schema.

So should we just increase our retry limit, or look somewhere else for a related problem?
8 REPLIES

ronald_van_kuij
Champ on-the-rise
"or look somewhere else for a related problem?"

I think so… But without a minimal example of how you use things, it is hard to say…

mmaker1234
Champ in-the-making
Hello wslade,

I would suggest that you check the transaction boundaries of the application, the database, and the process instance. Most probably there is some (time) difference between the transaction commits. Such a difference might mislead you into thinking that a retry could fix the problem. Usually, the need for a retry in simple cases is a symptom of a race condition between transactions.

For example, we have a workflow process where one JavaServiceTask activity calls the business logic to send a JMS message, and then the process continues to the next activity. The message is processed by the business logic, which at the end should "notify" the process by signalling exactly that "next activity". Imagine our surprise when we occasionally started to receive notifications "Can not find activity '…' for process instance …"! The message managed to pass through the external JMS - to be queued, delivered, received, and processed (with all transport delays) - before Activiti managed to complete the first (sending) activity and open the next one! And all this because the workflow engine (Activiti) and the business logic (the actual sending of the JMS message) were working in different transactions.

wslade
Champ in-the-making
Thanks mmaker1234,

We send our JMS message with a service task, using an expression that calls a Spring bean. We then transition to a "ReceiveTask" where we block waiting for the response.

We use separate transaction managers for JMS and for Activiti. We consume using the same JMS transaction manager.

So yes, I do expect a race condition and would expect some variance. However, 10 s does smell bad.

Did you ever solve this problem?

trademak
Star Contributor
Hi,

Is the service task where you send the JMS message a synchronous service task or did you use the async attribute?
Do you only send the JMS message to the queue and nothing else?
When Activiti reaches the receive task the transaction is committed, so a period of 10 seconds is really, really strange.
Can you provide more details about your process definition?

Best regards,

mmaker1234
Champ in-the-making
Hello wslade,

Yes, we solved the problem - we put the workflow engine and the application (JMS message sender) in the same transaction. The message retriever is still working in a separate transaction.

That was what we needed 🙂

My example was only meant to show you where to look for potential problems. I didn't claim that our situation is close to yours.

Here is another example (with another workflow engine, but that doesn't matter): one and the same application is deployed on several servers (a couple of development workstations, several levels of test, etc.). In short: the user requests a report, and the GUI waits up to one minute to display the result. If the report is not received within one minute, the application displays a message that the result will be sent by e-mail. On the request, a workflow process instance is created. The process sends a message, the application generates the report and notifies the process when generation ends. If more than a minute has passed since the message was sent, the process initiates sending the result by e-mail.

Sometimes, on some server, all reports started to be delivered by e-mail, even though the report generation itself took only a couple of seconds. Our investigation showed that in these rare cases JMS messages were consumed a couple of minutes (sometimes even five minutes) after they were issued. The application was not under load, and there were no other messages in the queue. Sometimes this behavior survived even a server restart, but later it fixed itself "automagically" - just a ghost in the system.

With this example I'm only demonstrating that you cannot rely on time frames in asynchronous (JMS) communication.

I think that you should re-design your process definition considering all the aspects - not only of the business process but also of the implementation (transactions, asynchronous communication, etc.). If you need more help with this, you should describe your requirements as thoroughly as possible. Otherwise anyone here is just guessing what you are aiming at and pointing you in different directions according to his/her guesses (actually fantasies, based on your pieces of information).

jbarrez
Star Contributor
mmaker1234: Thanks for the explanation. Just wanted to say I appreciate such thorough replies 😉

wslade
Champ in-the-making
I tried to attach our simple process but alas was foiled.
1. Cannot upload a file with the extension xml or bmn2.0.
2. Then, when zipped: "Could not upload attachment to ./files/3599_6295aedc9fe637584c6967b15135470f."

Anyway, the process is very simple.
1. Using an expression, send a JMS message with default task settings (i.e. synchronous).
2. Wait for a signal using a ReceiveTask.

The objective of the process is very simple.
1. Send a JMS message to another application (including the process identifier in the message).
2. Then wait for the other application to reply with a JMS message (which also includes the process identifier).
The JMS consumer uses the process identifier to signal the process.

From mmaker1234's last response: by using the same transaction manager for the JMS send and the workflow engine, you are basically delaying the sending of the JMS message until the process state has been persisted. This is not a bad solution, so we will give it a go.
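
To make the effect of "delaying the send until the state is persisted" concrete, here is a toy model in plain Java. It is a sketch under loud assumptions: ToyTransaction, runAfterCommit, and consumerSeesExecution are hypothetical names invented for illustration, not Activiti, Spring, or JMS APIs, and the consumer is modelled as reacting instantly (the worst case for the race).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, not Activiti, Spring, or JMS APIs.
class AfterCommitSketch {

    /** Minimal stand-in for a transaction that runs callbacks once it commits. */
    static class ToyTransaction {
        private final List<Runnable> afterCommit = new ArrayList<>();
        private final List<String> pendingWrites = new ArrayList<>();
        private final List<String> store;

        ToyTransaction(List<String> store) {
            this.store = store;
        }

        void write(String row) {
            pendingWrites.add(row); // not visible to other readers yet
        }

        void runAfterCommit(Runnable action) {
            afterCommit.add(action);
        }

        void commit() {
            store.addAll(pendingWrites);        // rows become visible
            afterCommit.forEach(Runnable::run); // deferred actions run only now
        }
    }

    /** Returns whether an instant consumer finds the execution row when the message arrives. */
    static boolean consumerSeesExecution(boolean deferSend) {
        List<String> store = new ArrayList<>();
        boolean[] seen = {false};
        // "Sending" here means the consumer immediately looks up the execution.
        Runnable sendMessage = () -> seen[0] = store.contains("execution-1");

        ToyTransaction tx = new ToyTransaction(store);
        tx.write("execution-1");
        if (deferSend) {
            tx.runAfterCommit(sendMessage); // send tied to the engine's commit
        } else {
            sendMessage.run();              // send in a separate, earlier step
        }
        tx.commit();
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("immediate send: " + consumerSeesExecution(false)); // false
        System.out.println("deferred send:  " + consumerSeesExecution(true));  // true
    }
}
```

In a real deployment the ordering guarantee would come from the transaction infrastructure rather than from a hand-rolled callback list; the model only shows why the lookup can miss when the send is not tied to the commit.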

mmaker1234
Champ in-the-making
Hello wslade,

Just a hint: when you try our approach you will need to use distributed (XA) transactions.

And another hint (although not of much use): if you make the sending activity asynchronous, this will force Activiti to persist the execution before the message is sent. Unfortunately, the problem described in my second post - the "waiting" activity not yet being instantiated when the message (listener) tries to notify the process - still remains.