Synchronization issue in parallel multi-instance call activity

alexander_tsvet
Champ in-the-making
Hi guys,

I am working on a task where I need a multi-instance callActivity that iterates over a collection and executes in parallel. However, I may have found an issue in the synchronization of the threads that execute the started sub-processes: in most of my tests, two threads try to update the values of nrOfCompletedInstances and nrOfActiveInstances at the same time. When this happens, both variables are left with wrong values.

For example, I found the following lines in Activiti's debug logs (they are also attached):

11:17:53,623 [pool-1-thread-2] Multi-instance 'Activity(callactivity1)' instance completed. Details: loopCounter=1, nrOrCompletedInstances=1,nrOfActiveInstances=7,nrOfInstances=8
11:17:53,626 [pool-1-thread-1] Multi-instance 'Activity(callactivity1)' instance completed. Details: loopCounter=0, nrOrCompletedInstances=1,nrOfActiveInstances=7,nrOfInstances=8

As you can see, these two threads try to update the variables at nearly the same time and the second one sets wrong values for them. Instead of setting 2 for nrOfCompletedInstances and 6 for nrOfActiveInstances, it sets 1 and 7, respectively.

I looked around the Activiti engine's source code and the issue seems to be in the org.activiti.engine.impl.bpmn.behavior.ParallelMultiInstanceBehavior's leave() method. There, the values of nrOfCompletedInstances and nrOfActiveInstances are retrieved from the execution, incremented (or decremented) and then set back into the execution. These two operations are not in a synchronized block, however, nor are they in a separate DB transaction. This probably leads to the following situation:

1) Thread 1 retrieves the variables and they have values: nrOfCompletedInstances = 0, nrOfActiveInstances = 8.
2) Thread 2 retrieves the variables, before Thread 1 updates them and they have the same values: nrOfCompletedInstances = 0, nrOfActiveInstances = 8.
3) Both of them modify the variables independently, resulting in the values: nrOfCompletedInstances = 1, nrOfActiveInstances = 7.
4) Both of them attempt to set them back into the execution with these wrong values.
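
The four steps above can be replayed in a minimal Java sketch. It is only an illustration: the class LostUpdateDemo and the plain HashMap are hypothetical stand-ins for the execution's variables, not Activiti's API, and the interleaving is replayed on a single thread so that the result is the same on every run.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the execution variables; not Activiti's API. */
final class LostUpdateDemo {

    // Replays the four steps in a fixed order on one thread, so the result is deterministic.
    static Map<String, Integer> replayInterleaving() {
        Map<String, Integer> vars = new HashMap<>();
        vars.put("nrOfCompletedInstances", 0);
        vars.put("nrOfActiveInstances", 8);

        // Steps 1 and 2: both "threads" read before either of them writes.
        int completedSeenByT1 = vars.get("nrOfCompletedInstances");
        int activeSeenByT1 = vars.get("nrOfActiveInstances");
        int completedSeenByT2 = vars.get("nrOfCompletedInstances");
        int activeSeenByT2 = vars.get("nrOfActiveInstances");

        // Steps 3 and 4: each one writes back a value computed from its stale read.
        vars.put("nrOfCompletedInstances", completedSeenByT1 + 1);
        vars.put("nrOfActiveInstances", activeSeenByT1 - 1);
        vars.put("nrOfCompletedInstances", completedSeenByT2 + 1);
        vars.put("nrOfActiveInstances", activeSeenByT2 - 1);
        return vars;
    }

    public static void main(String[] args) {
        Map<String, Integer> vars = replayInterleaving();
        // Two instances completed, yet each counter only moved by one.
        System.out.println(vars.get("nrOfCompletedInstances") + " completed, "
                + vars.get("nrOfActiveInstances") + " active");
    }
}
```

Running it prints "1 completed, 7 active", which matches the values in the two log lines above.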

Is this a known issue and if so, have you made any plans for fixing it soon (I can try to submit a pull request)? Should I create an issue in Activiti's JIRA?

Note: The attached maven project contains my BPMN diagrams (I attached it as a .txt file, since the forum does not allow ZIP files *pardon*).

Thanks and best regards,
Alexander
1 ACCEPTED ANSWER

Hi Alexander,
I think you may have actually stumbled across an edge case that is not handled properly.
The intended behavior of "exclusive" is (as you already understand) to prevent concurrent execution within a single process instance. It's very useful for parallel joins which can lead to pre-emptive DB lock contention and can also be useful in multi-instance scenarios.

Looking at your model, I see you are already using Exclusive and Async on the Sub Process Call. I doubt this will have any effect, since Called Activities (sub-processes) are their own individual process instances. Async may help a little, since the parallel sub-processes may not get executed simultaneously, but there is really no guarantee of this.

I agree with your analysis that the retrieval and updating of the nrOfCompletedInstances variable in the leave() method of the behavior class should be in a synchronized block. In the majority of scenarios, the sub-process instances will not complete at exactly the same time, so there will be little if any performance impact. For your scenario, however, it should resolve the issue.
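
As a rough illustration of the synchronized block idea, here is a minimal in-memory Java sketch. The class SynchronizedCounters and its methods are hypothetical and are not taken from Activiti's source. It also assumes the counters live in a single JVM; the real engine persists them through the database, so whether a JVM-level lock alone is sufficient there is an open assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical counter holder; the names mirror this thread, not Activiti's classes. */
final class SynchronizedCounters {
    private final Object lock = new Object();
    private int nrOfCompletedInstances = 0;
    private int nrOfActiveInstances;

    SynchronizedCounters(int nrOfInstances) {
        this.nrOfActiveInstances = nrOfInstances;
    }

    // Read, modify and write back under one lock, so no other thread can interleave.
    void instanceCompleted() {
        synchronized (lock) {
            nrOfCompletedInstances = nrOfCompletedInstances + 1;
            nrOfActiveInstances = nrOfActiveInstances - 1;
        }
    }

    // Returns {completed, active} as one consistent pair.
    int[] snapshot() {
        synchronized (lock) {
            return new int[] {nrOfCompletedInstances, nrOfActiveInstances};
        }
    }

    // Completes every instance from a small thread pool and returns the final counters.
    static int[] completeAll(int nrOfInstances) {
        SynchronizedCounters counters = new SynchronizedCounters(nrOfInstances);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < nrOfInstances; i++) {
            pool.execute(counters::instanceCompleted);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counters.snapshot();
    }

    public static void main(String[] args) {
        int[] result = completeAll(8);
        System.out.println(result[0] + " completed, " + result[1] + " active");
    }
}
```

With 8 instances, the counters end at 8 completed and 0 active regardless of how the pool schedules the threads, because each update happens entirely inside the lock.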

Can you go ahead and create a Jira and then log the Jira number back into the forum for reference?

Thanks for your patience.

Regards,

Greg

5 REPLIES

alexander_tsvet
Champ in-the-making
I forgot to mention that this leads to some weird behavior. For example, in my test project, the parent process sets a "messages" variable containing 8 strings (the numbers from 1 to 8 represented as strings). Then it starts a sub-process (via a call activity) for each of the strings. Each sub-process prints the string for which it was started. Normally, this should result in an output similar to the following:
2 1 3 4 5 6 7 8

However, it looks like Activiti is executing some of the sub-processes a second or even a third time, because in each run, it prints three or four more messages (which appear to be chosen at random):
2 1 3 4 5 6 7 8 1 4 4

This is a problem for me, since in my real project, these sub-processes should do network calls, which should not be repeated.

warper
Star Contributor
Try adding an asynchronous exclusive step at the end of the subprocess. That way the work is done in parallel, the results are persisted, and then the executions close one by one.

The repeated calls are due to the transactional nature of the process engine.

alexander_tsvet
Champ in-the-making
Hi Warper and thanks for the response!

I added another asynchronous step at the end of the sub-process. This does work around the issue, and the task that prints the message is now executed only once. However, I noticed that the new task is itself executed more times than necessary, which increases the time it takes to execute the entire main process. In the test project I attached, it adds 10 seconds for 8 sub-processes. In my real scenario, however, there could be 100 sub-processes, which would mean that more tasks are retried and the job executor needs more time to retry them.

Actually, I just tried to spawn 100 sub-processes and the main process never finished. This may be because Activiti retries a failed job 3 times and then gives up (69 tasks were retried this time). So the workaround you suggested works for a small number of sub-processes, but not for 100 or more. Should I open a bug in Activiti's JIRA?

Also, does "exclusive" have any effect on this issue? As far as I understand from Activiti's User Guide, "exclusive=true" prevents jobs from a single process instance from executing concurrently. In my situation, there are multiple sub-processes (which I assume are separate process instances) that execute in parallel, but all steps within each of them are executed sequentially.

alexander_tsvet
Champ in-the-making

Hi Greg,

Thanks for the response! I opened a new issue in Activiti's JIRA. Here's the link:

https://activiti.atlassian.net/browse/ACT-4255 

Best regards,

Alexander