Race condition in job executor?

mandas
Champ in-the-making
Hello all,

We had the following case: an async, exclusive service task calling a CDI method. The job executor is configured to lock jobs for 5 minutes, the same as the JBoss server transaction timeout. We noticed that, under heavy load, this CDI method's transaction timed out, so nothing was updated in ACT_RU_JOB (neither retries nor lock time). Then another job executor thread picked up the job, and so on… To investigate, we set a breakpoint inside the method to pause execution, then set the job's lock time in ACT_RU_JOB to a point in the past. We then had two threads attempting the same task.

As waittimeinmillis and the transaction timeout were equal, we assumed we might have a race condition, with a possible workaround of increasing waittimeinmillis to avoid it. (The job's lock time expires milliseconds before the transaction timeout and a second job executor thread picks up the job, so the two invocations coexist for a short time.)

But afterwards I re-read the documentation on exclusive jobs, and I'm wondering how we can end up with two job executor threads attempting the same job, given that exclusive guarantees that this job cannot run at the same time as another one (including itself?) of the same process instance.

Can someone clarify? (activiti-5.11)

Thanks,
Dimitris
1 REPLY

frederikherema1
Star Contributor
When a job is locked for XX seconds and not unlocked, the job executor assumes something went terribly wrong while executing it. This has nothing to do with the transaction timeout or millisToWait (which is the poll interval for acquiring jobs that are ready to execute).

See the setting for lockTimeInMillis (protected int lockTimeInMillis = 5 * 60 * 1000;), which defaults to 5 minutes. So if the first thread in the pool acquired the job and has been executing it (lock time is set) for longer than 5 minutes, the job executor will use another thread to execute the job, effectively setting the lock time etc. again.
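
To make the lock-expiry rule concrete, here is a minimal sketch of how a job becomes acquirable again once its lock time has passed. It is a toy model under stated assumptions, not Activiti's implementation: the class LockExpirySketch, the Job fields, and the methods isAcquirable and tryAcquire are all hypothetical names, and only the 5-minute default comes from the setting quoted above.

```java
import java.time.Duration;
import java.time.Instant;

/** Toy model of job locking; hypothetical names, not Activiti classes. */
class LockExpirySketch {
    // Mirrors the default quoted above: 5 * 60 * 1000 ms.
    static final Duration LOCK_TIME = Duration.ofMinutes(5);

    static final class Job {
        String lockOwner;        // null while the job is unlocked
        Instant lockExpiration;  // null while the job is unlocked
    }

    /** A job can be acquired when it has no lock or its lock has expired. */
    static boolean isAcquirable(Job job, Instant now) {
        return job.lockExpiration == null || job.lockExpiration.isBefore(now);
    }

    /** Locks the job for LOCK_TIME if it is acquirable; returns whether it was. */
    static boolean tryAcquire(Job job, String owner, Instant now) {
        if (!isAcquirable(job, now)) {
            return false;
        }
        job.lockOwner = owner;
        job.lockExpiration = now.plus(LOCK_TIME);
        return true;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2013-01-01T10:00:00Z");
        Job job = new Job();
        // First thread takes the job and starts a long-running execution.
        System.out.println(tryAcquire(job, "thread-1", t0));                             // true
        // Four minutes in, the lock still holds.
        System.out.println(tryAcquire(job, "thread-2", t0.plus(Duration.ofMinutes(4)))); // false
        // Six minutes in, the lock has expired and a second thread takes over.
        System.out.println(tryAcquire(job, "thread-2", t0.plus(Duration.ofMinutes(6)))); // true
    }
}
```

Note that nothing in this model stops the first thread from still running when the second one takes over, which is the overlap described in the question.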

This results in the first thread, when it returns, getting an ActivitiConcurrentModificationException when the transaction is committed, rolling back the first job execution. That's the way it's supposed to work. You should make sure lockTimeInMillis is larger than the transaction timeout, so that the job is marked as executed or failed due to the transaction timeout before it is considered "stuck"…
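
The rollback of the first execution can be pictured with a small optimistic-locking sketch: each update succeeds only if the row still has the revision the caller originally read. This is an illustration under assumptions, not Activiti code; OptimisticLockSketch, JobRow, StaleRevisionException, update, and lockOutlivesTransaction are hypothetical names. The last helper simply encodes the advice above that the lock time should exceed the transaction timeout.

```java
/** Toy model of optimistic locking on a job row; all names are hypothetical. */
class OptimisticLockSketch {
    static final class StaleRevisionException extends RuntimeException {
        StaleRevisionException(String message) {
            super(message);
        }
    }

    static final class JobRow {
        int revision = 1;
        String lockOwner;
    }

    /** Applies an update only if the row still has the revision the caller read. */
    static void update(JobRow row, int revisionRead, String newOwner) {
        if (row.revision != revisionRead) {
            throw new StaleRevisionException(
                "expected revision " + revisionRead + " but found " + row.revision);
        }
        row.revision++;
        row.lockOwner = newOwner;
    }

    /** True when a transaction timeout would fire before the job lock expires. */
    static boolean lockOutlivesTransaction(long lockTimeMillis, long txTimeoutMillis) {
        return lockTimeMillis > txTimeoutMillis;
    }

    public static void main(String[] args) {
        JobRow row = new JobRow();
        int seenByFirst = row.revision;         // first thread reads the row and starts working
        update(row, row.revision, "thread-2");  // lock expired: second thread re-locks the job
        try {
            update(row, seenByFirst, null);     // first thread finally tries to commit
        } catch (StaleRevisionException e) {
            System.out.println("first execution rolled back: " + e.getMessage());
        }
        // Equal values, as in the question, leave no safety margin.
        System.out.println(lockOutlivesTransaction(5 * 60 * 1000, 5 * 60 * 1000));
        // A lock time above the transaction timeout satisfies the check.
        System.out.println(lockOutlivesTransaction(10 * 60 * 1000, 5 * 60 * 1000));
    }
}
```

With a 5-minute lock and a 5-minute transaction timeout, as in the original setup, the check fails, which matches the overlap observed under load.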