
JobExecutor deadlocks on MSSQL

andreasa
Champ in-the-making
Windows platform, Tomcat 7, MSSQL 2008/2012, Activiti 5.12

On our production system we cannot start our application when there are several (~10+) jobs in ACT_RU_JOB that have a DUEDATE_ in the past. On startup, Activiti puts a lock on 15 rows, starts polling the jobs and then stalls after a while. It never continues and doesn't poll anymore (logging level is set to ALL). If I deactivate the job executor, the application starts just fine.

I have recreated the scenario in a unit test on my dev machine: a process with an intermediate timer that triggers every 24h and calls a service task that does nothing but write a log statement.

Then I start 30 instances of this process from a unit test. Then I set the DUEDATE_ back one day and start my application, which then starts triggering the jobs that are now due. I get a bunch of deadlock messages from the JDBC driver when Activiti tries to delete an execution or even do a select. See attached stacktrace.txt
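For anyone who wants to reproduce this, moving the due dates back could be scripted roughly as follows. This is only a sketch: the class and method names are made up for illustration, it assumes a plain JDBC connection to the MSSQL database that holds the Activiti tables, and it uses the T-SQL DATEADD function, so it will not work unchanged on Oracle.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper, not part of Activiti.
final class DueDateShifter {
    // T-SQL: DATEADD moves each due date by the given number of days
    // (a negative value moves it into the past).
    static final String SHIFT_SQL =
        "UPDATE ACT_RU_JOB SET DUEDATE_ = DATEADD(day, ?, DUEDATE_) "
            + "WHERE DUEDATE_ IS NOT NULL";

    // Shifts every job that has a due date and returns the number of rows updated.
    // Whether the change is committed depends on the connection's auto-commit setting.
    static int shiftDueDates(Connection connection, int days) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(SHIFT_SQL)) {
            statement.setInt(1, days);
            return statement.executeUpdate();
        }
    }
}
```

Calling `DueDateShifter.shiftDueDates(connection, -1)` corresponds to the "back one day" step described above.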

Same thing happens if I set the DUEDATE_ back while the application is running.
I assume something similar is happening on the production system although the symptoms are not entirely the same.

If I do the same thing on Oracle, everything executes just fine.
I think this has to do with MSSQL's transaction system, which afaik is stricter than Oracle's. If you first write inside a transaction and then read the same row in that transaction, the server assumes this is a dirty read and throws a deadlock exception.
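To tell these deadlock exceptions apart from other database errors in the logs or in a test, one option is to look for SQL Server's deadlock-victim error number, 1205, in the exception's cause chain. A minimal sketch, assuming the original SQLException is still reachable through getCause() from whatever wrapper the persistence layer throws; the class name is made up for illustration:

```java
import java.sql.SQLException;

// Hypothetical helper, not part of Activiti.
final class DeadlockDetector {
    // SQL Server reports "chosen as the deadlock victim" as native error 1205.
    static final int MSSQL_DEADLOCK_VICTIM = 1205;

    // Walks the cause chain and returns true if any SQLException in it
    // carries the deadlock-victim error code.
    static boolean isDeadlock(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SQLException
                    && ((SQLException) t).getErrorCode() == MSSQL_DEADLOCK_VICTIM) {
                return true;
            }
        }
        return false;
    }
}
```

This only classifies the error; it does not retry the job or change how Activiti handles the failure.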

I am aware of this http://forums.activiti.org/content/jobexecutor-does-not-cope-simultaneousness thread, but it doesn't really offer any solutions.
3 REPLIES

andreasa
Champ in-the-making
Inspired by this: http://forums.activiti.org/content/activiti-thread-safety

I tried setting the connection pool to max 1, min 1. This resulted in the application hanging exactly like in the production environment.

I guess it could be the connection pool, although I'm already using C3P0 and not DBCP.

I can then also tentatively conclude that the database in the production system only has one or very few connections available, since I get the same behaviour when giving the connection pool only 1 connection.

andreasa
Champ in-the-making
This can be solved by creating the indexes that long suggests in this thread: http://forums.activiti.org/content/deadlock-load-test-db2

Before, MSSQL would throw deadlocks with as few as 15 jobs with a passed due date; with these indexes I can run 3000 jobs with no problem (other than a primary key violation on the activity table, but that's a different story).

andreasa
Champ in-the-making
It is also suggested in http://forums.activiti.org/content/deadlock-mssql-2008-while-loadtesting that upgrading to 5.13 might resolve this. I haven't had time to test that yet, though.