Windows platform, Tomcat 7, MSSQL 2008/2012, Activiti 5.12
On our production system we cannot start our application when there are several (~10+) jobs in ACT_RU_JOB whose DUEDATE_ has passed. On startup, Activiti puts a lock on 15 rows, starts polling the jobs and then stalls after a while. It never continues and doesn't poll anymore (logging level is set to ALL). If I deactivate the job executor, the application starts just fine.
I have recreated the scenario in a unit test on my dev machine: a process with an intermediate timer that fires every 24h and calls a service task that does nothing but write a log statement.
Then I start 30 instances of this process from a unit test, set DUEDATE_ back by one day, and start my application, which then starts triggering the jobs that are now due. I get a bunch of deadlock messages from the JDBC driver when Activiti tries to delete an execution or even do a select. See attached stacktrace.txt
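For anyone who wants to reproduce the "set DUEDATE_ back one day" step, here is a minimal sketch of how it could be done over plain JDBC. The class and method names (DueDateShifter, shiftDueDates) are made up for illustration, and it assumes you already have an open connection to the MSSQL schema that holds ACT_RU_JOB. It moves every job in the table, so only run it against a test database.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Activiti.
class DueDateShifter {
    // T-SQL: move every job's due date back by one day so pending timers become overdue.
    static final String SHIFT_SQL =
        "UPDATE ACT_RU_JOB SET DUEDATE_ = DATEADD(day, -1, DUEDATE_)";

    /** Runs the update on the given connection and returns the number of rows changed. */
    static int shiftDueDates(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            return st.executeUpdate(SHIFT_SQL);
        }
    }
}
```

DATEADD is MSSQL syntax; on Oracle the equivalent would be something like DUEDATE_ - 1.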
Same thing happens if I set the DUEDATE_ back while the application is running. I assume something similar is happening on the production system although the symptoms are not entirely the same.
If I do the same thing on Oracle, everything executes just fine. My guess is that this has to do with MSSQL's transaction system, which afaik is stricter than Oracle's. If you first write to a row inside a transaction and then read the same row in that transaction, the server seems to treat this as a dirty read and throws a deadlock exception.
I tried setting the connection pool to max 1, min 1. This resulted in the application hanging exactly like in the production environment.
I guess it could be the connection pool, although I'm already using C3P0 and not DBCP.
I can then also tentatively conclude that the database in the production system only has one or very few connections available, since I get the same behaviour when giving the connection pool only 1 connection.
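To illustrate why a pool of one connection can hang rather than fail, here is a small self-contained sketch. It is not Activiti or C3P0 code: the pool is modelled as a plain java.util.concurrent.Semaphore, and it assumes (I have not confirmed this for the job executor) that some code path asks for a second connection while the first one is still checked out. The class and method names are made up, and poolSize is assumed to be at least 1.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: a semaphore with poolSize permits stands in for the connection pool.
class SingleConnectionPoolDemo {

    /**
     * Takes one "connection" and, while holding it, asks for a second one with a timeout.
     * Returns true if the second connection could be obtained.
     */
    static boolean canGetSecondConnection(int poolSize, long timeoutMillis) {
        Semaphore pool = new Semaphore(poolSize);
        try {
            pool.acquire();                  // first connection, held for the whole unit of work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            boolean gotSecond = pool.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            if (gotSecond) {
                pool.release();              // give the second connection back
            }
            return gotSecond;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.release();                  // give the first connection back
        }
    }

    public static void main(String[] args) {
        System.out.println("pool=1: " + canGetSecondConnection(1, 200)); // prints "pool=1: false"
        System.out.println("pool=2: " + canGetSecondConnection(2, 200)); // prints "pool=2: true"
    }
}
```

The demo uses a timeout so it terminates. A real pool configured to wait indefinitely for a free connection would simply block at that point, which would look like the stall described above.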
Before, MSSQL would throw deadlocks on as few as 15 jobs with a passed due date; with these indexes I can run 3000 jobs with no problem (other than a primary key violation on the activity table, but that's a different story).