I have been troubleshooting DB connection issues with Activiti 5.15.1 and, after quite a bit of debugging, I believe I have identified two significant problems.
Problem #1: the default configuration of Activiti causes the job executor to keep all MyBatis connections checked out from the pool forever.
To reproduce, simply bump up the setting of jdbcMaxCheckoutTime to something like 2 hours. This prevents MyBatis from forcibly claiming back connections that clients haven't properly returned to the pool.
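For reference, this is roughly how I reproduce it. This is only a minimal sketch: the in-memory engine setup and the class name are placeholders, the only value that actually matters here is jdbcMaxCheckoutTime.

import org.activiti.engine.ProcessEngine;
import org.activiti.engine.ProcessEngineConfiguration;

public class CheckoutTimeRepro {
    public static void main(String[] args) {
        ProcessEngineConfiguration cfg = ProcessEngineConfiguration
            .createStandaloneInMemProcessEngineConfiguration()
            .setJobExecutorActivate(true);

        // ~2 hours in milliseconds: effectively stops MyBatis from forcibly
        // reclaiming connections that were never returned to the pool, so a
        // connection leak shows up as a frozen server instead of warnings
        // about overdue connections in the logs.
        cfg.setJdbcMaxCheckoutTime(2 * 60 * 60 * 1000);

        ProcessEngine engine = cfg.buildProcessEngine();
        // ... then start background processes + fire REST requests in parallel ...
    }
}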
With that setting, and some parallel activity (background processes + REST requests), the server quickly freezes up. As far as I can tell, the reason is that each of the job executor's threads is holding on to a MyBatis connection from the pool, forever; that, at least, is what the debugger tells me.
With the default settings the problem isn't obvious: MyBatis will claim back connections that have been checked out for more than 20s, and most often this doesn't cause a problem because nothing was happening on the connection (you do get a warning in the logs though).
On a loaded system, however, MyBatis will start claiming back connections that are still being used (or about to be used), which triggers random DB request failures.
I haven't been able to figure out why the job executor is not returning connections, but I do know that ExecuteJobsCommand is what initially pulled the connection, and changing DefaultJobExecutor.maxPoolSize from 10 to 5 results in only 5 MyBatis connections being permanently checked out instead of 10.
DefaultJobExecutor.maxPoolSize defaults to 10, and so does MyBatis's PooledDataSource.poolMaximumActiveConnections. The consequence is that, with the default settings, the job executor quickly consumes all available active connections, and MyBatis then has to forcibly reclaim one for every subsequent connection request.
Both the debugger and DEBUG logging confirm this, but maybe I am missing something?
If not, maybe the default configuration should be changed so that poolMaximumActiveConnections = 2 * jobExecutor.maxPoolSize, or something similar?
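For what it's worth, the workaround I am running with looks roughly like this. It is just a sketch: the factor of 2 is only my heuristic, and jdbcMaxActiveConnections is, as far as I understand, what ends up in PooledDataSource.poolMaximumActiveConnections.

import org.activiti.engine.ProcessEngineConfiguration;
import org.activiti.engine.impl.cfg.ProcessEngineConfigurationImpl;
import org.activiti.engine.impl.jobexecutor.DefaultJobExecutor;

public class PoolSizingWorkaround {
    public static void main(String[] args) {
        int jobExecutorMaxPoolSize = 5;

        DefaultJobExecutor jobExecutor = new DefaultJobExecutor();
        jobExecutor.setMaxPoolSize(jobExecutorMaxPoolSize);

        ProcessEngineConfigurationImpl cfg = (ProcessEngineConfigurationImpl)
            ProcessEngineConfiguration.createStandaloneInMemProcessEngineConfiguration();
        cfg.setJobExecutor(jobExecutor);
        cfg.setJobExecutorActivate(true);

        // Keep more pooled connections than job executor threads, so script
        // tasks and REST requests can still find a free connection even if
        // every executor thread is holding one.
        cfg.setJdbcMaxActiveConnections(2 * jobExecutorMaxPoolSize);

        cfg.buildProcessEngine();
    }
}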
Problem #2:
With 5 job executor threads and 10 MyBatis connections, connections are now available for script tasks… but I quickly run into a logical deadlock.
The deadlock comes from the fact that:
- 5 script tasks each hold a MyBatis connection and are waiting for the lock on DbIdGenerator in order to proceed (and eventually release their connections)
- at the same time, another thread has entered DbIdGenerator (acquiring the lock) and is waiting to get a connection from the pool
This is a classic deadly-embrace situation, which comes from the fact that the two "locks" (the DbIdGenerator monitor and a MyBatis connection) are not always acquired in the same order. The solution is probably for DbIdGenerator#getNewBlock to make sure it can get a connection before acquiring the Java monitor.
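To illustrate the kind of reordering I mean, here is a rough sketch. This is not the actual Activiti code; IdBlock and fetchNextIdBlock are placeholders for whatever getNewBlock does today. The idea is that the DB round trip, which needs a pooled connection, happens outside the monitor, and the monitor is only held to hand out ids or install a freshly fetched block.

public class LockOrderingSafeIdGenerator {

    // Placeholder for the id range returned by the DB.
    static final class IdBlock {
        final long nextId;
        final long lastId;
        IdBlock(long nextId, long lastId) { this.nextId = nextId; this.lastId = lastId; }
    }

    private final Object monitor = new Object();
    private long nextId = 0;
    private long lastId = -1;
    private final int idBlockSize = 100;

    public String getNextId() {
        // Fast path: hand out a cached id while holding only the monitor.
        synchronized (monitor) {
            if (nextId <= lastId) {
                return Long.toString(nextId++);
            }
        }
        // Slow path: fetch a new block WITHOUT holding the monitor, so this
        // thread never waits for a pooled connection while blocking threads
        // that already hold a connection and need the monitor to finish.
        IdBlock block = fetchNextIdBlock(idBlockSize);
        synchronized (monitor) {
            if (nextId > lastId) {
                // Still exhausted: install the freshly fetched block.
                nextId = block.nextId;
                lastId = block.lastId;
            }
            // Otherwise another thread refilled the cache first and the extra
            // block is discarded, leaving a harmless gap in the id sequence.
            return Long.toString(nextId++);
        }
    }

    private IdBlock fetchNextIdBlock(int blockSize) {
        // Placeholder: in Activiti this is where the command that checks out a
        // MyBatis connection and reserves the next id range would run.
        throw new UnsupportedOperationException("placeholder");
    }
}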
I may be missing something obvious… please let me know.
Please note that reverting to the default configuration, which allows MyBatis to forcibly claim back unreturned connections, is not acceptable: on a system under load that just causes random transaction failures, which would be very hard to handle properly in all situations.
I can provide more details, including thread dumps, on request.
Franck