Serious issues with MyBatis connection management

franck102
Champ in-the-making
I have been troubleshooting DB connection issues with Activiti 5.15.1, and after quite a bit of debugging I believe that I have identified two significant problems.

Problem #1: the default configuration of Activiti causes the job executor to keep all MyBatis connections checked out from the pool forever.

To reproduce, simply bump up the setting of jdbcMaxCheckoutTime to something like 2 hours. This prevents MyBatis from forcibly claiming back connections that clients haven't properly returned to the pool.
With that setting, and some parallel activity (background processes + REST requests), the server quickly freezes up. As far as I can tell the reason is that each JobExecutor is holding on to a MyBatis connection from the pool - forever; that is at least what the debugger tells me.

With the default settings the problem isn't obvious: MyBatis will claim back connections that have been checked out for more than 20s, and most often this doesn't cause a problem because nothing was happening on the connection (you do get a warning in the logs though).
On a loaded system, however, MyBatis will start claiming back connections that are still being used (or about to be used), which triggers random DB request failures.
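
To illustrate the failure mode, here is a minimal sketch of a pool that reclaims overdue connections. It is a toy model under stated assumptions, not MyBatis code: `ToyPool`, `Handle` and the injected clock are hypothetical names, and the only behaviour assumed is the one described above (a connection checked out for longer than the maximum checkout time is invalidated and handed to the next caller).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

public class ReclaimSketch {

    /** Hypothetical stand-in for a pooled connection wrapper. */
    static final class Handle {
        final long checkedOutAt;
        private boolean valid = true;

        Handle(long checkedOutAt) {
            this.checkedOutAt = checkedOutAt;
        }

        String query(String sql) {
            if (!valid) {
                throw new IllegalStateException("connection was reclaimed by the pool");
            }
            return "ok: " + sql;
        }

        void invalidate() {
            valid = false;
        }
    }

    /** Toy pool: reclaims the oldest checkout once it exceeds maxCheckoutMs. */
    static final class ToyPool {
        private final int maxActive;
        private final long maxCheckoutMs;
        private final LongSupplier clock;                         // injected so the demo needs no sleeping
        private final Deque<Handle> active = new ArrayDeque<>();  // oldest checkout first

        ToyPool(int maxActive, long maxCheckoutMs, LongSupplier clock) {
            this.maxActive = maxActive;
            this.maxCheckoutMs = maxCheckoutMs;
            this.clock = clock;
        }

        synchronized Handle borrow() {
            long now = clock.getAsLong();
            if (active.size() >= maxActive) {
                Handle oldest = active.peekFirst();
                if (oldest == null || now - oldest.checkedOutAt <= maxCheckoutMs) {
                    // A real pool would wait here; the sketch simply fails fast.
                    throw new IllegalStateException("pool exhausted");
                }
                // Overdue: take the connection away from its current holder.
                active.removeFirst();
                oldest.invalidate();
            }
            Handle fresh = new Handle(now);
            active.addLast(fresh);
            return fresh;
        }

        synchronized void giveBack(Handle handle) {
            if (active.remove(handle)) {
                handle.invalidate();
            }
        }
    }

    /** Returns true if the first holder's handle fails after being reclaimed. */
    static boolean staleHandleFails() {
        long[] now = {0L};
        ToyPool pool = new ToyPool(1, 20_000L, () -> now[0]);
        Handle first = pool.borrow();       // never returned to the pool
        now[0] = 25_000L;                   // 25s later, past the 20s limit
        Handle second = pool.borrow();      // pool reclaims the first handle
        second.query("select 1");           // the new holder works fine
        try {
            first.query("select 1");        // the original holder now fails
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("stale handle fails: " + staleHandleFails());
    }
}
```

The point of the sketch is that the failure lands on whichever thread still holds the old handle, at whatever statement it runs next, which is why the errors look random under load.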

I haven't been able to figure out why the job executor is not returning connections, but I do know that the ExecuteJobsCommand was the one that initially pulled the connection, and changing DefaultJobExecutor.maxPoolSize from 10 to 5 results in only 5 MyBatis connections being permanently checked out instead of 10.

DefaultJobExecutor.maxPoolSize defaults to 10, and so does MyBatis's PooledDataSource.poolMaximumActiveConnections. The consequence is that with the default settings the job executor quickly consumes all available active connections, and MyBatis has to grab them back for every connection request.
Both the debugger and DEBUG logging confirm this, but maybe I am missing something?
If not maybe the configuration should be changed to poolMaximumActiveConnections = 2 * jobExecutor.maxPoolSize or something similar?

Problem #2:
With 5 executors and 10 MyBatis connections available, connections are available for script tasks… but I quickly run into a logical deadlock.
The deadlock comes from the fact that:
- 5 script tasks each hold a MyBatis connection, and are waiting to get the lock on DbIdGenerator to proceed (and to release the connections)
- at the same time a thread has entered DbIdGenerator (acquiring the lock), and is waiting to get a connection from the pool

This is a classical cross-embrace situation which comes from the fact that the two "locks" (DbIdGenerator monitor and MyBatis connection) are not always taken in the same order. The solution is probably for DbIdGenerator#getNewBlock to make sure it can get a connection before acquiring the java monitor.
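
A minimal sketch of the ordering idea, under stated assumptions: `BlockIdGenerator` is a hypothetical class (not Activiti's DbIdGenerator), and a `Semaphore` stands in for the connection pool. The generator always takes the connection first and the monitor second, so the two resources are never acquired in opposite orders. It does not model callers that already hold a connection when they ask for an ID, so it is only an illustration of consistent lock ordering, not a complete fix.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class LockOrderSketch {

    /** Hypothetical block-based ID generator; the Semaphore models the pool. */
    static final class BlockIdGenerator {
        private final Semaphore connections;
        private final int blockSize;
        private long nextId = 0;
        private long lastId = -1;   // forces a block reservation on first use

        BlockIdGenerator(Semaphore connections, int blockSize) {
            this.connections = connections;
            this.blockSize = blockSize;
        }

        long getNextId() throws InterruptedException {
            // 1. Connection first. Simplification: borrowed on every call,
            //    even when the current block still has ids left.
            connections.acquire();
            try {
                // 2. Monitor second, only once a connection is guaranteed.
                synchronized (this) {
                    if (nextId > lastId) {
                        // Pretend to reserve a new block in the database.
                        lastId = nextId + blockSize - 1;
                    }
                    return nextId++;
                }
            } finally {
                connections.release();
            }
        }
    }

    /** Runs the workers and returns how many distinct ids were handed out. */
    static int runDemo(int threads, int poolSize, int idsPerThread) {
        BlockIdGenerator generator = new BlockIdGenerator(new Semaphore(poolSize), 10);
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            // Returning null makes this a Callable, which may throw InterruptedException.
            workers.submit(() -> {
                for (int i = 0; i < idsPerThread; i++) {
                    seen.add(generator.getNextId());
                }
                return null;
            });
        }
        workers.shutdown();
        try {
            if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                workers.shutdownNow();
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // 8 workers compete for 2 "connections"; every id should be unique.
        System.out.println("distinct ids: " + runDemo(8, 2, 50));
    }
}
```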

I may be missing something obvious… please let me know.
Please note that reverting to the default configuration which allows MyBatis to forcibly claim back unreturned connections is not acceptable - on a system under load that just causes random transaction failures, which would be very hard to handle properly in all situations.

I can provide more details, including thread dumps, on request.
Franck

6 REPLIES

franck102
Champ in-the-making
Thread dump attached.

I have also included (pool-active-conns.txt) the thread dumps for the 10 active connections that are not being returned; each dump is taken at the time the connection is acquired from the pool.

franck102
Champ in-the-making
So I just stumbled upon http://jira.codehaus.org/browse/ACT-789 - basically that deadlock was reported in the past, and the recommended "fix" seems to be to use the StrongUuidGenerator.

Is this still the recommended approach? Why isn't that generator the default configured generator if it solves what seems to be a critical defect?

Thanks!
Franck

jbarrez
Star Contributor
Yes, this is still the recommended approach. See http://activiti.org/userguide/index.html#advanced.uuid.generator

It is not the default because it is less performant. Also, not many people use the default connection pool; most use their own preference (c3p0, Hikari) or a JNDI-managed one. None of our customers use the default pool in production, but it is the easiest way to get started and running out of the box, with a minimum number of dependencies.

Note that many performance issues are already solved by switching the connection pooling framework. The UUID approach takes care of another (sometimes related) problem: that the load is simply too high for the connection pool to keep up when a new batch of ids is needed.
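
As a rough illustration of why a UUID-based generator sidesteps the problem, here is a minimal sketch. It is not the StrongUuidGenerator implementation: `IdSource` and `RandomUuidSource` are hypothetical names, and `java.util.UUID.randomUUID()` is used purely as a stand-in. The point is only that ids are produced without a database round trip and without a shared monitor, so neither a connection nor a lock is needed.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class UuidIdSketch {

    /** Hypothetical interface mirroring the idea of a pluggable ID generator. */
    interface IdSource {
        String getNextId();
    }

    /** Stand-in generator: no connection and no synchronized block involved. */
    static final class RandomUuidSource implements IdSource {
        @Override
        public String getNextId() {
            return UUID.randomUUID().toString();
        }
    }

    /** Generates ids from several threads and counts the distinct ones. */
    static int distinctIds(int count) {
        IdSource ids = new RandomUuidSource();
        Set<String> seen = ConcurrentHashMap.newKeySet();
        IntStream.range(0, count).parallel().forEach(i -> seen.add(ids.getNextId()));
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct ids: " + distinctIds(1000));
    }
}
```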

franck102
Champ in-the-making
I had missed the StrongUuidGenerator being mentioned in the docs, thanks! Unfortunately:

1. Replacing the connection pool implementation will *not* make any difference with respect to connections not being returned to the pool. The pool's only option in that situation is to forcibly claim back the connection, which will cause random errors if the connection is subsequently used. If that is confirmed to happen, it has to be fixed in Activiti.

2. The default implementation of the ID generator is subject to deadlock because it tries to acquire a connection after locking the object, while other threads do the exact opposite. It will only work if the connection pool always has at least one idle connection; as soon as all connections are checked out by threads trying to generate IDs, the deadlock will happen.

I don't see that a different connection pool implementation would make any difference on these two fronts, unless I am missing something? With any pool, any DB or network hiccup that causes transactions to be held back for a few seconds will cause the pool to quickly run out of idle connections.

Franck

trademak
Star Contributor
Hi Franck,

Switching the connection pool sometimes helps to optimise the number of connections, the idle time checks, etc. I don't think the job executor keeps the connections checked out of the pool. But it will poll the database every couple of seconds and execute outstanding jobs, so it might need a connection "all" the time. You can configure the number of threads that are used in the job executor and determine the correct number of connections needed in the connection pool based on that.

Best regards,

franck102
Champ in-the-making
Thanks Tijs, using the StrongUuidGenerator solved the deadlock problem for us.

I haven't been able to determine conclusively if or when connections fail to be returned. I will run with a high poolMaximumCheckoutTime for a while and I'll report back if I find a problem.

The polling is unlikely to be an issue. It is true that the MyBatis pool uses notifyAll and thus hands available connections to waiting clients at random, but it would be very unlucky for a waiting client to lose every time and never get its connection…

Franck