Query Scalability

Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-31-2012 06:22 PM
I'm currently in evaluation mode to decide if Activiti is the right BPM solution for our project. From an API perspective, everything looks great so far. The APIs are intuitive, and the Spring integration is simple. However, I've noticed a problem with respect to scalability that is concerning. I'm hoping that I just have something configured poorly. I've performed the test with two different database servers and I've received similar results with both.
Setups
Tomcat, MySQL 5.5
Tomcat, PostgreSQL 9.1
Test Setup
1. Create 100,000 process instances for a process definition that contains a user task
2. Repeatedly query for the first 10 unassigned tasks ordered by priority and due date
3. Create 900,000 additional process instances (1MM total)
4. Repeatedly query (50 times) for the first 10 unassigned tasks ordered by priority and due date
The problem is that the task queries appear to load the full task list into memory before limiting the set down to the first 10. This manifests itself in two ways. First, the query time jumps from ~1 sec to ~11 sec when the number of rows increases. Second, memory is severly impacted. With 100,000 tasks, it take ~120MB of heap space to find the 10 tasks. While this might be acceptable for a system with a single user, it won't scale at all to a system that has hundreds of concurrent users. When the number of tasks grows to 1MM, it takes ~800MB of ram to complete a query. This is unacceptable even for a system with a single user.
In my tests, the application server and the database server are on the same machine. I suspect that if I moved to a more production equivalent setup, the query times would get far worse due to the network latency of moving so much data across the wire.
For reference, I'm querying for the tasks with the following code:
I've done some tracing through the activiti library, and it looks like it is using the Mybatis RowBounds construct to manage pagination at the JDBC driver level rather than in the SQL via LIMIT/OFFSET. Is there something specific that I need to add into the DataSource configuration that will make these queries more performant? Or is there someway to configure the underlying Mybatis library so that it will translate a RowBounds object into actual SQL fragments?
Thanks!
Setups
Tomcat, MySQL 5.5
Tomcat, PostgreSQL 9.1
Test Setup
1. Create 100,000 process instances for a process definition that contains a user task
2. Repeatedly query for the first 10 unassigned tasks ordered by priority and due date
3. Create 900,000 additional process instances (1MM total)
4. Repeatedly query (50 times) for the first 10 unassigned tasks ordered by priority and due date
The problem is that the task queries appear to load the full task list into memory before limiting the set down to the first 10. This manifests itself in two ways. First, the query time jumps from ~1 sec to ~11 sec when the number of rows increases. Second, memory is severly impacted. With 100,000 tasks, it take ~120MB of heap space to find the 10 tasks. While this might be acceptable for a system with a single user, it won't scale at all to a system that has hundreds of concurrent users. When the number of tasks grows to 1MM, it takes ~800MB of ram to complete a query. This is unacceptable even for a system with a single user.
In my tests, the application server and the database server are on the same machine. I suspect that if I moved to a more production equivalent setup, the query times would get far worse due to the network latency of moving so much data across the wire.
For reference, I'm querying for the tasks with the following code:
taskService.createTaskQuery().taskUnnassigned().orderByTaskPriority().desc().orderByDueDate().asc().listPage(0, num);
I've done some tracing through the activiti library, and it looks like it is using the Mybatis RowBounds construct to manage pagination at the JDBC driver level rather than in the SQL via LIMIT/OFFSET. Is there something specific that I need to add into the DataSource configuration that will make these queries more performant? Or is there someway to configure the underlying Mybatis library so that it will translate a RowBounds object into actual SQL fragments?
Thanks!
Labels:
- Labels:
-
Archive
5 REPLIES 5

Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-01-2012 12:53 AM
Interesting observation ! JDBC pagination is of course pure evil for the numbers you're running. Most database engines support LIMIT/OFFSET syntax that can easily be added to existing queries transparently so i'm a bit surprised MyBatis is not handing this translation natively.
In DbSqlSession there is this bit of code which seems to indicate that this problem has been thought of:
But following this, native paging should work for pgsql and mysql. Surely one of the devs will clarify things up
Jorg
In DbSqlSession there is this bit of code which seems to indicate that this problem has been thought of:
if(databaseType.equals("mssql") || databaseType.equals("db2")) {
// use mybatis paging (native database paging not yet implemented)
loadedObjects = sqlSession.selectList(statement, parameter, new RowBounds(parameter.getFirstResult(), parameter.getMaxResults()));
} else {
// use native database paging
loadedObjects = sqlSession.selectList(statement, parameter);
}
But following this, native paging should work for pgsql and mysql. Surely one of the devs will clarify things up

Jorg
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-01-2012 05:03 AM
first of all, great analysis and post. I often see them much, much worse.
In the myBatis docs it says:
Which might make a difference… But it would indeed be strange if it did…
In DbSQLSessionFactory there is
That is also used in the selectList which might also influence things…
Yet, indeed the core devs should have a more detailed look at this as I do not have the time for now.
In the myBatis docs it says:
Different drivers are able to achieve different levels of efficiency in this regard. For the best performance, use result set types of SCROLL_SENSITIVE or SCROLL_INSENSITIVE (in other words: not FORWARD_ONLY).
Which might make a difference… But it would indeed be strange if it did…
In DbSQLSessionFactory there is
@SuppressWarnings("unchecked")
protected List filterLoadedObjects(List<Object> loadedObjects) {
if (loadedObjects.isEmpty()) {
return loadedObjects;
}
if (! (PersistentObject.class.isAssignableFrom(loadedObjects.get(0).getClass()))) {
return loadedObjects;
}
List<PersistentObject> filteredObjects = new ArrayList<PersistentObject>(loadedObjects.size());
for (Object loadedObject: loadedObjects) {
PersistentObject cachedPersistentObject = cacheFilter((PersistentObject) loadedObject);
filteredObjects.add(cachedPersistentObject);
}
return filteredObjects;
}
That is also used in the selectList which might also influence things…
Yet, indeed the core devs should have a more detailed look at this as I do not have the time for now.

Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-01-2012 02:43 PM
Thanks for the replies. I should have mentioned that I did my analysis using the latest official release of Activiti (v5.9). It looks to me like the code that @heymjo pointed out should in fact fix this issue, but it is currently just in the trunk codebase. Digging through the svn logs shows that the DbSqlSession snippet was added in ACT-1209 to fix this very problem.
As a user of MyBatis myself, I feel the pain. I've made the exact same mistake of assuming that MyBatis will do native SQL pagination when you pass a RowBounds when, in fact, it always just uses JDBC pagination (which, as pointed out, is for various reasons, evil).
I guess I'll just have to wait for 5.10 to be released before we can really consider integrating Activiti. As a noob question, does the Activiti release cycle ever cut any beta releases prior to the official release? Or do I have to choose between waiting for the official 5.10 release (I think it's currently not scheduled until August) or building my own copy off of the trunk bleeding edge?
As a user of MyBatis myself, I feel the pain. I've made the exact same mistake of assuming that MyBatis will do native SQL pagination when you pass a RowBounds when, in fact, it always just uses JDBC pagination (which, as pointed out, is for various reasons, evil).
I guess I'll just have to wait for 5.10 to be released before we can really consider integrating Activiti. As a noob question, does the Activiti release cycle ever cut any beta releases prior to the official release? Or do I have to choose between waiting for the official 5.10 release (I think it's currently not scheduled until August) or building my own copy off of the trunk bleeding edge?
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-05-2012 02:20 AM
No, there are no RC cycles… The trunk is usually very stable unless you use features marked experimental. You should be safe to test this (me being curious if it solves your ' problem')
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-06-2012 10:33 AM
Actually Daniel came to the same conclusion a while ago. Should indeed be fixed.
