cancel
Showing results for 
Search instead for 
Did you mean: 

Query Scalability

brianshowers
Champ in-the-making
Champ in-the-making
I'm currently in evaluation mode to decide if Activiti is the right BPM solution for our project.  From an API perspective, everything looks great so far.  The APIs are intuitive, and the Spring integration is simple.  However, I've noticed a problem with respect to scalability that is concerning.  I'm hoping that I just have something configured poorly.  I've performed the test with two different database servers and I've received similar results with both.

Setups
Tomcat, MySQL 5.5
Tomcat, PostgreSQL 9.1

Test Setup
1. Create 100,000 process instances for a process definition that contains a user task
2. Repeatedly query for the first 10 unassigned tasks ordered by priority and due date
3. Create 900,000 additional process instances (1MM total)
4. Repeatedly query (50 times) for the first 10 unassigned tasks ordered by priority and due date

The problem is that the task queries appear to load the full task list into memory before limiting the set down to the first 10.  This manifests itself in two ways.  First, the query time jumps from ~1 sec to ~11 sec when the number of rows increases.  Second, memory is severly impacted.  With 100,000 tasks, it take ~120MB of heap space to find the 10 tasks.  While this might be acceptable for a system with a single user, it won't scale at all to a system that has hundreds of concurrent users.  When the number of tasks grows to 1MM, it takes ~800MB of ram to complete a query.  This is unacceptable even for a system with a single user.

In my tests, the application server and the database server are on the same machine.  I suspect that if I moved to a more production equivalent setup, the query times would get far worse due to the network latency of moving so much data across the wire.

For reference, I'm querying for the tasks with the following code:
taskService.createTaskQuery().taskUnnassigned().orderByTaskPriority().desc().orderByDueDate().asc().listPage(0, num);

I've done some tracing through the activiti library, and it looks like it is using the Mybatis RowBounds construct to manage pagination at the JDBC driver level rather than in the SQL via LIMIT/OFFSET.  Is there something specific that I need to add into the DataSource configuration that will make these queries more performant?  Or is there someway to configure the underlying Mybatis library so that it will translate a RowBounds object into actual SQL fragments?

Thanks!
5 REPLIES 5

heymjo
Champ on-the-rise
Champ on-the-rise
Interesting observation ! JDBC pagination is of course pure evil for the numbers you're running. Most database engines support  LIMIT/OFFSET syntax that can easily be added to existing queries transparently so i'm a bit surprised MyBatis is not handing this translation natively.

In DbSqlSession there is this bit of code which seems to indicate that this problem has been thought of:


    if(databaseType.equals("mssql") || databaseType.equals("db2")) {
      // use mybatis paging (native database paging not yet implemented)
      loadedObjects = sqlSession.selectList(statement, parameter, new RowBounds(parameter.getFirstResult(), parameter.getMaxResults()));
    } else {
      // use native database paging
      loadedObjects = sqlSession.selectList(statement, parameter);
    }

But following this, native paging should work for pgsql and mysql. Surely one of the devs will clarify things up Smiley Happy

Jorg

ronald_van_kuij
Champ on-the-rise
Champ on-the-rise
first of all, great analysis and post. I often see them much, much worse.

In the myBatis docs it says:

Different drivers are able to achieve different levels of efficiency in this regard. For the best performance, use result set types of SCROLL_SENSITIVE or SCROLL_INSENSITIVE (in other words: not FORWARD_ONLY).

Which might make a difference… But it would indeed be strange if it did…

In DbSQLSessionFactory there is

  @SuppressWarnings("unchecked")
  protected List filterLoadedObjects(List<Object> loadedObjects) {
    if (loadedObjects.isEmpty()) {
      return loadedObjects;
    }
    if (! (PersistentObject.class.isAssignableFrom(loadedObjects.get(0).getClass()))) {
      return loadedObjects;
    }
    List<PersistentObject> filteredObjects = new ArrayList<PersistentObject>(loadedObjects.size());
    for (Object loadedObject: loadedObjects) {
      PersistentObject cachedPersistentObject = cacheFilter((PersistentObject) loadedObject);
      filteredObjects.add(cachedPersistentObject);
    }
    return filteredObjects;
  }

That is also used in the selectList which might also influence things…

Yet, indeed the core devs should have a more detailed look at this as I do not have the time for now.

brianshowers
Champ in-the-making
Champ in-the-making
Thanks for the replies.  I should have mentioned that I did my analysis using the latest official release of Activiti (v5.9).  It looks to me like the code that @heymjo pointed out should in fact fix this issue, but it is currently just in the trunk codebase.  Digging through the svn logs shows that the DbSqlSession snippet was added in ACT-1209 to fix this very problem.

As a user of MyBatis myself, I feel the pain.  I've made the exact same mistake of assuming that MyBatis will do native SQL pagination when you pass a RowBounds when, in fact, it always just uses JDBC pagination (which, as pointed out, is for various reasons, evil).

I guess I'll just have to wait for 5.10 to be released before we can really consider integrating Activiti.  As a noob question, does the Activiti release cycle ever cut any beta releases prior to the official release?  Or do I have to choose between waiting for the official 5.10 release (I think it's currently not scheduled until August) or building my own copy off of the trunk bleeding edge?

ronald_van_kuij
Champ on-the-rise
Champ on-the-rise
No, there are no RC cycles… The trunk is usually very stable unless you use features marked experimental. You should be safe to test this (me being curious if it solves your ' problem')

jbarrez
Star Contributor
Star Contributor
Actually Daniel came to the same conclusion a while ago. Should indeed be fixed.