Slow multi-threaded execution due to oversynchronized blocks

romanoff
Champ in-the-making
Hi,

I have written a simple standalone test application (a JUnit test) that submits 5000 Runnable tasks to a java.util.concurrent pool of 50 threads. Each task starts and executes the same simple BPMN process. An H2 backend is used. No jobs or human tasks are created during the execution of this process. Since the processes are very simple and make no external invocations, they should execute very fast. The main aim of this test is to see how fast Activiti can be when used without jobs and tasks and with history disabled, because this is the mode of operation I mostly need for my soft real-time use case.
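
In essence the test looks like the sketch below ("simpleProcess" is just a placeholder for the key of my trivial BPMN definition, and the engine/RuntimeService setup is omitted):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.activiti.engine.RuntimeService;
import org.junit.Test;

public class StartProcessLoadTest {

  private RuntimeService runtimeService;  // obtained from the ProcessEngine in a setUp() method

  @Test
  public void start5000InstancesOn50Threads() throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(50);
    final CountDownLatch done = new CountDownLatch(5000);

    long start = System.currentTimeMillis();
    for (int i = 0; i < 5000; i++) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            // "simpleProcess" is a placeholder for my one-step BPMN definition
            runtimeService.startProcessInstanceByKey("simpleProcess");
          } finally {
            done.countDown();
          }
        }
      });
    }

    done.await();
    System.out.println("5000 instances took " + (System.currentTimeMillis() - start) + " ms");
    pool.shutdown();
  }
}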

When I measured the performance of my application, I noticed that it is rather slow for some reason, so I decided to run it under a profiler. The profiler showed that a lot of time (>50%) is lost to thread contention: many of the worker threads are waiting to enter critical sections at a few places in the Activiti engine. This indicates that something is oversynchronized. Overall, the (iBatis-based) DB backend seems to be the bottleneck.

Specifically, the iBatis runtime seems to be oversynchronized in the following two methods:
org.apache.ibatis.datasource.pooled.PooledDataSource.popConnection(String, String)
org.apache.ibatis.datasource.pooled.PooledDataSource.pushConnection(PooledConnection)

Both synchronize on rather large blocks of code, and interestingly enough, the thread contention gets even worse when I increase the number of threads in my thread pool.
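
One workaround I am considering (not verified yet to actually remove the contention) is to bypass the iBatis PooledDataSource entirely and hand the engine an externally managed DataSource. A rough sketch, using H2's own JdbcConnectionPool only because my test already runs against H2:

import org.activiti.engine.ProcessEngine;
import org.activiti.engine.ProcessEngineConfiguration;
import org.h2.jdbcx.JdbcConnectionPool;

public class EngineWithExternalPool {

  public static ProcessEngine build() {
    JdbcConnectionPool dataSource =
        JdbcConnectionPool.create("jdbc:h2:mem:activiti;DB_CLOSE_DELAY=1000", "sa", "");
    dataSource.setMaxConnections(50);  // match the size of the worker thread pool

    return ProcessEngineConfiguration
        .createStandaloneInMemProcessEngineConfiguration()
        .setDataSource(dataSource)     // the iBatis pool is not used in this case
        .setHistory("none")            // history is disabled in my test anyway
        .buildProcessEngine();
  }
}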

Another blocker is this method:
org.activiti.engine.impl.db.DbIdGenerator.getNextId()
Here we have:
  public synchronized long getNextId() {
    if (lastId<nextId) {
      getNewBlock();
    }
    return nextId++;
  }

I'm not an expert, but maybe this synchronized method could be replaced with an AtomicLong or AtomicInteger?
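
Just to illustrate what I mean, here is a rough, untested sketch of an id generator that takes no lock on the common path and only synchronizes when a new block has to be fetched (the DB round-trip of getNewBlock() is replaced by a plain counter here, and occasional gaps in the id sequence are accepted):

import java.util.concurrent.atomic.AtomicLong;

public class AtomicIdGenerator {

  private static final long BLOCK_SIZE = 100;

  private final AtomicLong nextId = new AtomicLong(0);
  private volatile long lastId = -1;  // last id of the current block
  private long dbCounter = 0;         // stands in for the DB-backed counter of DbIdGenerator

  public long getNextId() {
    while (true) {
      long id = nextId.getAndIncrement();
      if (id <= lastId) {
        return id;                    // fast path: no lock taken at all
      }
      refillBlock(id);                // current block is exhausted
    }
  }

  private synchronized void refillBlock(long exhaustedId) {
    if (exhaustedId > lastId) {       // re-check: another thread may already have refilled
      long blockStart = dbCounter;    // in DbIdGenerator this would be the DB round-trip
      dbCounter += BLOCK_SIZE;
      nextId.set(blockStart);
      lastId = blockStart + BLOCK_SIZE - 1;
    }
  }
}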

Questions:
- Are these findings generic in nature, or am I the only one who sees this problem?
- Is there anything obviously wrong with my test? E.g., is RuntimeService.startProcessInstanceByKey() not supposed to be called from many concurrent threads at the same time?
- Is it possible to completely eliminate the need for a DB backend? I have already switched history off, but I still see quite a lot of DB activity.

Best Regards,
  Leo

romanoff
Champ in-the-making
As you can see, it always tries to fetch the lists of tasks and jobs and then remove them if required. The interesting part of this observation is that I do not use any tasks or jobs at all, yet I still pay the price for them! I think this should be optimized somehow…
Maybe it is possible for the Activiti engine to remember that a certain process instance never created any tasks or jobs? If such a boolean flag existed, these useless DB queries could be avoided. What do you think?

Another place where useless DB queries are produced is VariableScopeImpl, which always goes to the DB to initialize variables. This creates a lot of DB traffic, and it happens even on process instance creation, where it is known that no variables can already be stored in the DB for this instance. Again, this could be optimized in a similar way: if a process instance is just being created and no variables have been defined yet, there is no need to ask the DB for their values.
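
A purely hypothetical sketch of the kind of guard I mean (the field and method names are made up, this is not actual Activiti code); the same idea would also apply to the task/job cleanup queries mentioned above:

  // inside something like VariableScopeImpl (assuming java.util.HashMap and the existing VariableInstanceEntity type)
  protected void ensureVariableInstancesInitialized() {
    if (variableInstances == null) {
      if (createdInCurrentTransaction) {
        // freshly created execution: nothing can be persisted for it yet, so skip the SELECT
        variableInstances = new HashMap<String, VariableInstanceEntity>();
      } else {
        variableInstances = loadVariableInstancesFromDb();  // current behaviour: always queries
      }
    }
  }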

Yes - I came to the same conclusions a few months ago when I did some profiling.
I believe they can be optimized somehow (e.g. by keeping a boolean that records whether jobs were created, for example).

I have a few related questions and performance improvement suggestions:

1) How does the historyService work? Does it perform asynchronous writes to the DB, or does it block the "main" thread of the process execution? If it does not do async writes yet, maybe it should? BTW, some log4j appenders actually support this mode of operation to offload the busy worker threads in Java(EE) apps.
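
To illustrate what I mean by async writes, here is a hand-rolled sketch (this is not an existing Activiti API): history records go into a bounded queue and a single background thread drains them to the DB, so the worker threads never block on history inserts.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AsyncHistoryWriter {

  private final BlockingQueue<Object> queue = new ArrayBlockingQueue<Object>(10000);

  private final Thread writer = new Thread(new Runnable() {
    public void run() {
      try {
        while (true) {
          Object record = queue.take();
          insertIntoHistoryTables(record);  // the only place that touches the DB
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  });

  public AsyncHistoryWriter() {
    writer.setDaemon(true);
    writer.start();
  }

  public void append(Object historyRecord) {
    queue.offer(historyRecord);             // the worker thread returns immediately
  }

  private void insertIntoHistoryTables(Object record) {
    // placeholder for the actual history insert
  }
}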

2) Imagine a situation where user tasks (or, more generically, wait states, e.g. waiting for async responses from certain external services) do not last very long and are completed in just a few milliseconds. This can be the case where the "users" are not humans but other processes, which can react very quickly. Right now, AFAIK, Activiti stores the state of the current execution in the DB and then waits. Once the user task is completed (or the waiting in a wait state is over), Activiti restores the process state from the DB and continues the execution. My profiler shows that these save/restore actions introduce quite some overhead and DB access contention under heavy load with many concurrent process instances being executed. Therefore, I was thinking about the following optimization:
Defer the DB writes that save the state of a process instance for a certain configurable amount of time (e.g. 1 second). If after this timeout the user task (or wait state) is still waiting and not finished yet, flush those writes to the DB and proceed as usual. But if it finished in the meantime, there is no need to store this state anymore: all related pending writes can simply be dropped, and the restore operation just reuses the in-memory version of the process instance state. As a result, this would make Activiti usable under high load (and even for certain kinds of soft real-time apps).
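
A very rough sketch of the deferred-flush idea (all names are made up): the state snapshot is only written to the DB if the wait state is still open after the grace period, otherwise the pending write is simply dropped.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class DeferredStateFlusher {

  private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
  private final ConcurrentMap<String, ScheduledFuture<?>> pending =
      new ConcurrentHashMap<String, ScheduledFuture<?>>();

  /** Called when an execution reaches a wait state; writeStateToDb is the usual flush. */
  public void scheduleFlush(String executionId, Runnable writeStateToDb) {
    ScheduledFuture<?> future =
        scheduler.schedule(writeStateToDb, 1, TimeUnit.SECONDS);  // configurable grace period
    pending.put(executionId, future);
  }

  /** Called when the wait state completes; returns true if the DB write was avoided. */
  public boolean cancelFlush(String executionId) {
    ScheduledFuture<?> future = pending.remove(executionId);
    return future != null && future.cancel(false);
  }
}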

What do you think about this idea?

P.S. I'm not sure whether these deferred writes would introduce any problems in clustered setups.

ronald_van_kuij
Champ on-the-rise
I have a few related questions and performance improvement suggestions:

1) How does the historyService work? Does it perform asynchronous writes to the DB, or does it block the "main" thread of the process execution? If it does not do async writes yet, maybe it should? BTW, some log4j appenders actually support this mode of operation to offload the busy worker threads in Java(EE) apps.

It is not doing async writes. I think one of the reasons is that (at least in our situation) we need them… always… They cannot be lost, so they need to be written to some persistent store in a transaction. Sure, something could be implemented that writes them to a local file store (quick) and some other process puts them into the DB asynchronously. But then you also need some option to still retrieve them if the local storage crashes… etc… It is much more complicated than it looks at first sight. But if you have other clever ideas, let me (us) know.

2) Imagine a situation where user tasks (or, more generically, wait states, e.g. waiting for async responses from certain external services) do not last very long and are completed in just a few milliseconds. This can be the case where the "users" are not humans but other processes, which can react very quickly. Right now, AFAIK, Activiti stores the state of the current execution in the DB and then waits. Once the user task is completed (or the waiting in a wait state is over), Activiti restores the process state from the DB and continues the execution. My profiler shows that these save/restore actions introduce quite some overhead and DB access contention under heavy load with many concurrent process instances being executed. Therefore, I was thinking about the following optimization:
Defer the DB writes that save the state of a process instance for a certain configurable amount of time (e.g. 1 second). If after this timeout the user task (or wait state) is still waiting and not finished yet, flush those writes to the DB and proceed as usual. But if it finished in the meantime, there is no need to store this state anymore: all related pending writes can simply be dropped, and the restore operation just reuses the in-memory version of the process instance state. As a result, this would make Activiti usable under high load (and even for certain kinds of soft real-time apps).

Isn't this a kind of second-level cache? It would (kind of) work for single-node systems, but what if there are multiple nodes and other nodes could execute the next steps in the process? So yes, it might introduce issues unless you use something like Terracotta. And wait states, including user tasks, (at least in our case) last way, way longer than one second, so this would not gain much.

If they are short (as in your case), you might choose not to use wait states and just use service tasks so everything is done in one transaction (at least those are the things we take into account when developing a process). You might need to increase the number of threads/connections though.

romanoff
Champ in-the-making

Ronald, thanks a lot for your answers!

A few comments:

1)
If they are short (as in your case), you might choose not to use wait states and just use service tasks so everything is done in one transaction (at least those are the things we take into account when developing a process). You might need to increase the number of threads/connections though.

The idea of using service tasks would not scale if we are talking about thousands or tens of thousands of simultaneous, concurrent process instances, as it implies a thread-per-request approach and may also lead to increased thread starvation (all threads just wait for results from external services). The only well-proven way to scale here is an asynchronous approach, i.e. wait states in combination with suspend/resume of process instances.
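
To make the async style I have in mind more concrete, a small sketch (assumptions: the process instance is sitting at a receive task / wait state identified by executionId when the external response arrives):

import java.util.Map;

import org.activiti.engine.RuntimeService;

public class ExternalServiceCallback {

  private final RuntimeService runtimeService;

  public ExternalServiceCallback(RuntimeService runtimeService) {
    this.runtimeService = runtimeService;
  }

  public void onResponse(String executionId, Map<String, Object> result) {
    runtimeService.setVariables(executionId, result);  // hand the response over to the process
    runtimeService.signal(executionId);                // continue the process on the callback thread
  }
}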

2) I perfectly understand why Activiti is implemented the way it functions now. It targets human-oriented workflows and makes a number of corresponding assumptions about the runtime requirements and behavior of such workflows. The whole business model is currently built around this domain.

But I also want to stress that there is a much bigger (by a few orders of magnitude) world of non-human-oriented workflows (where I come from), e.g. machine-oriented workflows, machine-to-machine, sensors, telcos, trading, etc. This is a huge market with a lot of business opportunities. While BPMN (or the PVM) itself is perfectly usable for those domains as well (though the "B" eventually becomes less important), the implementations of these ideas (e.g. Activiti) are not usable for those domains out of the box. The reason is that those domains have somewhat different requirements. For example, some of them do not need to maintain history, others do not need transaction support, and yet others do not need the ability to continue process instance execution on a different node in a cluster, because they use the concept of sticky sessions (and therefore do not need to save the complete process instance state in the DB), etc. And almost all of them have higher, more real-time requirements on performance and latency, simply because machines are much faster than humans 😉

Therefore, I would just like to suggest that in your architecture and system designs you at least consider that some of the hard-coded or built-in decisions are actually based on a specific use case from one concrete domain (e.g. CMS or business processes) and are not applicable to the others. With that in mind, it would be nice to have things like persistence, clustering support and asynchrony support more configurable, controlled by policies and pluggable by architectural design, so that alternative implementations could easily be plugged in by Activiti core developers, project contributors or users. The current implementations of all those features could still be the defaults delivered out of the box, but they would no longer prohibit the use of alternative implementations.

If this becomes possible, then the _same_ engine (and the product, tooling and solutions around it) would become more usable and efficient in a wide variety of domains and their combinations, making Activiti an even more attractive choice for many new and existing customers and users.

I hope you do not get me wrong. I like Activiti a lot and think it has great potential. I also think it is, overall, rather well prepared architecture-wise. I just wanted to highlight some potential opportunities in domains other than pure business processes, and the architectural implications of the new requirements coming from those domains.

jbarrez
Star Contributor
Not much time now to go through all the posts - will do at a later point in time (man, you should write books ;-))

Just wanted to point you to this thread, where Tom added a patch for validation: http://forums.activiti.org/en/viewtopic.php?f=6&t=1521&start=10

romanoff
Champ in-the-making
Not much time now to go through all the posts - will do at a later point in time (man, you should write books ;-))

Who said I'm not doing it already? 😄