Async service task gets run several times

serid
Champ in-the-making
Hi All,

I noticed that if I have an async (exclusive) service task which runs for a long time, Activiti tries to run it several times.
So I tried to see what is happening, and these are my guesses (the snippet below shows how I checked the timestamps):

1) When a task is defined as async and run, it adds records to the act_ru_execution and act_ru_job tables
2) The act_ru_execution entry has LOCK_TIME_ populated, which seems to be 5 mins ahead of the start of the task
3) The act_ru_job entry has DUEDATE_ populated with exactly the same time as LOCK_TIME_
4) When the time specified in both tables passes, Activiti tries to run the job again (while the job that was triggered the first time is still running)
5) I have my own logic that does not allow a task which is already running to start again, and it throws an exception (rough sketch further down)
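For reference, this is roughly how I watched those timestamps while the task was running - a plain JDBC sketch, where the connection details are placeholders for my local setup and the columns are just the ones mentioned above:

<code>
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Plain JDBC sketch for watching the lock/due timestamps while the async task runs.
// The JDBC URL and credentials are placeholders.
public class JobTimestampWatcher {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/activiti", "activiti", "activiti");
             Statement st = con.createStatement()) {
            try (ResultSet rs = st.executeQuery("SELECT ID_, LOCK_TIME_ FROM ACT_RU_EXECUTION")) {
                while (rs.next()) {
                    System.out.println("execution " + rs.getString("ID_")
                            + " LOCK_TIME_=" + rs.getTimestamp("LOCK_TIME_"));
                }
            }
            try (ResultSet rs = st.executeQuery("SELECT ID_, DUEDATE_ FROM ACT_RU_JOB")) {
                while (rs.next()) {
                    System.out.println("job " + rs.getString("ID_")
                            + " DUEDATE_=" + rs.getTimestamp("DUEDATE_"));
                }
            }
        }
    }
}
</code>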

So my understanding is that when a task is defined as async and is run, it adds an entry to act_ru_execution and act_ru_job.
Then a thread from a pool picks it up and "locks" it by updating LOCK_TIME_ and DUEDATE_?
And if the task does not complete in that time, it releases the lock (updates again) so another thread can pick it up and start again?

Could someone clarify this for me please?
Why such behaviour? I expect it is the right behaviour, but I would like to know the reasons behind it.
I also found this parameter:

asyncExecutorAsyncJobLockTimeInMillis: The amount of time (in milliseconds) an async job is locked when acquired by the async executor. During this period of time, no other async executor will try to acquire and lock this job. And behold the default is 5 min!

Basically I want the possibility to run both non-async and async tasks only once, until they finish or throw an exception.
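For context, the guard from point 5 looks roughly like this - a simplified sketch, with class and method names made up for illustration:

<code>
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.activiti.engine.ActivitiException;
import org.activiti.engine.delegate.DelegateExecution;
import org.activiti.engine.delegate.JavaDelegate;

// Simplified sketch of my guard: refuse to start the work again if this execution
// is already being processed. Class and method names are made up for illustration.
public class GuardedLongRunningDelegate implements JavaDelegate {

    private static final Set<String> RUNNING = ConcurrentHashMap.newKeySet();

    @Override
    public void execute(DelegateExecution execution) {
        String key = execution.getProcessInstanceId() + ":" + execution.getId();
        if (!RUNNING.add(key)) {
            throw new ActivitiException("Task for execution " + key + " is already running");
        }
        try {
            doLongRunningWork(execution); // the actual long-running service call
        } finally {
            RUNNING.remove(key);
        }
    }

    private void doLongRunningWork(DelegateExecution execution) {
        // placeholder for the real work
    }
}
</code>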

Many thanks,
Adrian
5 REPLIES

warper
Star Contributor
Hi Adrian!

>when a task is defined as async and is run, it adds an entry in act_ru_execution and act_ru_job.
Yes, but if you use the asyncExecutor, it also puts this job into an in-memory queue to reduce the number of DB queries for the next step.

>Then a thread from a pool picks it up and "locks" it by updating LOCK_TIME_ and DUEDATE_?
Well, nearly… It may not be a thread from the pool (I believe it's the asyncExecutor's main thread). And LOCK_TIME_ is set to the current time, not to the lock's end. Also, if it's an exclusive job, the process instance is locked in a similar way to the job. Note that the lock transaction is committed here, i.e. the update to the lock time is written to the DB immediately.
With several executors, "normal" optimistic locking exceptions are possible when they concurrently try to commit the updated lock time. The winner locks the job (and the process), the losers back off and do other jobs.


>And if the task does not complete in that time, it releases the lock (updates again) so another thread can pick it up and start again?
Well, nearly… The executor checks LOCK_TIME_ + asyncExecutorAsyncJobLockTimeInMillis against the current time. If the job is fetched from the DB and gets to execution, it is relocked and LOCK_TIME_ is updated.
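To put that rule into code - my own illustration, not the actual engine source:

<code>
// My own illustration of the acquisition rule, not the actual engine source:
// a job becomes acquirable again once LOCK_TIME_ + lock timeout lies in the past.
public class LockExpiryIllustration {

    // default asyncExecutorAsyncJobLockTimeInMillis: 5 minutes
    static final long LOCK_TIMEOUT_MS = 5 * 60 * 1000;

    static boolean canBeReacquired(long lockTimeMillis, long nowMillis) {
        return lockTimeMillis + LOCK_TIMEOUT_MS <= nowMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long lockedSixMinutesAgo = now - 6 * 60 * 1000;
        // prints true: the lock is considered stale, so another executor relocks the job
        // (LOCK_TIME_ is set to "now" again) and runs it from scratch
        System.out.println(canBeReacquired(lockedSixMinutesAgo, now));
    }
}
</code>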

>Why such behaviour
I'm not an Activiti developer, but the logic is quite clear. Different Activiti engines (in a cluster, for example) do not communicate with each other; they can only synchronize through the DB. It would be possible to build sophisticated logic behind a DB facade, but it would not be transparent. So a simple yet effective mechanism is used: commit the lock time and check lock time + timeout.
There are no pings between engines that could tell the others that the current executor is still working on a job. If your node takes a job and then loses the intranet, loses power, or simply sleeps for ages in a web call, other nodes can still pick this job up… later.

>Basically I want to have the possibility to run either non async and async tasks only once until they finish / throw exception.
You can set asyncExecutorAsyncJobLockTimeInMillis and asyncExecutorTimedJobLockTimeInMillis to really high values (several days) and it will never happen again. You will have to deal with jobs that were locked but never released, though (for example, jobs that were running during a server restart).
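Something like this - a minimal sketch assuming the DefaultAsyncJobExecutor from Activiti 5.x, with the two-day value picked just as an example:

<code>
import org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor;

// Minimal sketch: raise both lock timeouts so a long-running job is not re-acquired
// while it is still executing. The two-day value is only an example.
public class LongLockAsyncExecutorFactory {

    public static DefaultAsyncJobExecutor create() {
        DefaultAsyncJobExecutor executor = new DefaultAsyncJobExecutor();
        executor.setAsyncJobLockTimeInMillis(2 * 24 * 60 * 60 * 1000); // async jobs: ~2 days
        executor.setTimerLockTimeInMillis(2 * 24 * 60 * 60 * 1000);    // timer jobs: ~2 days
        // wire it into the engine configuration, e.g.:
        // processEngineConfiguration.setAsyncExecutor(executor);
        // processEngineConfiguration.setAsyncExecutorEnabled(true);
        // processEngineConfiguration.setAsyncExecutorActivate(true);
        return executor;
    }
}
</code>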

serid
Champ in-the-making
Hi,

Many thanks for your reply.

"Yes, but if you use asyncExecutor, it also puts this job into memory queue to reduce DB queries count for next step."

Yes - that is a good point and I am glad it does so.

"Well, nearly… May be not thread from pool (I belive it's asyncExecutor main thread). And LOCK_TIME_ is set to current time, not to lock end. Also if it's exclusive job, process instance is locked similar to job lock. Note that here lock transaction is commited, that is update to lock time is put into DB immediately."

Yes - I agree it is the asyncExecutor thread, but I have double-checked the time on updates and it is not the current time. It already adds the 5 min to the current time and updates the record in the DB, at least in version 5.21.0. As for locking the process instance - I have not checked that at all - will do shortly.

"In case of several executors there are possible "normal" locking exceptions when they concurrently try to commit updated lock time. Winner locks job (and process), losers fall back and do other jobs."

That would make sense.

"Well, nearly… Executor checks for LOCK_TIME_+asyncExecutorAsyncJobLockTimeInMillis comparing it to current time. If job is fetched from DB and got to execution, it's relocked, LOCK_TIME_ is undated."

Well - I would understand relocking the job and updating LOCK_TIME_. But why start the same task from scratch if it is already running?

"I'm not activiti developer, but logic is quite clear. Different activiti engines (in cluster, for example) do not communicate to each other, they can only synchronize through DB. It's possible to make sophisticated logic behind DB facade, but it's not transparent. So simple yet effective mechanics was used: commit lock time and check lock time + timeout.
There is no pings between engines that can tell other engines that current executor is still working on it. If your node takes job and loses intranet, power supply or simply sleeps for ages in web call, other nodes can still get this job… later."

My earlier question as to why it starts again even if it is still running might be answered by your reply, if that is indeed the reason behind it (e.g. several engines running without knowing anything about each other). It still doesn't sound like a perfect solution to me, but that might be the trade-off to achieve the goal without complicating things …

Many thanks for your reply - much appreciated !
Still keen to hear any other explanations - if there are any 😉

Thanks,
Adrian

serid
Champ in-the-making
Hmmm ….

Just checked my main application class (which is Spring Boot) and I have:

<code>
@Bean(name = "activitiAsyncJobExecutor")
public DefaultAsyncJobExecutor activitiAsyncJobExecutor() {
    DefaultAsyncJobExecutor bean = new DefaultAsyncJobExecutor();
    bean.setCorePoolSize(10);
    bean.setMaxPoolSize(100);
    bean.setKeepAliveTime(5000);
    bean.setQueueSize(100);
    bean.setMaxTimerJobsPerAcquisition(30);
    bean.setMaxAsyncJobsDuePerAcquisition(30);
    bean.setDefaultAsyncJobAcquireWaitTimeInMillis(1000);
    bean.setDefaultTimerJobAcquireWaitTimeInMillis(1000);
    // lock timeouts raised well above the 5-minute default
    bean.setTimerLockTimeInMillis(1000 * 60 * 60 * 12);    // 12 hours
    bean.setAsyncJobLockTimeInMillis(1000 * 60 * 60 * 12); // 12 hours
    return bean;
}

@Bean
@DependsOn(value = "applicationContextProvider")
public SpringProcessEngineConfiguration activitiConfigurationBean() {
    SpringProcessEngineConfiguration bean = new SpringProcessEngineConfiguration();
    bean.setJobExecutorActivate(false);  // old job executor off
    bean.setAsyncExecutorActivate(true); // async executor on
    bean.setAsyncExecutorEnabled(true);
    bean.setDataSource(dataSource());
    bean.setTransactionManager(transactionManager());
    bean.setAsyncExecutor(activitiAsyncJobExecutor());
    bean.setDatabaseSchemaUpdate("true");
    bean.setDeploymentResources(getActivitiResources());
    return bean;
}
</code>

so it should not be the 5 mins I am seeing?

serid
Champ in-the-making
Hi,

You were right - it is the current time?!
No idea how I was seeing a different time in those tables …… 😕

Sorry for confusion.

serid
Champ in-the-making
Ok - I think I am doing something wrong … 😐
I have recompiled and rebuilt the project, and with those time settings (12 hours) it does work now …
Maybe I should slow down a bit 😛

Thanks very much for your help!