cancel
Showing results for 
Search instead for 
Did you mean: 

Null Issues When Under Heavy Throughput

rallen1
Confirmed Champ
Confirmed Champ
Hello,

I am running Activiti under a pretty heavy throughput load and noticing some issues pop up. Unfortunately, I could not come up with a unit test to nicely replicate the behaviour. The BPMNs I am working with are private and are rather large and complicated.

Some things to note about what I am using:

  • 5.17.0
  • AsyncExecutor
  • StrongUuidGenerator
[h2]Issues:[/h2]

  1. I am non-deterministically observing:


  2. 2015-06-03/14:11:43.546/PDT ERROR org.activiti.engine.impl.asyncexecutor.ExecuteAsyncRunnable - Job d51bf18d-da9b-4e1f-8fb1-7700af745b57 failed
    java.lang.NullPointerException: null
       at org.activiti.engine.impl.pvm.runtime.AtomicOperationTransitionCreateScope.execute(AtomicOperationTransitionCreateScope.java:35) ~[activiti-engine-5.17.0.jar:5.17.0]
       at org.activiti.engine.impl.interceptor.CommandContext.performOperation(CommandContext.java:96) ~[activiti-engine-5.17.0.jar:5.17.0]
       at org.activiti.engine.impl.jobexecutor.AsyncContinuationJobHandler.execute(AsyncContinuationJobHandler.java:35) ~[activiti-engine-5.17.0.jar:5.17.0]
       at org.activiti.engine.impl.persistence.entity.JobEntity.execute(JobEntity.java:84) ~[activiti-engine-5.17.0.jar:5.17.0]
       at org.activiti.engine.impl.persistence.entity.MessageEntity.execute(MessageEntity.java:29) ~[activiti-engine-5.17.0.jar:5.17.0]
       at org.activiti.engine.impl.cmd.ExecuteAsyncJobCmd.execute(ExecuteAsyncJobCmd.java:52) ~[activiti-engine-5.17.0.jar:5.17.0]
       at org.activiti.engine.impl.interceptor.CommandInvoker.execute(CommandInvoker.java:24) ~[activiti-engine-5.17.0.jar:5.17.0]
       at org.activiti.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:57) ~[activiti-engine-5.17.0.jar:5.17.0]
       at org.activiti.spring.SpringTransactionInterceptor$1.doInTransaction(SpringTransactionInterceptor.java:47) ~[activiti-spring-5.17.0.jar:5.17.0]
       at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133) ~[spring-tx-4.1.4.RELEASE.jar:4.1.4.RELEASE]
       at org.activiti.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:45) ~[activiti-spring-5.17.0.jar:5.17.0]
       at org.activiti.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:31) ~[activiti-engine-5.17.0.jar:5.17.0]
       at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:40) ~[activiti-engine-5.17.0.jar:5.17.0]
       at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:35) ~[activiti-engine-5.17.0.jar:5.17.0]
       at org.activiti.engine.impl.asyncexecutor.ExecuteAsyncRunnable.run(ExecuteAsyncRunnable.java:73) ~[activiti-engine-5.17.0.jar:5.17.0]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_45]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_45]
       at java.lang.Thread.run(Unknown Source) [na:1.8.0_45]


    This seems related to https://github.com/Activiti/Activiti/blob/master/modules/activiti-engine/src/main/java/org/activiti/..., and the possibility that
    execution.getActivity();
    can return a null under heavy load or because of some concurrency error I may be unknowingly producing.

     
  3. My log is full of the following type of entry:
  4. 2015-06-03/15:24:46.863/PDT ERROR org.activiti.engine.impl.cmd.JobRetryCmd - activitiy or FailedJobRetryTimerCycleValue is null in job b220b51a-a376-4f0d-9628-17b4c801deee'. only decrementing retries.

    I realize this has been changed to log to
    info
    in the latest source, but I am curious why it happens and if there is anything I am doing to cause it.
As always, any help is greatly appreciated. Thank you.
4 REPLIES 4

trademak
Star Contributor
Star Contributor
We are not aware of any concurrency issues in that area. Can you debug this a bit more and see what kind of execution this is failing for and what is stored in the database for this specific execution?

Best regards,

rallen1
Confirmed Champ
Confirmed Champ
Hey Tijs,

Thanks for responding. I think the issue may be related to having a process instance that spawns many child process instances.

I am not sure which table to look into to find the correct job. In regards to the multitude of <code>activitiy or FailedJobRetryTimerCycleValue is null</code> messages, if I do a grep on a job execution in question, I get something like this:

<code>
dev/log/test.log.4:11:59:02,648 [pool-1-thread-2] DEBUG org.activiti.engine.impl.cmd.ExecuteAsyncJobCmd  - Executing async job d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a
dev/log/test.log.4:11:59:02,531 [pool-1-thread-2] DEBUG org.activiti.engine.impl.cmd.LockExclusiveJobCmd  - Executing lock exclusive job d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a d7e16b48-0ed7-11e5-bd3b-3417ebc8228a
dev/log/test.log.3:11:59:26,315 [pool-1-thread-2] DEBUG org.activiti.engine.impl.db.DbSqlSession  -   delete JobEntity [id=d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a]
dev/log/test.log.2:11:59:38,931 [pool-1-thread-2] DEBUG org.activiti.engine.impl.cmd.UnlockExclusiveJobCmd  - Unlocking exclusive job d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a
dev/log/test.log.2:11:59:38,931 [pool-1-thread-2] DEBUG org.activiti.engine.impl.asyncexecutor.ExecuteAsyncRunnable  - Optimistic locking exception during job execution. If you have multiple async executors running against the same database, this exception means that this thread tried to acquire an exclusive job, which already was changed by another async executor thread.This is expected behavior in a clustered environment. You can ignore this message if you indeed have multiple job executor threads running against the same database. Exception message: JobEntity [id=d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a] was updated by another transaction concurrently
dev/log/test.log.2:11:59:38,930 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.JobEntity.updateMessage  - ==> Parameters: 3(Integer), null, null, 2(Integer), ada5bc01-0ed9-11e5-bd3b-3417ebc8228a(String), JobEntity [id=d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a] was updated by another transaction concurrently(String), 2015-06-09 11:59:48.868(Timestamp), d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a(String), 2(Integer)
dev/log/test.log.2:11:59:38,930 [pool-1-thread-2] DEBUG org.activiti.engine.impl.db.DbSqlSession  - updating: JobEntity [id=d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a]
dev/log/test.log.2:11:59:38,868 [pool-1-thread-2] DEBUG org.activiti.engine.impl.db.DbSqlSession  -   update JobEntity [id=d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a]
dev/log/test.log.2:11:59:38,868 [pool-1-thread-2] ERROR org.activiti.engine.impl.cmd.JobRetryCmd  - activitiy or FailedJobRetryTimerCycleValue is null in job d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a'. only decrementing retries.
dev/log/test.log.2:11:59:38,805 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.JobEntity.selectJob  - ==> Parameters: d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a(String)
dev/log/test.log.2:11:59:38,634 [pool-1-thread-2] DEBUG org.activiti.engine.impl.interceptor.CommandContext  - Optimistic locking exception : org.activiti.engine.ActivitiOptimisticLockingException: JobEntity [id=d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a] was updated by another transaction concurrently
dev/log/test.log.2:11:59:38,587 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.JobEntity.deleteMessage  - ==> Parameters: d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a(String), 1(Integer)
dev/log/test.log.2:11:59:38,587 [pool-1-thread-2] DEBUG org.activiti.engine.impl.db.DbSqlSession  - executing: delete JobEntity [id=d7e16b4e-0ed7-11e5-bd3b-3417ebc8228a]
</code>

Does anything here jump out at you as being wrong? I have history set to full. Is there a database table in the history that I can look in to find this job? It does not seem to be in the act_ru_execution and/or act_ru_job table when I check.

I am working on a unit test that exposes the bug. The big issue so far is that it takes quite a while for the issue to present itself (the retries are within a few minutes, but the null pointers are after almost an hour).

I will keep you posted. Thanks.

rallen1
Confirmed Champ
Confirmed Champ
Hey Tijs,

I think the problem is actually related to the async="true". I have created a relatively minimal unit test where the same issue I have been having pops up.

The unit test can be found here:

https://github.com/rwallen/activiti-unit-test-null-error

I apologize that the unit test takes a while before the problem arises. I could not understand enough of the depths of Activiti's source code to figure out a way to expose it quicker. Resolving this issue is critical for whether Activiti will be able to meet my performance requirements.

Any help is greatly appreciated and if there is any way I can help, please let me know.

Thanks.

rallen1
Confirmed Champ
Confirmed Champ
Hello,

I was able to get around this issue by using the JobExectutor with a single thread (Async with a single thread did not seem to work). Now that 5.18.0 has come out, I thought I would test the issue again with Async Executor, and I have found that it still pops up.

I have delved in a bit more and I was hoping I could get some internal Activiti expertise on what I am seeing. In the log, thread 2 seems to execute a job, then deletes the execution. Six minutes later, thread 6 starts the job (the execution 05b35775-3d52-11e5-bdfd-3417ebc8228a is no in the database). It then attempts to grab the execution at this line:

https://github.com/Activiti/Activiti/blob/master/modules/activiti-engine/src/main/java/org/activiti/...

but it comes back null, which it keeps passing down until there is a NullPointer when something tries to use the execution (CommandContext.performOperation).

Here is the full log:

<code>
03:54:54,605 [pool-1-thread-2] DEBUG org.activiti.engine.impl.cmd.LockExclusiveJobCmd  - Executing lock exclusive job 05b5c880-3d52-11e5-bdfd-3417ebc8228a 05b35775-3d52-11e5-bdfd-3417ebc8228a
03:54:54,605 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.ExecutionEntity.selectExecution  - ==> Parameters: 05b35775-3d52-11e5-bdfd-3417ebc8228a(String)
03:54:54,620 [pool-1-thread-2] DEBUG org.activiti.engine.impl.cmd.ExecuteAsyncJobCmd  - Executing async job 05b5c880-3d52-11e5-bdfd-3417ebc8228a
03:54:54,620 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.ExecutionEntity.selectExecution  - ==> Parameters: 05b35775-3d52-11e5-bdfd-3417ebc8228a(String)
03:54:54,620 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.ExecutionEntity.selectExecutionsByParentExecutionId  - ==> Parameters: 05b35775-3d52-11e5-bdfd-3417ebc8228a(String)
03:54:54,620 [pool-1-thread-2] DEBUG org.activiti.engine.impl.pvm.runtime.AtomicOperationTransitionCreateScope  - create scope: parent ConcurrentExecution[05b35775-3d52-11e5-bdfd-3417ebc8228a] continues as execution ScopeExecution[51a9ce66-3d57-11e5-bdfd-3417ebc8228a]
03:54:54,723 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.VariableInstanceEntity.selectVariablesByExecutionId  - ==> Parameters: 05b35775-3d52-11e5-bdfd-3417ebc8228a(String)
03:54:56,702 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.HistoricActivityInstanceEntity.selectHistoricActivityInstancesByQueryCriteria  - ==> Parameters: 05b35775-3d52-11e5-bdfd-3417ebc8228a(String), 1(Integer), 0(Integer)
03:54:56,702 [pool-1-thread-2] DEBUG org.activiti.engine.impl.pvm.runtime.AtomicOperationTransitionDestroyScope  - destroy scope: scoped ScopeExecution[51a9ce66-3d57-11e5-bdfd-3417ebc8228a] continues as parent scope ConcurrentExecution[05b35775-3d52-11e5-bdfd-3417ebc8228a]
03:54:56,718 [pool-1-thread-2] DEBUG org.activiti.engine.impl.pvm.runtime.AtomicOperationTransitionNotifyListenerTake  - ConcurrentExecution[05b35775-3d52-11e5-bdfd-3417ebc8228a] takes transition (servicetask2)–flow13–>(parallelgateway1)
03:54:56,718 [pool-1-thread-2] DEBUG org.activiti.engine.impl.pvm.runtime.AtomicOperationActivityExecute  - ConcurrentExecution[05b35775-3d52-11e5-bdfd-3417ebc8228a] executes Activity(parallelgateway1): org.activiti.engine.impl.bpmn.behavior.ParallelGatewayActivityBehavior
03:54:57,248 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.ExecutionEntity  - inactive concurrent executions in 'Activity(parallelgateway1)': [ConcurrentExecution[05b35775-3d52-11e5-bdfd-3417ebc8228a], ConcurrentExecution[05b0e671-3d52-11e5-bdfd-3417ebc8228a], ConcurrentExecution[05b35773-3d52-11e5-bdfd-3417ebc8228a]]
03:54:57,264 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.ExecutionEntity  - pruning execution ConcurrentExecution[05b35775-3d52-11e5-bdfd-3417ebc8228a]
03:54:57,264 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.TaskEntity.selectTasksByExecutionId  - ==> Parameters: 05b35775-3d52-11e5-bdfd-3417ebc8228a(String)
03:54:57,264 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.JobEntity.selectJobsByExecutionId  - ==> Parameters: 05b35775-3d52-11e5-bdfd-3417ebc8228a(String)
03:54:57,295 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.EventSubscriptionEntity.selectEventSubscriptionsByExecution  - ==> Parameters: 05b35775-3d52-11e5-bdfd-3417ebc8228a(String)
03:54:57,295 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.IdentityLinkEntity.selectIdentityLinksByProcessInstance  - ==> Parameters: 05b35775-3d52-11e5-bdfd-3417ebc8228a(String)
03:54:57,374 [pool-1-thread-2] DEBUG org.activiti.engine.impl.db.DbSqlSession  - skipping redundant delete: JobEntity [id=05b5c880-3d52-11e5-bdfd-3417ebc8228a]
03:54:58,208 [pool-1-thread-2] DEBUG org.activiti.engine.impl.db.DbSqlSession  -   delete JobEntity [id=05b5c880-3d52-11e5-bdfd-3417ebc8228a]
03:54:58,208 [pool-1-thread-2] DEBUG org.activiti.engine.impl.db.DbSqlSession  -   delete ConcurrentExecution[05b35775-3d52-11e5-bdfd-3417ebc8228a]
03:54:58,544 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.HistoricActivityInstanceEntity.bulkInsertHistoricActivityInstance  - ==> Parameters: 52e9efa8-3d57-11e5-bdfd-3417ebc8228a(String), my-process:1:0403f9af-3d52-11e5-bdfd-3417ebc8228a(String), 046a8379-3d52-11e5-bdfd-3417ebc8228a(String), 05b35775-3d52-11e5-bdfd-3417ebc8228a(String), parallelgateway1(String), null, null, Parallel Gateway(String), parallelGateway(String), null, 2015-08-07 15:54:56.718(Timestamp), 2015-08-07 15:54:57.264(Timestamp), 546(Long), (String), 534e08b3-3d57-11e5-bdfd-3417ebc8228a(String), my-process:1:0403f9af-3d52-11e5-bdfd-3417ebc8228a(String), 046a8379-3d52-11e5-bdfd-3417ebc8228a(String), 534e08b1-3d57-11e5-bdfd-3417ebc8228a(String), timerintermediatecatchevent1(String), null, null, TimerCatchEvent(String), intermediateTimer(String), null, 2015-08-07 15:54:57.374(Timestamp), null, null, (String)
03:54:59,638 [pool-1-thread-2] DEBUG org.activiti.engine.impl.db.DbSqlSession  - executing: delete JobEntity [id=05b5c880-3d52-11e5-bdfd-3417ebc8228a]
03:54:59,638 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.JobEntity.deleteMessage  - ==> Parameters: 05b5c880-3d52-11e5-bdfd-3417ebc8228a(String), 4(Integer)
03:54:59,685 [pool-1-thread-2] DEBUG org.activiti.engine.impl.db.DbSqlSession  - executing: delete ConcurrentExecution[05b35775-3d52-11e5-bdfd-3417ebc8228a]
03:54:59,685 [pool-1-thread-2] DEBUG org.activiti.engine.impl.persistence.entity.ExecutionEntity.deleteExecution  - ==> Parameters: 05b35775-3d52-11e5-bdfd-3417ebc8228a(String), 1(Integer)
03:54:59,857 [pool-1-thread-2] DEBUG org.activiti.engine.impl.cmd.UnlockExclusiveJobCmd  - Unlocking exclusive job 05b5c880-3d52-11e5-bdfd-3417ebc8228a
</code>
(6 minutes later…)
<code>
04:00:03,037 [pool-1-thread-6] DEBUG org.activiti.engine.impl.cmd.LockExclusiveJobCmd  - Executing lock exclusive job 05b5c880-3d52-11e5-bdfd-3417ebc8228a 05b35775-3d52-11e5-bdfd-3417ebc8228a
04:00:03,053 [pool-1-thread-6] DEBUG org.activiti.engine.impl.persistence.entity.ExecutionEntity.selectExecution  - ==> Parameters: 05b35775-3d52-11e5-bdfd-3417ebc8228a(String)
04:00:03,100 [pool-1-thread-6] DEBUG org.activiti.engine.impl.cmd.ExecuteAsyncJobCmd  - Executing async job 05b5c880-3d52-11e5-bdfd-3417ebc8228a
04:00:03,100 [pool-1-thread-6] DEBUG org.activiti.engine.impl.persistence.entity.ExecutionEntity.selectExecution  - ==> Parameters: 05b35775-3d52-11e5-bdfd-3417ebc8228a(String)
</code>

I guess I have some questions:

  1. Are the JobEntity [id=05b5c880-3d52-11e5-bdfd-3417ebc8228a] and/or ConcurrentExecution[05b35775-3d52-11e5-bdfd-3417ebc8228a] actually being deleted?
  2.  
  3. Can an Async Executor thread theoretically attempt to work on an already deleted job and/or execution? Especially in the case where activiti:async=true? The Activiti User Guide mentioned that an activiti:async=true task will give "the local job executor a little hint that there is a new job". Could it be related to this – an old job cached maybe?
As always, I appreciate any and all help. Thank you.