cancel
Showing results for 
Search instead for 
Did you mean: 

Deadlock causing transaction corruption and breaking async executor

grahamjohnson80
Champ on-the-rise
Champ on-the-rise
We have just started using Activiti (replacing JBPM 4.4) and we saw a deadlock occur in our test environment that appears to have been caused by on thread trying to lock an ACT_RU_EXECUTION row while another was updating it (unclear if it was locked already).  More troubling, though, this seemed to have left a transaction in a bad state, causing subsequent runs of the async executor to fail with "Transaction is not active" exceptions, and this did not go away until we restarted JBoss.  The same thing happened two more times later.  See below for all the information I can provide; the one thing that is missing is an stack trace or other indication of what code was performing the ACT_RU_EXECUTION update in the second transaction involved in the deadlock.

What might cause this deadlock?  Is it an Activiti bug?  A configuration problem (Activiti, data source, transactions, iBatis, JBoss, MySQL)?  An application bug (i.e. we are misusing Activiti somehow)?  Is there any way to further diagnore or prevent the deadlock?  Is there any way to further diagnore or prevent the transaction being messed up by the deadlock?


Here is the first exception about the deadlock:


2015-07-09 23:41:09,241 ERROR [pl.interceptor.JtaTransactionInterceptor] (    pool-14-thread-1) {     } Error when rolling back transaction, orginal exception was:
org.apache.ibatis.exceptions.PersistenceException:
### Error updating database.  Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
### The error may involve org.activiti.engine.impl.persistence.entity.ExecutionEntity.clearProcessInstanceLockTime-Inline
### The error occurred while setting parameters
### SQL: update ACT_RU_EXECUTION      set       LOCK_TIME_ = null     where ID_ = ?
### Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
        at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:23)
        at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:150)
        at org.activiti.engine.impl.db.DbSqlSession.update(DbSqlSession.java:180)
        at org.activiti.engine.impl.persistence.entity.ExecutionEntityManager.clearProcessInstanceLockTime(ExecutionEntityManager.java:212)
        at org.activiti.engine.impl.cmd.UnlockExclusiveJobCmd.execute(UnlockExclusiveJobCmd.java:55)
        at org.activiti.engine.impl.interceptor.CommandInvoker.execute(CommandInvoker.java:24)
        at org.activiti.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:57)
        at org.activiti.engine.impl.interceptor.JtaTransactionInterceptor.execute(JtaTransactionInterceptor.java:65)
        at org.activiti.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:31)
        at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:40)
        at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:35)
        at org.activiti.engine.impl.asyncexecutor.ExecuteAsyncRunnable.run(ExecuteAsyncRunnable.java:97)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


Here is the SHOW ENGINE INNODB STATUS output:


————————
LATEST DETECTED DEADLOCK
————————
150709 20:28:08
*** (1) TRANSACTION:
TRANSACTION CCB4FB3, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 15308, OS thread handle 0x7fb7c709e700, query id 17739169 127.0.0.1 root Updating
update ACT_RU_EXECUTION
    set
      LOCK_TIME_ = null
    where ID_ = '73837'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 116736 n bits 152 index `PRIMARY` of table `SCActiviti`.`ACT_RU_EXECUTION` trx id CCB4FB3 lock_mode X locks rec but not gap waiting
Record lock, heap no 83 PHYSICAL RECORD: n_fields 19; compact format; info bits 0
[record info omitted]

*** (2) TRANSACTION:
TRANSACTION CCB4FB1, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
12 lock struct(s), heap size 1248, 6 row lock(s), undo log entries 14
MySQL thread id 15317, OS thread handle 0x7fb7c7ac6700, query id 17739179 127.0.0.1 root Updating
update ACT_RU_EXECUTION set
      REV_ = 18,
      BUSINESS_KEY_ = '[value omitted]',
      PROC_DEF_ID_ = '[value omitted]:2:67630',
      ACT_ID_ = null,
      IS_ACTIVE_ = 0,
      IS_CONCURRENT_ = 0,
      IS_SCOPE_ = 1,
      IS_EVENT_SCOPE_ = 0,
      PARENT_ID_ = null,
      SUPER_EXEC_ = null,
      SUSPENSION_STATE_ = 1,
      CACHED_ENT_STATE_ = 4,
      NAME_ = null,
      LOCK_TIME_ = null
    where ID_ = '73837'
      and REV_ = 17
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 116736 n bits 152 index `PRIMARY` of table `SCActiviti`.`ACT_RU_EXECUTION` trx id CCB4FB1 lock mode S locks rec but not gap
Record lock, heap no 83 PHYSICAL RECORD: n_fields 19; compact format; info bits 0
[record info omitted]

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 116736 n bits 152 index `PRIMARY` of table `SCActiviti`.`ACT_RU_EXECUTION` trx id CCB4FB1 lock_mode X locks rec but not gap waiting
[record info omitted]

*** WE ROLL BACK TRANSACTION (1)


Then we got this exception, which is presumably related to the transaction being put in a bad state:


2015-07-09 23:41:10,318 ERROR [.impl.asyncexecutor.ExecuteAsyncRunnable] (    pool-14-thread-1) {     } Error while unlocking exclusive job 75713
java.lang.IllegalMonitorStateException
        at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
        at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
        at org.jboss.resource.adapter.jdbc.BaseWrapperManagedConnection.unlock(BaseWrapperManagedConnection.java:277)
        at org.jboss.resource.adapter.jdbc.xa.XAManagedConnection.end(XAManagedConnection.java:268)
        at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelAbort(XAResourceRecord.java:396)
        at com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:3270)
        at com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:3248)
        at com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1933)
        at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:114)
        at com.arjuna.ats.arjuna.AtomicAction.abort(AtomicAction.java:201)
        at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.rollbackAndDisassociate(TransactionImple.java:1497)
        at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.rollback(BaseTransaction.java:165)
        at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.rollback(BaseTransactionManagerDelegate.java:114)
        at org.activiti.engine.impl.interceptor.JtaTransactionInterceptor.doRollback(JtaTransactionInterceptor.java:147)
        at org.activiti.engine.impl.interceptor.JtaTransactionInterceptor.execute(JtaTransactionInterceptor.java:67)
        at org.activiti.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:31)
        at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:40)
        at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:35)
        at org.activiti.engine.impl.asyncexecutor.ExecuteAsyncRunnable.run(ExecuteAsyncRunnable.java:97)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


Subsequent async executor runs then (all?) got the following exception:


2015-07-09 23:41:15,949 ERROR [.impl.asyncexecutor.ExecuteAsyncRunnable] (    pool-14-thread-1) {     } Error while locking exclusive job 75738
org.apache.ibatis.exceptions.PersistenceException:
### Error querying database.  Cause: org.jboss.util.NestedSQLException: Transaction is not active: tx=TransactionImple < ac, BasicAction: 323983b3:8222:559eee8e:9444 status: ActionStatus.ABORTING >; - nested throwable: (javax.resource.Reso
urceException: Transaction is not active: tx=TransactionImple < ac, BasicAction: 323983b3:8222:559eee8e:9444 status: ActionStatus.ABORTING >)
### The error may exist in org/activiti/db/mapping/entity/Execution.xml
### The error may involve org.activiti.engine.impl.persistence.entity.ExecutionEntity.selectExecution
### The error occurred while executing a query
### Cause: org.jboss.util.NestedSQLException: Transaction is not active: tx=TransactionImple < ac, BasicAction: 323983b3:8222:559eee8e:9444 status: ActionStatus.ABORTING >; - nested throwable: (javax.resource.ResourceException: Transaction is not active: tx=TransactionImple < ac, BasicAction: 323983b3:8222:559eee8e:9444 status: ActionStatus.ABORTING >)
        at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:23)
        at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:107)
        at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:98)
        at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:62)
        at org.activiti.engine.impl.db.DbSqlSession.selectById(DbSqlSession.java:457)
        at org.activiti.engine.impl.persistence.entity.ExecutionEntityManager.findExecutionById(ExecutionEntityManager.java:108)
        at org.activiti.engine.impl.cmd.LockExclusiveJobCmd.execute(LockExclusiveJobCmd.java:53)
        at org.activiti.engine.impl.interceptor.CommandInvoker.execute(CommandInvoker.java:24)
        at org.activiti.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:57)
        at org.activiti.engine.impl.interceptor.JtaTransactionInterceptor.execute(JtaTransactionInterceptor.java:65)
        at org.activiti.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:31)
        at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:40)
        at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:35)
        at org.activiti.engine.impl.asyncexecutor.ExecuteAsyncRunnable.run(ExecuteAsyncRunnable.java:52)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

       
Here is how the data source is configured in JBoss 5.1:


<?xml version="1.0" encoding="UTF-8"?>
<datasources>
   <xa-datasource>
      <jndi-name>SCActivitiDS</jndi-name>
      <xa-datasource-class>com.mysql.jdbc.jdbc2.optional.MysqlXADataSource</xa-datasource-class>
      <xa-datasource-property name="URL">[value omitted]</xa-datasource-property>
      <user-name>[value omitted]</user-name>
      <password>[value omitted]</password>
      <max-pool-size>100</max-pool-size>
      <track-connection-by-tx>true</track-connection-by-tx>
      <metadata>
         <type-mapping>mySQL</type-mapping>
      </metadata>
   </xa-datasource>
</datasources>


Here is our activiti.cfg.xml:


<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">

   <bean id="processEngineConfiguration" class="org.activiti.engine.impl.cfg.JtaProcessEngineConfiguration">
      <property name="dataSource" ref="dataSource" />
      <property name="databaseSchemaUpdate" value="false" />
      <property name="transactionManager" ref="transactionManager" />
      <property name="asyncExecutorEnabled" value="true" />
      <property name="asyncExecutorActivate" value="true" />
   </bean>

   <bean id="dataSource" class="org.springframework.jndi.JndiObjectFactoryBean">
      <property name="jndiName" value="java:SCActivitiDS" />
   </bean>

   <bean id="transactionManager" class="org.springframework.jndi.JndiObjectFactoryBean">
      <property name="jndiName" value="java:/TransactionManager" />
   </bean>

</beans>
12 REPLIES 12

jbarrez
Star Contributor
Star Contributor
Thanks for the detailed description of your setup. Unfortunately I don't see anything wrong there.

So the logic you see is not a bug. The update of the lock time is for the execution of jobs (async activity / timer).
The other update that is happening is a regular update. One of the two should lose here … not going into a deadlock. No idea why it would go into a deadlock … would be interesting to find out why the database decides to do this in this case.

Do you have a process definition where this failed?

grahamjohnson80
Champ on-the-rise
Champ on-the-rise
Hi Joram, thanks for your response.  Although I haven't yet been able to reproduce this deadlock with additional logging to fully determine what is going on, I did two possible causes (possibly interrelated with each other):

1. Exclusive jobs notify (via AsyncJobAddedNotification) about new jobs they have created as soon as the job execution is committed.  The new job begins executing (or, if it is exclusive, locking via LockExclusiveJobCmd) immediately upon notification, but then an UnlockExclusiveJobCmd is executed for the first job that tries to clear the LOCK_TIME_ of the ACT_RU_EXECUTION (one of the two SQL statements involved in my deadlocks).  So two threads are trying to access/update the same ACT_RU_EXECUTION row concurrently (the UnlockExecutionJobCmd from the first job, and the ExecuteAsyncJobCmd from the second job).  The second SQL statement involved in the deadlock comes out of the second job execution.  I'm not sure if it matter, but that second job happens to be non-exclusive – see below – so it does not try to lock the execution before executing.

2. Some of my jobs (including the one for the second job involved in the deadlock) are not exclusive.  In the logs, I seem them inserted with EXCLUSIVE_ = false/0, and they do not try to lock or unlock the execution around being executed.  However, my understanding is that all my jobs should be exclusive.  The user guide says: "Since Activiti 5.9 [we are on 5.17], the JobExecutor makes sure that jobs from a single process instance are never executed concurrently." and "Since Activiti 5.9, exclusive jobs are the default configuration. All asynchronous continuations and timer events are thus exclusive by default. In addition, if you want a job to be non-exclusive, you can configure it as such using activiti:exclusive="false"." I do not have any activity:exclusive="false" in any of my process definitions.

Questions:

- Shouldn't the async executor wait until the UnlockExclusiveJobCmd has been committed before sending the AsyncJobAddedNotification that allows another thread to start the new job and access/update the same execution row that the current thread still needs to update (to clear the lock time)?  Or is the current behavior by design?

- Why are some jobs non-exclusive when I expect all jobs to be exclusive?  How can I correct this?

- If the second job were exclusive, would it then run serially on the same thread as the job that created it instead of on the other thread?  Or would it still be picked up by the other thread, but not deadlock due to it locking the execution before executing?

grahamjohnson80
Champ on-the-rise
Champ on-the-rise
Here is annotated log output (irrelevant lines/data removed) showing the deadlock occurring as described in my previous post.  Sorry but I can't figure out how to format this as fixed-width without syntax highlighting:



# Job 110543 starts running:

2015-07-20 16:25:40,197 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________)
2015-07-20 16:25:40,197 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________) — starting LockExclusiveJobCmd ——————————————————–
2015-07-20 16:25:40,197 DEBUG [pl.interceptor.JtaTransactionInterceptor] (pool-14-thread-2________________) Running command with propagation REQUIRED
2015-07-20 16:25:40,197 DEBUG [viti.engine.impl.cmd.LockExclusiveJobCmd] (pool-14-thread-2________________) Executing lock exclusive job 110543 110490
2015-07-20 16:25:40,199 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2________________) flush summary: 0 insert, 0 update, 0 delete.
2015-07-20 16:25:40,199 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2________________) now executing flush…
2015-07-20 16:25:40,199 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________) — LockExclusiveJobCmd finished ——————————————————–
2015-07-20 16:25:40,199 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________)
2015-07-20 16:25:40,199 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________)
2015-07-20 16:25:40,199 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________) — starting ExecuteAsyncJobCmd ——————————————————–
2015-07-20 16:25:40,199 DEBUG [pl.interceptor.JtaTransactionInterceptor] (pool-14-thread-2________________) Running command with propagation REQUIRED
2015-07-20 16:25:40,199 DEBUG [iviti.engine.impl.cmd.ExecuteAsyncJobCmd] (pool-14-thread-2________________) Executing async job 110543
2015-07-20 16:25:40,200 DEBUG [m.runtime.AtomicOperationActivityExecute] (pool-14-thread-2________________) ProcessInstance[110490] executes Activity[…]
2015-07-20 16:25:40,213 DEBUG [micOperationTransitionNotifyListenerTake] (pool-14-thread-2________________) ProcessInstance[110490] takes transition […]
2015-07-20 16:25:40,214 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2________________) flush summary: 2 insert, 1 update, 1 delete.
2015-07-20 16:25:40,214 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2________________)   insert HistoricActivityInstanceEntity[…]
2015-07-20 16:25:40,215 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2________________)   insert JobEntity [id=110545]
2015-07-20 16:25:40,215 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2________________)   update ProcessInstance[110490]
2015-07-20 16:25:40,215 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2________________)   delete JobEntity [id=110543]
2015-07-20 16:25:40,215 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2________________) now executing flush…
2015-07-20 16:25:40,215 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2________________) inserting: HistoricActivityInstanceEntity[…]

# Job 110543 creates new job 110545.  Unlike 110543, 110545 is created as non-exclusive (EXCLUSIVE_ = false):

2015-07-20 16:25:40,216 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2________________) inserting: JobEntity [id=110545]
2015-07-20 16:25:40,216 DEBUG [rsistence.entity.JobEntity.insertMessage] (pool-14-thread-2________________) ==>  Preparing: insert into ACT_RU_JOB ( ID_, REV_, TYPE_, LOCK_OWNER_, LOCK_EXP_TIME_, EXCLUSIVE_, EXECUTION_ID_, PROCESS_INSTANCE_ID_, PROC_DEF_ID_, DUEDATE_, RETRIES_, EXCEPTION_STACK_ID_, EXCEPTION_MSG_, HANDLER_TYPE_, HANDLER_CFG_, TENANT_ID_) values (?, 1, 'message', ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )
2015-07-20 16:25:40,216 DEBUG [rsistence.entity.JobEntity.insertMessage] (pool-14-thread-2________________) ==> Parameters: 110545(String), null, null, false(Boolean), 110490(String), 110490(String), […](String), 2015-07-20 16:30:40.214(Timestamp), 3(Integer), null, null, async-continuation(String), null, […](String)

# Job 110543, on commit, notifies about new job 110545:

2015-07-20 16:25:40,221 DEBUG [pl.jobexecutor.AsyncJobAddedNotification] (pool-14-thread-2________________) notifying job executor of new job

# Job 110545 immediately starts running on the other thread.  Since it is non-exclusive, it doesn't try to lock the execution before running:

2015-07-20 16:25:40,221 DEBUG [i.engine.impl.interceptor.LogInterceptor] (________________pool-14-thread-1)
2015-07-20 16:25:40,221 DEBUG [i.engine.impl.interceptor.LogInterceptor] (________________pool-14-thread-1) — starting ExecuteAsyncJobCmd ——————————————————–
2015-07-20 16:25:40,221 DEBUG [pl.interceptor.JtaTransactionInterceptor] (________________pool-14-thread-1) Running command with propagation REQUIRED
2015-07-20 16:25:40,221 DEBUG [iviti.engine.impl.cmd.ExecuteAsyncJobCmd] (________________pool-14-thread-1) Executing async job 110545
2015-07-20 16:25:40,223 DEBUG [ime.AtomicOperationTransitionCreateScope] (________________pool-14-thread-1) create scope: parent ProcessInstance[110490] continues as execution ScopeExecution[110546]
2015-07-20 16:25:40,223 DEBUG [m.runtime.AtomicOperationActivityExecute] (________________pool-14-thread-1) ScopeExecution[110546] executes Activity[…]
2015-07-20 16:25:40,227 DEBUG [m.runtime.AtomicOperationActivityExecute] (________________pool-14-thread-1) ProcessInstance[110548] executes Activity[…]
2015-07-20 16:25:40,228 DEBUG [.impl.bpmn.behavior.BpmnActivityBehavior] (________________pool-14-thread-1) Leaving activity '[…]'
2015-07-20 16:25:40,228 DEBUG [micOperationTransitionNotifyListenerTake] (________________pool-14-thread-1) ProcessInstance[110548] takes transition […]
2015-07-20 16:25:40,229 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) flush summary: 18 insert, 1 update, 1 delete.
2015-07-20 16:25:40,229 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert ScopeExecution[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert HistoricActivityInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert ProcessInstance[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert HistoricProcessInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert HistoricActivityInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert VariableInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert HistoricVariableInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert VariableInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert HistoricVariableInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert VariableInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert HistoricVariableInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert VariableInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert HistoricVariableInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert VariableInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert HistoricVariableInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert VariableInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert HistoricVariableInstanceEntity[…]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   insert JobEntity [id=110556]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   update ProcessInstance[110490]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1)   delete JobEntity [id=110545]
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) now executing flush…
2015-07-20 16:25:40,230 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: ScopeExecution[110546]
2015-07-20 16:25:40,231 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: HistoricActivityInstanceEntity[…]
2015-07-20 16:25:40,231 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: ProcessInstance[110548]
2015-07-20 16:25:40,232 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________) — ExecuteAsyncJobCmd finished ——————————————————–
2015-07-20 16:25:40,232 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________)

# Job 110543 tries to unlock itself, in the middle of 110545's execution (note the interleaved output between the two threads):

2015-07-20 16:25:40,232 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________)
2015-07-20 16:25:40,232 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________) — starting UnlockExclusiveJobCmd ——————————————————–
2015-07-20 16:25:40,232 DEBUG [pl.interceptor.JtaTransactionInterceptor] (pool-14-thread-2________________) Running command with propagation REQUIRED
2015-07-20 16:25:40,232 DEBUG [ti.engine.impl.cmd.UnlockExclusiveJobCmd] (pool-14-thread-2________________) Unlocking exclusive job 110543
2015-07-20 16:25:40,232 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: HistoricProcessInstanceEntity[superProcessInstanceId=110490]
2015-07-20 16:25:40,233 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2________________) ==>  Preparing: select * from ACT_RU_EXECUTION where ID_ = ?
2015-07-20 16:25:40,233 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2________________) ==> Parameters: 110490(String)

# The update that ends up deadlocking and gets rolled back by MySQL:

2015-07-20 16:25:40,234 DEBUG [utionEntity.clearProcessInstanceLockTime] (pool-14-thread-2________________) ==>  Preparing: update ACT_RU_EXECUTION set LOCK_TIME_ = null where ID_ = ?
2015-07-20 16:25:40,234 DEBUG [utionEntity.clearProcessInstanceLockTime] (pool-14-thread-2________________) ==> Parameters: 110490(String)
2015-07-20 16:25:40,234 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: HistoricActivityInstanceEntity[…]
2015-07-20 16:25:40,235 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: VariableInstanceEntity[…]
2015-07-20 16:25:40,235 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: HistoricVariableInstanceEntity[…]
2015-07-20 16:25:40,236 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: VariableInstanceEntity[…]
2015-07-20 16:25:40,237 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: HistoricVariableInstanceEntity[…]
2015-07-20 16:25:40,237 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: VariableInstanceEntity[…]
2015-07-20 16:25:40,238 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: HistoricVariableInstanceEntity[…]
2015-07-20 16:25:40,238 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: VariableInstanceEntity[…]
2015-07-20 16:25:40,239 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: HistoricVariableInstanceEntity[…]
2015-07-20 16:25:40,240 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: VariableInstanceEntity[…]
2015-07-20 16:25:40,240 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: HistoricVariableInstanceEntity[…]
2015-07-20 16:25:40,241 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: VariableInstanceEntity[…]
2015-07-20 16:25:40,241 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: HistoricVariableInstanceEntity[…]
2015-07-20 16:25:40,242 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) inserting: JobEntity […]

The other update involved in the deadlock:

2015-07-20 16:25:40,242 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) updating: ProcessInstance[110490]
2015-07-20 16:25:40,243 DEBUG [e.entity.ExecutionEntity.updateExecution] (________________pool-14-thread-1) ==>  Preparing: update ACT_RU_EXECUTION set REV_ = ?, BUSINESS_KEY_ = ?, PROC_DEF_ID_ = ?, ACT_ID_ = ?, IS_ACTIVE_ = ?, IS_CONCURRENT_ = ?, IS_SCOPE_ = ?, IS_EVENT_SCOPE_ = ?, PARENT_ID_ = ?, SUPER_EXEC_ = ?, SUSPENSION_STATE_ = ?, CACHED_ENT_STATE_ = ?, NAME_ = ?, LOCK_TIME_ = ? where ID_ = ? and REV_ = ?

The deadlock occurring:

2015-07-20 16:25:40,247 DEBUG [i.engine.impl.interceptor.CommandContext] (pool-14-thread-2________________) Error while closing command context
org.apache.ibatis.exceptions.PersistenceException:
### Error updating database.  Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
### The error may involve org.activiti.engine.impl.persistence.entity.ExecutionEntity.clearProcessInstanceLockTime-Inline
### The error occurred while setting parameters
### SQL: update ACT_RU_EXECUTION      set       LOCK_TIME_ = null     where ID_ = ?
### Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:23)
at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:150)
at org.activiti.engine.impl.db.DbSqlSession.update(DbSqlSession.java:180)
at org.activiti.engine.impl.persistence.entity.ExecutionEntityManager.clearProcessInstanceLockTime(ExecutionEntityManager.java:212)
at org.activiti.engine.impl.cmd.UnlockExclusiveJobCmd.execute(UnlockExclusiveJobCmd.java:55)
at org.activiti.engine.impl.interceptor.CommandInvoker.execute(CommandInvoker.java:24)
at org.activiti.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:57)
at org.activiti.engine.impl.interceptor.JtaTransactionInterceptor.execute(JtaTransactionInterceptor.java:65)
at org.activiti.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:37)
at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:40)
at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:35)
at org.activiti.engine.impl.asyncexecutor.ExecuteAsyncRunnable.run(ExecuteAsyncRunnable.java:97)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1066)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4237)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4169)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2617)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2834)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2156)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
at com.mysql.jdbc.jdbc2.optional.PreparedStatementWrapper.execute(PreparedStatementWrapper.java:822)
at org.jboss.resource.adapter.jdbc.WrappedPreparedStatement.execute(WrappedPreparedStatement.java:299)
at sun.reflect.GeneratedMethodAccessor931.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(PreparedStatementLogger.java:55)
at com.sun.proxy.$Proxy778.execute(Unknown Source)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.update(PreparedStatementHandler.java:41)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.update(RoutingStatementHandler.java:66)
at org.apache.ibatis.executor.SimpleExecutor.doUpdate(SimpleExecutor.java:45)
at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:100)
at org.apache.ibatis.executor.CachingExecutor.update(CachingExecutor.java:75)
at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:148)
… 13 more
2015-07-20 16:25:40,250 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________) — UnlockExclusiveJobCmd finished ——————————————————–
2015-07-20 16:25:40,250 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________)
2015-07-20 16:25:40,250 ERROR [.impl.asyncexecutor.ExecuteAsyncRunnable] (pool-14-thread-2________________) Error while unlocking exclusive job 110543
org.apache.ibatis.exceptions.PersistenceException:
[identical stack trace logged again]
2015-07-20 16:25:40,494 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (________________pool-14-thread-1) executing: delete JobEntity [id=110545]
2015-07-20 16:25:40,496 DEBUG [pl.jobexecutor.AsyncJobAddedNotification] (________________pool-14-thread-1) notifying job executor of new job
2015-07-20 16:25:40,496 DEBUG [i.engine.impl.interceptor.LogInterceptor] (________________pool-14-thread-1) — ExecuteAsyncJobCmd finished ——————————————————–
2015-07-20 16:25:40,496 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2________________)

jbarrez
Star Contributor
Star Contributor
> Shouldn't the async executor wait until the UnlockExclusiveJobCmd has been committed before sending the AsyncJobAddedNotification > that allows another thread to start the new job and access/update the same execution row that the current thread still needs to update
>(to clear the lock time)? Or is the current behavior by design?

The UnlockExclusiveJobCmd is executed in its own transaction (passed to the commandExecutor), and the AsyncJobAddedNotification is similarly hooked into the COMITTED listener of the insert transaction. So unless I'm missing something, that seems fine. Or how do you see it?

See https://github.com/Activiti/Activiti/blob/master/modules/activiti-engine/src/main/java/org/activiti/...

> Why are some jobs non-exclusive when I expect all jobs to be exclusive? How can I correct this?

All jobs are by default exclusive. I can't explain why they would not be for you. Which kind of jobs are exclusive?

> If the second job were exclusive, would it then run serially on the same thread as the job that created it instead of on the other thread?
> Or would it still be picked up by the other thread, but not deadlock due to it locking the execution before executing?

Jobs are always executed by different threads, no reuse there. The exclusive flag comes into play before the job is exectuted: it tries to lock the process instance, and only one can succeed in this.


Furthermore: what happens with the failing job when the deadlock occurs? It should be retried later on automatically. What happens in that case.

We did add some more logic around the locking in the current 5.18 branch, so fingers crossed there (but dont get your hopes up yet), cause from the logs I can't quite deduce why it would deadlock in that case.

grahamjohnson80
Champ on-the-rise
Champ on-the-rise
By "Shouldn't the async executor wait until the UnlockExclusiveJobCmd has been committed before sending the AsyncJobAddedNotification",  I was not suggesting that these were currently done in the same transaction (I know they aren't), but rather that I think the behavior should be changed so that the first job completely finishes, <i>including unlocking</i> (if required), <b>before</b> the notification is sent about the new job, so that the new job will not run concurrently with the unlocking of the first job and cause a deadlock.

Jobs are always executed by different threads, no reuse there. The exclusive flag comes into play before the job is exectuted: it tries to lock the process instance, and only one can succeed in this.

The user guide says "Instead, it will make sure that whenever it acquires an exclusive job from a certain process instance, it acquires all other exclusive jobs from the same process instance and delegates them to the same worker thread." which is why I thought if both jobs are exclusive they will run on the same thread (and thus sequentially instead of concurrently).

Furthermore: what happens with the failing job when the deadlock occurs? It should be retried later on automatically. What happens in that case.

I believe that since it is just the final unlocking step of the job, and not the job execution itself,  that deadlocks and is rolled back, the job itself is not marked as having failed and nothing (including the unlocking) is retried.  However, based on your question, I took another look at our logs and found that the LOCK_TIME_ is actually set to NULL while the first job is executing, before the UnlockExclusiveJobCmd is run.  Here is more complete log output of just the one thread (from the same run as above) showing this:


2015-07-20 16:25:40,197 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2)
2015-07-20 16:25:40,197 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2) — starting LockExclusiveJobCmd ——————————————————–
2015-07-20 16:25:40,197 DEBUG [pl.interceptor.JtaTransactionInterceptor] (pool-14-thread-2) Running command with propagation REQUIRED
2015-07-20 16:25:40,197 DEBUG [viti.engine.impl.cmd.LockExclusiveJobCmd] (pool-14-thread-2) Executing lock exclusive job 110543 110490
2015-07-20 16:25:40,197 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) ooo Using Connection [org.jboss.resource.adapter.jdbc.jdk6.WrappedConnectionJDK6@59ca5b26]
2015-07-20 16:25:40,197 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) ==>  Preparing: select * from ACT_RU_EXECUTION where ID_ = ?
2015-07-20 16:25:40,197 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) ==> Parameters: 110490(String)
2015-07-20 16:25:40,198 TRACE [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) <==    Columns: ID_, REV_, PROC_INST_ID_, BUSINESS_KEY_, PARENT_ID_, PROC_DEF_ID_, SUPER_EXEC_, ACT_ID_, IS_ACTIVE_, IS_CONCURRENT_, IS_SCOPE_, IS_EVENT_SCOPE_, SUSPENSION_STATE_, CACHED_ENT_STATE_, TENANT_ID_, NAME_, LOCK_TIME_
2015-07-20 16:25:40,198 TRACE [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) <==        Row: 110490, 5, 110490, [omitted], null, [omitted]:4:97630, null, [omitted], 1, 0, 1, 0, 1, 4, [omitted], null, null
2015-07-20 16:25:40,198 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) <==      Total: 1
2015-07-20 16:25:40,198 DEBUG [tionEntity.updateProcessInstanceLockTime] (pool-14-thread-2) ooo Using Connection [org.jboss.resource.adapter.jdbc.jdk6.WrappedConnectionJDK6@59ca5b26]

# Execution row gets locked:

2015-07-20 16:25:40,198 DEBUG [tionEntity.updateProcessInstanceLockTime] (pool-14-thread-2) ==>  Preparing: update ACT_RU_EXECUTION set LOCK_TIME_ = ? where ID_ = ? and (LOCK_TIME_ is null OR LOCK_TIME_ <= ?)
2015-07-20 16:25:40,198 DEBUG [tionEntity.updateProcessInstanceLockTime] (pool-14-thread-2) ==> Parameters: 2015-07-20 16:30:40.198(Timestamp), 110490(String), 2015-07-20 16:25:40.198(Timestamp)
2015-07-20 16:25:40,199 DEBUG [tionEntity.updateProcessInstanceLockTime] (pool-14-thread-2) <==    Updates: 1
2015-07-20 16:25:40,199 TRACE [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2) loaded object 'ProcessInstance[110490]' was not updated
2015-07-20 16:25:40,199 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2) flush summary: 0 insert, 0 update, 0 delete.
2015-07-20 16:25:40,199 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2) now executing flush…
2015-07-20 16:25:40,199 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2) — LockExclusiveJobCmd finished ——————————————————–
2015-07-20 16:25:40,199 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2)
2015-07-20 16:25:40,199 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2)
2015-07-20 16:25:40,199 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2) — starting ExecuteAsyncJobCmd ——————————————————–
2015-07-20 16:25:40,199 DEBUG [pl.interceptor.JtaTransactionInterceptor] (pool-14-thread-2) Running command with propagation REQUIRED
2015-07-20 16:25:40,199 DEBUG [iviti.engine.impl.cmd.ExecuteAsyncJobCmd] (pool-14-thread-2) Executing async job 110543
2015-07-20 16:25:40,200 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) ooo Using Connection [org.jboss.resource.adapter.jdbc.jdk6.WrappedConnectionJDK6@66c7f0a1]
2015-07-20 16:25:40,200 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) ==>  Preparing: select * from ACT_RU_EXECUTION where ID_ = ?
2015-07-20 16:25:40,200 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) ==> Parameters: 110490(String)

# The execution row is indeed locked:

2015-07-20 16:25:40,200 TRACE [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) <==    Columns: ID_, REV_, PROC_INST_ID_, BUSINESS_KEY_, PARENT_ID_, PROC_DEF_ID_, SUPER_EXEC_, ACT_ID_, IS_ACTIVE_, IS_CONCURRENT_, IS_SCOPE_, IS_EVENT_SCOPE_, SUSPENSION_STATE_, CACHED_ENT_STATE_, TENANT_ID_, NAME_, LOCK_TIME_
2015-07-20 16:25:40,200 TRACE [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) <==        Row: 110490, 5, 110490, [omitted], null, [omitted]:4:97630, null, [omitted], 1, 0, 1, 0, 1, 4, [omitted], null, 2015-07-20 16:30:40.0

# A bunch of job execution inserts/updates omitted here…

2015-07-20 16:25:40,216 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2) inserting: JobEntity [id=110545]
2015-07-20 16:25:40,216 DEBUG [rsistence.entity.JobEntity.insertMessage] (pool-14-thread-2) ooo Using Connection [org.jboss.resource.adapter.jdbc.jdk6.WrappedConnectionJDK6@66c7f0a1]
2015-07-20 16:25:40,216 DEBUG [rsistence.entity.JobEntity.insertMessage] (pool-14-thread-2) ==>  Preparing: insert into ACT_RU_JOB ( ID_, REV_, TYPE_, LOCK_OWNER_, LOCK_EXP_TIME_, EXCLUSIVE_, EXECUTION_ID_, PROCESS_INSTANCE_ID_, PROC_DEF_ID_, DUEDATE_, RETRIES_, EXCEPTION_STACK_ID_, EXCEPTION_MSG_, HANDLER_TYPE_, HANDLER_CFG_, TENANT_ID_) values (?, 1, 'message', ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )
2015-07-20 16:25:40,216 DEBUG [rsistence.entity.JobEntity.insertMessage] (pool-14-thread-2) ==> Parameters: 110545(String), null, null, false(Boolean), 110490(String), 110490(String), [omitted]:4:97630(String), 2015-07-20 16:30:40.214(Timestamp), 3(Integer), null, null, async-continuation(String), null, [omitted](String)
2015-07-20 16:25:40,216 DEBUG [rsistence.entity.JobEntity.insertMessage] (pool-14-thread-2) <==    Updates: 1
2015-07-20 16:25:40,216 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2) updating: ProcessInstance[110490]
2015-07-20 16:25:40,216 DEBUG [e.entity.ExecutionEntity.updateExecution] (pool-14-thread-2) ooo Using Connection [org.jboss.resource.adapter.jdbc.jdk6.WrappedConnectionJDK6@66c7f0a1]

# Still during job execution, the execution row lock time gets set to NULL here (note the "null" 3rd-to-last parameter to the following update):

2015-07-20 16:25:40,216 DEBUG [e.entity.ExecutionEntity.updateExecution] (pool-14-thread-2) ==>  Preparing: update ACT_RU_EXECUTION set REV_ = ?, BUSINESS_KEY_ = ?, PROC_DEF_ID_ = ?, ACT_ID_ = ?, IS_ACTIVE_ = ?, IS_CONCURRENT_ = ?, IS_SCOPE_ = ?, IS_EVENT_SCOPE_ = ?, PARENT_ID_ = ?, SUPER_EXEC_ = ?, SUSPENSION_STATE_ = ?, CACHED_ENT_STATE_ = ?, NAME_ = ?, LOCK_TIME_ = ? where ID_ = ? and REV_ = ?
2015-07-20 16:25:40,216 DEBUG [e.entity.ExecutionEntity.updateExecution] (pool-14-thread-2) ==> Parameters: 6(Integer), [omitted](String), [omitted]:4:97630(String), [omitted](String), true(Boolean), false(Boolean), true(Boolean), false(Boolean), null, null, 1(Integer), 4(Integer), null, null, 110490(String), 5(Integer)
2015-07-20 16:25:40,217 DEBUG [e.entity.ExecutionEntity.updateExecution] (pool-14-thread-2) <==    Updates: 1
2015-07-20 16:25:40,217 DEBUG [org.activiti.engine.impl.db.DbSqlSession] (pool-14-thread-2) executing: delete JobEntity [id=110543]
2015-07-20 16:25:40,217 DEBUG [rsistence.entity.JobEntity.deleteMessage] (pool-14-thread-2) ooo Using Connection [org.jboss.resource.adapter.jdbc.jdk6.WrappedConnectionJDK6@66c7f0a1]
2015-07-20 16:25:40,217 DEBUG [rsistence.entity.JobEntity.deleteMessage] (pool-14-thread-2) ==>  Preparing: delete from ACT_RU_JOB where ID_ = ? and REV_ = ?
2015-07-20 16:25:40,217 DEBUG [rsistence.entity.JobEntity.deleteMessage] (pool-14-thread-2) ==> Parameters: 110543(String), 1(Integer)
2015-07-20 16:25:40,217 DEBUG [rsistence.entity.JobEntity.deleteMessage] (pool-14-thread-2) <==    Updates: 1
2015-07-20 16:25:40,221 DEBUG [pl.jobexecutor.AsyncJobAddedNotification] (pool-14-thread-2) notifying job executor of new job
2015-07-20 16:25:40,232 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2) — ExecuteAsyncJobCmd finished ——————————————————–
2015-07-20 16:25:40,232 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2)
2015-07-20 16:25:40,232 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2)
2015-07-20 16:25:40,232 DEBUG [i.engine.impl.interceptor.LogInterceptor] (pool-14-thread-2) — starting UnlockExclusiveJobCmd ——————————————————–
2015-07-20 16:25:40,232 DEBUG [pl.interceptor.JtaTransactionInterceptor] (pool-14-thread-2) Running command with propagation REQUIRED
2015-07-20 16:25:40,232 DEBUG [ti.engine.impl.cmd.UnlockExclusiveJobCmd] (pool-14-thread-2) Unlocking exclusive job 110543
2015-07-20 16:25:40,233 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) ooo Using Connection [org.jboss.resource.adapter.jdbc.jdk6.WrappedConnectionJDK6@4f4f0d49]
2015-07-20 16:25:40,233 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) ==>  Preparing: select * from ACT_RU_EXECUTION where ID_ = ?
2015-07-20 16:25:40,233 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) ==> Parameters: 110490(String)

# The execution row is already unlocked:

2015-07-20 16:25:40,234 TRACE [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) <==    Columns: ID_, REV_, PROC_INST_ID_, BUSINESS_KEY_, PARENT_ID_, PROC_DEF_ID_, SUPER_EXEC_, ACT_ID_, IS_ACTIVE_, IS_CONCURRENT_, IS_SCOPE_, IS_EVENT_SCOPE_, SUSPENSION_STATE_, CACHED_ENT_STATE_, TENANT_ID_, NAME_, LOCK_TIME_
2015-07-20 16:25:40,234 TRACE [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) <==        Row: 110490, 6, 110490, [omitted], null, [omitted]:4:97630, null, [omitted], 1, 0, 1, 0, 1, 4, [omitted], null, null
2015-07-20 16:25:40,234 DEBUG [e.entity.ExecutionEntity.selectExecution] (pool-14-thread-2) <==      Total: 1
2015-07-20 16:25:40,234 DEBUG [utionEntity.clearProcessInstanceLockTime] (pool-14-thread-2) ooo Using Connection [org.jboss.resource.adapter.jdbc.jdk6.WrappedConnectionJDK6@4f4f0d49]

# But we try to unlock it anyway:

2015-07-20 16:25:40,234 DEBUG [utionEntity.clearProcessInstanceLockTime] (pool-14-thread-2) ==>  Preparing: update ACT_RU_EXECUTION set LOCK_TIME_ = null where ID_ = ?
2015-07-20 16:25:40,234 DEBUG [utionEntity.clearProcessInstanceLockTime] (pool-14-thread-2) ==> Parameters: 110490(String)

# And the deadlock happens (because of the concurrent ACT_RU_EXECUTION update from job 110545 on the other thread):

2015-07-20 16:25:40,247 DEBUG [i.engine.impl.interceptor.CommandContext] (pool-14-thread-2) Error while closing command context
org.apache.ibatis.exceptions.PersistenceException:
### Error updating database.  Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
### The error may involve org.activiti.engine.impl.persistence.entity.ExecutionEntity.clearProcessInstanceLockTime-Inline
### The error occurred while setting parameters
### SQL: update ACT_RU_EXECUTION      set       LOCK_TIME_ = null     where ID_ = ?
### Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
[…]



So even though the unlocking is rolled back, it apparently doesn't matter because the ACT_RU_EXECUTION has already been (inadvertantly?) unlocked.  Is that a bug?

There don't seem to be any functional problems caused by the deadlock, but 1. the stack traces clutter our logs, 2. our logging automatically emails any logged errors for us to look at and fix (so we have to go to extra steps to filter these out), and 3. the deadlock info overwrites any previous deadlock info in "SHOW ENGINE INNODB STATUS" output so if we were to have other unrelated deadlocks, we would lose important debugging info (unless we go to extra steps to try to capture deadlock info as soon as they happen).

All jobs are by default exclusive. I can't explain why they would not be for you. Which kind of jobs are exclusive?

I don't know how to answer this since I'm not sure what the different kinds of job are, but I will try running it in the debugger to try to figure out where the exclusive = false is coming from.  Please stay tuned…

grahamjohnson80
Champ on-the-rise
Champ on-the-rise
Okay, I figured out that the non-exclusive jobs I am getting come from the fact that https://github.com/Activiti/Activiti/blob/master/modules/activiti-engine/src/main/java/org/activiti/... does not have a call to activity.setExclusive(), and the default value for ActivityImpl.isExclusive is false.  Contrast that to https://github.com/Activiti/Activiti/blob/master/modules/activiti-engine/src/main/java/org/activiti/... which (like most other AbstractActivityBpmnParseHandler subclasses) does have a call to activity.setExclusive(!subProcess.isNotExclusive()) that propagates the Activiti.notExclusive value into the ActivityImpl instance.

Is it intentional that CallActivity jobs are non-exclusive?  Shouldn't the default value for ActivityImpl.isExclusive be true instead of false?  I don't see anything in the user guide (http://www.activiti.org/userguide/#bpmnSubprocessAndCallActivity) indicating there should be any difference between CallActivity and SubProcess regarding exclusivity.

To summarize, I am seeing three possible Activiti bugs contributing to the deadlocks:

1. AsyncJobAddedNotification is sent too soon for jobs created by exclusive jobs (i.e. before job/execution is unlocked in UnlockExclusiveJobCmd)
2. Execution seems to be unadvertantly unlocked too soon (i.e. before UnlockExclusiveJobCmd even runs)
3. CallActivity.notExclusive is not propagated to ActivitiImpl.isExclusive in CallActivityParseHandler

For #3, I am trying overriding CallActivityParseHandler with a version that includes a call to activity.setExclusive(!callActivity.isNotExclusive()).  I'm skeptical that that alone will prevent the deadlock (because of the other two issues), but I will report back on whether it does.

grahamjohnson80
Champ on-the-rise
Champ on-the-rise
As an update, we are still getting deadlocks using the modified CallActivityParseHandler as described above, which is not surprising because of the other unresolved issues.

jbarrez
Star Contributor
Star Contributor

Point 2:

> So even though the unlocking is rolled back, it apparently doesn't matter because the ACT_RU_EXECUTION has already been (inadvertantly?) > unlocked. Is that a bug?

Combined with " Still during job execution, the execution row lock time gets set to NULL here (note the "null" 3rd-to-last parameter to the following update):"

makes me think that this could be a bug that we solved recently. We saw indeed that in some very specific cases the null is written to the execution, while it shouldn't have. That was fixed in this commit: https://github.com/Activiti/Activiti/commit/806ac390a63a561759a5995ed24283391e6e8d4d#diff-685807b8b3...


Point 3:

You are correct. Fixed it here: https://github.com/Activiti/Activiti/commit/ddf24c2d1c7339c10bb6aa6eee1932e78278882b



Point 1:

This is the one I'm not sure about. So the notification is sent, and the job gets executed. The first thing that happens is a check whether it's exclusive or not. If so, the locking happens.

If there is a follow up exclusive job, the notification is sent again, but this probably fails on the locking, right.
Note that we added a delay in 5.18 when that happens: https://github.com/Activiti/Activiti/commit/806ac390a63a561759a5995ed24283391e6e8d4d#diff-ca23220dcc.... In previous versions it would keep hitting the db, which might have helped causing the deadlock.


Thanks for the excellent discussion material. It was a long read, but well worth it. I do think we have many fixes already in place for the 5.18 for the stuff you're seeing.


grahamjohnson80
Champ on-the-rise
Champ on-the-rise
Hi Joram,

Thanks a lot for the response and the bugfix.  I believe I have confirmed that the deadlock does not occur in 5.18.0-SNAPSHOT, and for the first time I am seeing ActivitiOptimisticLockingException happening in the logs, along with the locking retry that wasn't happening before.  So that's a huge improvement.

It would be very helpful to know when 5.18.0 might be released though.  We are ready to go to production with Activiti now, but we need a version including these two fixes.  So I have three options, none of which are ideal:

1. Run full 5.18.0-SNAPSHOT (not knowing what else is in 5.18.0 that may not be production ready yet or whether there will be any schema changes between now and 5.18.0 release that wouldn't get applied because we already updated schema.version to 5.18.0)
2. Cherry-pick these two commits on top of activiti-engine 5.17.0 (but I'm not sure what other "many fixes already in place" I would be missing that would make this less usable than 5.18.0-SNAPSHOT)
3. Wait (how long?) until 5.18.0 is released

Any suggestions or hints at an expected release date?

For now I think we're going ahead with option 2 since these two fixes resolve the only issue we have seen so far.

Thanks again…