Hyland Connect

hepcibha · 04-08-2015

Hi,

In my application, for some use cases, we have to terminate currently running workflow instances and create a new one with new runtime variables. To implement this, as suggested in the Activiti forums, we used runtimeService.deleteProcessInstance to terminate the currently running workflows and runtimeService.createProcess to create a new process instance.

Very often, when this use case executes, we see ActivitiOptimisticLockingExceptions in the logs. This impacts the functionality: the old workflows that are supposed to be terminated are still executing.

org.activiti.engine.ActivitiOptimisticLockingException: JobEntity [id=56c7ce96-de12-11e4-aac1-000c2995d32a] was updated by another transaction concurrently
   at org.activiti.engine.impl.db.DbSqlSession$CheckedDeleteOperation.execute(DbSqlSession.java:229)
   at org.activiti.engine.impl.db.DbSqlSession.flushDeletes(DbSqlSession.java:575)
   at org.activiti.engine.impl.db.DbSqlSession.flush(DbSqlSession.java:443)
   at org.activiti.engine.impl.interceptor.CommandContext.flushSessions(CommandContext.java:169)
   at org.activiti.engine.impl.interceptor.CommandContext.close(CommandContext.java:116)
   at org.activiti.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:70)
   at org.activiti.spring.SpringTransactionInterceptor$1.doInTransaction(SpringTransactionInterceptor.java:42)
   at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
   at org.activiti.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:40)
   at org.activiti.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:31)
   at org.activiti.engine.impl.RuntimeServiceImpl.deleteProcessInstance(RuntimeServiceImpl.java:87)

I understand that this exception is thrown when two or more threads try to update the same process concurrently, meaning one thread could still be executing the workflow while another is trying to terminate it.
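The check behind this exception can be pictured with a minimal sketch. It assumes, as the stack trace (DbSqlSession$CheckedDeleteOperation) and the SQL in the Oracle trace later in this thread ("delete from ACT_RU_EXECUTION where ID_ = :1 and REV_ = :2") suggest, that Activiti only deletes or updates a row if its revision (REV_) is still the one the transaction loaded. RevisionCheckedTable and OptimisticLockException are hypothetical stand-ins, not Activiti classes, and an in-memory map plays the role of the table.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of revision-checked writes, in the spirit of
// "delete from ACT_RU_EXECUTION where ID_ = :1 and REV_ = :2".
// Hypothetical classes, not Activiti code: a map from ID_ to REV_ plays the table.
class RevisionCheckedTable {

    // Hypothetical stand-in for ActivitiOptimisticLockingException
    static class OptimisticLockException extends RuntimeException {
        OptimisticLockException(String message) {
            super(message);
        }
    }

    private final Map<String, Integer> revisions = new HashMap<>();

    void insert(String id) {
        revisions.put(id, 1); // a new row starts at revision 1
    }

    // What a transaction sees when it loads the row (null if the row is gone)
    Integer read(String id) {
        return revisions.get(id);
    }

    // UPDATE ... SET REV_ = REV_ + 1 WHERE ID_ = ? AND REV_ = ?
    void update(String id, int expectedRev) {
        checkRevision(id, expectedRev);
        revisions.put(id, expectedRev + 1);
    }

    // DELETE ... WHERE ID_ = ? AND REV_ = ?
    void delete(String id, int expectedRev) {
        checkRevision(id, expectedRev);
        revisions.remove(id);
    }

    // Zero matching rows means someone else changed or removed the row first
    private void checkRevision(String id, int expectedRev) {
        Integer current = revisions.get(id);
        if (current == null || current != expectedRev) {
            throw new OptimisticLockException(
                    id + " was updated by another transaction concurrently");
        }
    }
}
```

In this model, if the deleting transaction loaded a job at revision 1 and the executing transaction updated it to revision 2 first, the delete matches no row and fails; a retry that reloads the row (and so sees revision 2) succeeds.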

As per the use case, we need the actively running process to be terminated (or brought to a halt) and then deleted. We tried calling suspendProcessInstanceById before deleteProcessInstance, in the hope that suspending would bring the running workflow to a halt so that the delete would then succeed. But that did not help.

This is happening in production and we have to apply a hotfix as soon as possible to stop the mess it is creating, so we put a hack in place and are testing it before we drop it into prod. I just wanted to check with the experts here whether there is a better way to solve this issue.

Our workaround (hack) is to catch the optimistic locking exception and spawn a separate thread that keeps retrying the delete up to 10 times or until the process is deleted, whichever comes first.
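The retry policy described above (at most 10 attempts, stopping at the first success) can also be sketched as a small helper that retries in the calling thread instead of spawning a new thread per failure. This is a generic sketch, not an Activiti API; BoundedRetry and its parameters are hypothetical. It assumes each attempt runs in its own transaction; if the call joins an outer Spring transaction, retrying inside that same transaction is unlikely to help.

```java
import java.util.function.Supplier;

// Hypothetical helper: run an action up to maxAttempts times, pausing between
// attempts, and stop at the first success (whichever comes first).
class BoundedRetry {

    static <T> T retry(Supplier<T> action, Class<? extends RuntimeException> retryOn,
                       int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get(); // success: stop retrying
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // unrelated failure: do not retry
                }
                last = e; // retryable conflict: remember it and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // pause before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // keep the interrupt status
                    break; // give up and rethrow the last conflict
                }
            }
        }
        throw last; // every attempt failed with the retryable exception
    }
}
```

With Activiti, the action could be something like `() -> { runtimeService.deleteProcessInstance(workflowId, deleteReason); return null; }` with `ActivitiOptimisticLockingException.class` as the retryable type. Retrying synchronously keeps the final failure visible to the caller rather than leaving it in a background thread's log.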

private void cleanUpAllWorkflows(List<String> workflowInstances, String deleteReason)
{
    for (String workflowId : workflowInstances)
    {
        try
        {
            runtimeService.suspendProcessInstanceById(workflowId);
            runtimeService.deleteProcessInstance(workflowId, deleteReason);
        }
        catch (ActivitiOptimisticLockingException e)
        {
            this.diagnosticMethodSignature.setMethodName("cleanUpAllWorkflows");
            this.customLogger.logMessage(diagnosticMethodSignature, DiagnosticType.WARNING,
                    "error on cleanUpAllWorkflows, starting retry thread", e);

            // initial delete failed so retry
            DeleteExecutor thread = new DeleteExecutor(workflowId, deleteReason, runtimeService);
            taskExecutor.execute(thread);
        }
    }
}

In DeleteExecutor, the run() method looks like this:

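public void run() {
    // Retry the delete up to 10 times, pausing one second before each attempt
    for (int k = 0; k < 10; k++) {
        try {
            // Thread.sleep rather than wait(1000): wait() throws IllegalMonitorStateException
            // unless the calling thread holds this object's monitor
            Thread.sleep(1000);
            runtimeService.deleteProcessInstance(workflowID, deleteReason);
            return; // deleted, so stop retrying
        } catch (ActivitiOptimisticLockingException e) {
            customLogger.logMessage(this.diagnosticMethodSignature, DiagnosticType.WARNING,
                    "error on delete workflows, retry count : " + k);
        } catch (InterruptedException i) {
            // Preserve the interrupt status and stop retrying
            Thread.currentThread().interrupt();
            return;
        }
    }
}

Please let me know if there is a better way to achieve this using the Activiti engine APIs instead of writing our own custom code.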

Thanks for your help!
Hepci

martin_grofcik · 04-09-2015

Hi,

I see that you know why the exception occurs. Do you know which thread updates the process instance concurrently? (Can you change the process design to avoid it?)

Another possibility is to use a TerminateEndEvent. Support is experimental because of a bug with call activities: when a process instance is terminated inside the called activity, it should terminate the parent process and all its siblings too. There will be a patch for that. Have a look at
org.activiti.engine.test.bpmn.event.end.TerminateEndEventTest in the Activiti source code.
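The termination behavior described above, including what the patch is meant to add for called processes (terminating inside the called activity also ends the parent and its siblings), can be pictured with a toy execution tree. This only models the described semantics; ToyExecution is a hypothetical class and not how Activiti implements TerminateEndEvent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model: each node is a process instance, a called process,
// or a concurrent branch. Not Activiti's implementation.
class ToyExecution {
    final String name;
    final ToyExecution parent;
    final List<ToyExecution> children = new ArrayList<>();
    boolean ended;

    ToyExecution(String name, ToyExecution parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    // Reaching a terminate end event: climb to the top-level process instance,
    // then end it together with every descendant (siblings included).
    void terminate() {
        ToyExecution root = this;
        while (root.parent != null) {
            root = root.parent;
        }
        endRecursively(root);
    }

    private static void endRecursively(ToyExecution execution) {
        execution.ended = true;
        for (ToyExecution child : execution.children) {
            endRecursively(child);
        }
    }
}
```

In this model, terminating from a subprocess inside the called process ends the parent process, the called process, and the sibling branch, while a separate process instance stays untouched.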

Regards
Martin

hepcibha · 04-09-2015

Thanks for your quick response, Martin!

I will take a look at TerminateEndEvent and run some tests with it to see whether it serves our purpose.

A couple of questions about the patch for the known bug you mentioned above:

1. Do you know, tentatively, when the patch will be available?
2. We are on v5.13. Do you think the patch will be backward compatible with it, or will we have to upgrade to the latest version?

Thanks again!

jbarrez · 04-14-2015

1. Martin has created a pull request, which has been merged: https://github.com/Activiti/Activiti/commit/21a8cca95b8858d7737811e267426746423bf386

2. Not sure; 5.13 is quite old, and a ton of things have changed since then.

hepcibha · 05-14-2015

I didn't get a chance to post my test results earlier; I was busy.

We tried TerminateEndEvent in both 5.13 and 5.17, and it seemed to work fine: it terminated all the processes and also their child processes (processes called from within a process using a call activity, or subprocesses). We did not see any children left unterminated. Thanks for pointing me to this feature.

What I didn't mention in my original post on this topic is that we were not only seeing ActivitiOptimisticLockingExceptions in the application logs when trying to delete/terminate processes, but also some Oracle deadlocks on the ACT_HI_DETAIL, ACT_RU_JOB and ACT_RU_EXECUTION tables in the Oracle trace files.

After upgrading from 5.13 to 5.17 (because that version supports the async executor), the number of deadlocks in a 1-hour test dropped from 30 to 3. I was hoping that after changing the workflows to use TerminateEndEvent we would not see any more deadlocks, but unfortunately we still saw 3 (with 5.17 and the TerminateEndEvent changes).

My application and the embedded Activiti engine are deployed in clustered mode with 2 nodes. So my guess is that the deadlocks may be occurring when one node is executing the process and another node tries to delete/terminate it. I think that when execution and termination are done by the same node, it may work fine.

Do you think that is a possibility? If so, is deleteProcessInstance or TerminateEndEvent capable of deleting/terminating a process that is running on another node (a different JVM)?

I know this looks like an altogether new topic, but the two are actually linked (optimistic locking exceptions and deadlocks are happening hand in hand in my environment), which is why I am posting it in this thread.

Thanks!!

jbarrez · 05-18-2015

> So my guess here is that the deadlocks may be occurring when one node is executing the process and another node tries to
> delete/terminate it.

No, one node should get a rollback.

An optimistic lock is to be expected, but a deadlock is bad. What db are you using?
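Why one node should get a rollback can be shown with a toy race: several nodes load the same row at the same revision, then each attempts a conditional write that only succeeds if the revision is unchanged. An AtomicInteger stands in for the REV_ column and compareAndSet for the conditional UPDATE; TwoNodeRace is a hypothetical name, and a real database enforces this with row locks and the WHERE clause, not Java atomics.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy model: every "node" reads the same revision, then tries a
// conditional write. Only one can win; the rest would roll back.
class TwoNodeRace {

    // Returns how many of the competing nodes committed their change
    static int countCommits(int nodes) {
        AtomicInteger revision = new AtomicInteger(1); // REV_ of the shared row
        AtomicInteger commits = new AtomicInteger();
        // All nodes load the row before any node writes, as in a real race
        CyclicBarrier allHaveRead = new CyclicBarrier(nodes);
        Thread[] workers = new Thread[nodes];
        for (int i = 0; i < nodes; i++) {
            workers[i] = new Thread(() -> {
                int seen = revision.get();
                try {
                    allHaveRead.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                } catch (BrokenBarrierException e) {
                    return;
                }
                // "UPDATE ... SET REV_ = seen + 1 WHERE REV_ = seen"
                if (revision.compareAndSet(seen, seen + 1)) {
                    commits.incrementAndGet(); // this node commits
                }
                // otherwise: optimistic-locking failure, this node rolls back
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return commits.get();
    }
}
```

However many nodes race, exactly one commits and the others see a conflict. A deadlock is a different failure: two transactions each hold a row lock the other is waiting for, so neither can proceed until the database breaks the cycle.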

hepcibha · 05-18-2015

In the application logs we are seeing ActivitiOptimisticLockingExceptions, but in the Oracle trace files we are seeing a bunch of deadlocks.

We are using Oracle.

jbarrez · 05-26-2015

Any more info on the deadlocks? Can you find out which query / table / … is involved when it happens?

hepcibha · 05-26-2015

Here are some extracts from the Oracle trace files.

*** 2015-04-24 14:27:20.873
DEADLOCK DETECTED ( ORA-00060 )

[Transaction Deadlock]

The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits process session holds waits
TX-0023001c-0001e86b        48       7     X             70    1165           S
TX-003e0014-0000ca7b        70    1165     X             48       7           S

session 7: DID 0001-0030-000E4703 session 1165: DID 0001-0046-00142062
session 1165: DID 0001-0046-00142062 session 7: DID 0001-0030-000E4703

Rows waited on:
Session 7: no row
Session 1165: obj - rowid = 000250D9 - AAAlDZAAIACLptaAAA
(dictionary objn - 151769, file - 8, block - 36608858, slot - 0)

----- Information for the OTHER waiting sessions -----
Session 1165:
sid: 1165 ser: 8572 audsid: 3447395 user: 114/CACPPROD flags: 0x45
pid: 70 O/S info: user: oracle, term: UNKNOWN, ospid: 32112754
    image: oracle@sez00dlg-718
client details:
    O/S info: user: root, term: unknown, ospid: 1234
    machine: sef00ivm005 program: JDBC Thin Client
    application name: JDBC Thin Client, hash value=2546894660
current SQL:
insert into ACT_RU_EXECUTION (ID_, REV_, PROC_INST_ID_, BUSINESS_KEY_, PROC_DEF_ID_, ACT_ID_, IS_ACTIVE_, IS_CONCURRENT_, IS_SCOPE_,IS_EVENT_SCOPE_, PARENT_ID_, SUPER_EXEC_, SUSPENSION_STATE_, CACHED_ENT_STATE_, TENANT_ID_, NAME_)
    values (
      :1,
      1,
      :2,
      :3,
      :4,
      :5,
      :6,
      :7,
      :8,
      :9,
      :10,
      :11,
      :12,
      :13,
      :14,
      :15
    )

----- End of information for the OTHER waiting sessions -----

Information for THIS session:

----- Current SQL Statement for this session (sql_id=5wczrbvrz7ph0) -----
delete from ACT_RU_EXECUTION where ID_ = :1 and REV_ = :2

*** 2015-04-24 15:48:19.933
DEADLOCK DETECTED ( ORA-00060 )

[Transaction Deadlock]

The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits process session holds waits
TX-001d000f-0002c856        48       7     X             52     792           S
TX-00350010-00011478        52     792     X             48       7           X

session 7: DID 0001-0030-000EB4A8 session 792: DID 0001-0034-0B6E9F03
session 792: DID 0001-0034-0B6E9F03 session 7: DID 0001-0030-000EB4A8

Rows waited on:
Session 7: obj - rowid = 000250A1 - AAAlChAQAAC3eV7AAC
(dictionary objn - 151713, file - 1024, block - 48096635, slot - 2)
Session 792: no row

----- Information for the OTHER waiting sessions -----
Session 792:
sid: 792 ser: 9048 audsid: 3447342 user: 114/CACPPROD flags: 0x45
pid: 52 O/S info: user: oracle, term: UNKNOWN, ospid: 27984062
    image: oracle@sez00dlg-718
client details:
    O/S info: user: root, term: unknown, ospid: 1234
    machine: sef00ivm004 program: JDBC Thin Client
    application name: JDBC Thin Client, hash value=2546894660
current SQL:
delete from ACT_RU_EXECUTION where ID_ = :1 and REV_ = :2

----- End of information for the OTHER waiting sessions -----

Information for THIS session:

----- Current SQL Statement for this session (sql_id=brhfrudtj2s2n) -----
update ACT_HI_ACTINST set
      EXECUTION_ID_ = :1,
      ASSIGNEE_ = :2,
      END_TIME_ = :3,
      DURATION_ = :4
    where ID_ = :5
===================================================
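The deadlock graphs above can be read as a wait-for graph: each resource row names the session holding the lock (Blocker) and the session waiting for it (Waiter), which gives an edge from waiter to blocker. In both extracts the two sessions wait on each other (7 and 1165, then 7 and 792), which is the cycle Oracle reports as ORA-00060. The sketch only illustrates reading the graph; Oracle performs the detection itself, and WaitForGraph and Row are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: turns rows of an Oracle "Deadlock graph" into a
// wait-for graph (waiter session -> blocker session) and looks for a cycle.
class WaitForGraph {

    // One row of the deadlock graph: the resource, its holder, and its waiter
    static final class Row {
        final String resource;
        final int blockerSession;
        final int waiterSession;

        Row(String resource, int blockerSession, int waiterSession) {
            this.resource = resource;
            this.blockerSession = blockerSession;
            this.waiterSession = waiterSession;
        }
    }

    // True if following "waits for" edges from some session leads back to it
    static boolean hasCycle(List<Row> rows) {
        Map<Integer, Set<Integer>> waitsFor = new HashMap<>();
        for (Row r : rows) {
            waitsFor.computeIfAbsent(r.waiterSession, k -> new HashSet<>()).add(r.blockerSession);
        }
        Set<Integer> done = new HashSet<>();
        Set<Integer> onPath = new HashSet<>();
        for (Integer session : waitsFor.keySet()) {
            if (dfs(session, waitsFor, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    // Depth-first search; revisiting a session on the current path means a cycle
    private static boolean dfs(Integer session, Map<Integer, Set<Integer>> waitsFor,
                               Set<Integer> done, Set<Integer> onPath) {
        if (onPath.contains(session)) {
            return true;
        }
        if (done.contains(session)) {
            return false;
        }
        onPath.add(session);
        for (Integer blocker : waitsFor.getOrDefault(session, Collections.emptySet())) {
            if (dfs(blocker, waitsFor, done, onPath)) {
                return true;
            }
        }
        onPath.remove(session);
        done.add(session);
        return false;
    }
}
```

Feeding it the two rows of the first extract (TX-0023001c-0001e86b held by session 7 and awaited by 1165; TX-003e0014-0000ca7b held by 1165 and awaited by 7) reports a cycle, while a single blocker/waiter pair on its own does not.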

jbarrez · 06-02-2015

It seems to be related to history. Are you looping over the same process element in your process definition, or visiting the same element multiple times?

That still does not explain why it happens on delete.


RuntimeService.deleteProcessInstance is causing ActivitiOptimisticLockingExceptions