RuntimeService.deleteProcessInstance is causing ActivitiOptimisticLockingExceptions
‎04-08-2015 07:56 PM
In my application, for some use cases, we have to terminate currently running workflow instances and create a new one with new runtime variables. To implement this, as suggested in the Activiti forums, we used runtimeService.deleteProcessInstance to terminate the currently running workflows and used runtimeService.createProcess to create a new process instance.
Very often when this use case executes, we see ActivitiOptimisticLockingExceptions in the logs. This impacts the functionality: the old workflows that are supposed to be terminated are still executing.
org.activiti.engine.ActivitiOptimisticLockingException: JobEntity [id=56c7ce96-de12-11e4-aac1-000c2995d32a] was updated by another transaction concurrently
at org.activiti.engine.impl.db.DbSqlSession$CheckedDeleteOperation.execute(DbSqlSession.java:229)
at org.activiti.engine.impl.db.DbSqlSession.flushDeletes(DbSqlSession.java:575)
at org.activiti.engine.impl.db.DbSqlSession.flush(DbSqlSession.java:443)
at org.activiti.engine.impl.interceptor.CommandContext.flushSessions(CommandContext.java:169)
at org.activiti.engine.impl.interceptor.CommandContext.close(CommandContext.java:116)
at org.activiti.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:70)
at org.activiti.spring.SpringTransactionInterceptor$1.doInTransaction(SpringTransactionInterceptor.java:42)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
at org.activiti.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:40)
at org.activiti.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:31)
at org.activiti.engine.impl.RuntimeServiceImpl.deleteProcessInstance(RuntimeServiceImpl.java:87)
I understand that this exception is thrown when two or more threads concurrently try to update the same process, meaning one thread could be continuing to execute the workflow while the other is trying to terminate it.
As per the use case, we need the actively running process to be terminated or brought to a halt and then deleted. We tried calling suspendProcessInstanceById prior to deleteProcessInstance in the hope that suspend would bring the currently running workflow to a halt and the delete would then succeed. But that did not help.
This is happening in production and we have to apply a hotfix ASAP to stop the mess it is creating, so we put a hack in place and are trying to test it out before we can drop it in prod. I just wanted to check with the experts here to see if there is a better way to solve this issue.
Our workaround (hack) is to catch the optimistic locking exception and spawn a separate thread that keeps retrying up to 10 times or until the process is deleted, whichever comes first.
private void cleanUpAllWorkflows(List<String> workflowInstances, String deleteReason)
{
    for (String workflowId : workflowInstances)
    {
        try
        {
            runtimeService.suspendProcessInstanceById(workflowId);
            runtimeService.deleteProcessInstance(workflowId, deleteReason);
        }
        catch (ActivitiOptimisticLockingException e)
        {
            this.diagnosticMethodSignature.setMethodName("cleanUpAllWorkflows");
            this.customLogger.logMessage(diagnosticMethodSignature, DiagnosticType.WARNING,
                "error on cleanUpAllWorkflows, starting retry thread", e);
            // initial delete failed so retry
            DeleteExecutor thread = new DeleteExecutor(workflowId, deleteReason, runtimeService);
            taskExecutor.execute(thread);
        }
    }
}
In DeleteExecutor, the run method looks like this:
public void run() {
    // Retry the delete up to 10 times, pausing one second before each attempt.
    for (int k = 0; k < 10; k++) {
        try {
            // Thread.sleep rather than wait(): wait() requires holding this object's monitor.
            Thread.sleep(1000);
            runtimeService.deleteProcessInstance(workflowID, deleteReason);
            return; // deleted, stop retrying
        } catch (ActivitiOptimisticLockingException e) {
            customLogger.logMessage(this.diagnosticMethodSignature, DiagnosticType.WARNING, "error on delete workflows, retry count : " + k);
        } catch (InterruptedException i) {
            Thread.currentThread().interrupt(); // restore the interrupt flag and give up
            return;
        }
    }
}
Please let me know if there is a better way to achieve this using the Activiti engine APIs, instead of writing our own custom code.
Thanks for your help!
Hepci
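For readers following the workaround described above, here is a minimal, library-free sketch of a bounded retry loop. It is an illustration under stated assumptions, not the poster's code or an Activiti API: `BoundedRetry`, `runWithRetry`, and the nested `OptimisticLockingException` are hypothetical names, and the exception is only a stand-in for `ActivitiOptimisticLockingException` so the example compiles without the engine on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedRetry {

    /** Hypothetical stand-in for ActivitiOptimisticLockingException (assumption for this sketch). */
    static class OptimisticLockingException extends RuntimeException {
        OptimisticLockingException(String message) { super(message); }
    }

    /**
     * Runs the action, retrying only when it throws OptimisticLockingException.
     * Returns the number of attempts used; rethrows the last failure once attempts run out.
     */
    static int runWithRetry(Runnable action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (OptimisticLockingException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the failure to the caller
                }
                try {
                    Thread.sleep(delayMillis); // pause before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag set
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated delete that fails twice with a concurrency conflict, then succeeds.
        int attempts = runWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new OptimisticLockingException("updated by another transaction concurrently");
            }
        }, 10, 10);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

In a real application the lambda would wrap the delete call and catch the engine's own exception type. Unlike the workaround in the post, this version rethrows when the attempts are exhausted, so a failed cleanup is not silently lost.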
‎04-09-2015 02:11 AM
I see that you know why the exception occurs. Do you know which thread updates the process instance concurrently? (Can you change the process design to avoid it?)
Another possibility is to use TerminateEndEvent. Support is experimental because of a bug in call activities: when a process instance is terminated in the called activity, it should terminate the parent process and all siblings too. There will be a patch for that. Have a look at
org.activiti.engine.test.bpmn.event.end.TerminateEndEventTest
in the Activiti source code.
Regards
Martin
‎04-09-2015 06:12 PM
I will take a look at 'TerminateEndEvent' and run some tests with it to see if it serves our purpose.
A couple of questions regarding the patch for the known bug you mentioned above:
1. Do you know roughly when the patch will be available?
2. We are on v5.13. Do you think the patch will be backwards compatible with it, or will we have to upgrade to the latest version?
Thanks again!
‎04-14-2015 08:19 AM
2. Not sure, 5.13 is quite old … a ton of things have changed since then
‎05-14-2015 05:59 PM
We tried TerminateEndEvent in both 5.13 and 5.17, and it seemed to have worked fine - it terminated all the processes and also child processes (processes called from within a process using call activity or subprocesses). We did not see any children being unterminated. Thanks for pointing me to this feature.
What I didn't mention in my original post on this topic is that we were not only seeing ActivitiOptimisticLockingExceptions in the application logs when trying to delete/terminate processes, but we were also seeing some Oracle deadlocks on the ACT_HI_DETAIL, ACT_RU_JOB, and ACT_RU_EXECUTION tables in the Oracle trace files.
After upgrading from 5.13 to 5.17 (as this version supports asynchronous executors), the number of deadlocks dropped from 30 to 3 in a 1-hour test. I was hoping that after changing the workflows to use TerminateEndEvent we would not see any more deadlocks, but unfortunately we still saw 3 deadlocks (with 5.17 and the TerminateEndEvent changes).
My application and the embedded Activiti engine are deployed in clustered mode, with 2 nodes. So my guess is that the deadlocks may be occurring when one node is executing the process and another node tries to delete/terminate it. I think when execution and termination are done by the same node, it may be working fine.
Do you think that is a possibility? If so, is deleteProcessInstance or TerminateEndEvent capable of deleting/terminating a process that is running in another node (different JVM)?
I know this looks like an altogether new topic, but the two are linked (optimistic locking exceptions and deadlocks are happening hand in hand in my environment), hence I am posting it in this thread.
Thanks!!
‎05-18-2015 11:47 AM
> delete/terminate it.
No, one node should get a rollback.
An optimistic lock is to be expected, but a deadlock is bad. What db are you using?
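To illustrate why an optimistic locking failure is expected when two transactions touch the same row, here is a small in-memory sketch of a revision-checked delete. It is only a model under stated assumptions: `RevisionCheckSketch`, `Table`, and `checkedDelete` are hypothetical names, the map stands in for a table with ID_ and REV_ columns, and the real engine issues SQL against the database rather than using anything like this class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RevisionCheckSketch {

    /** Hypothetical in-memory stand-in for a table with ID_ and REV_ columns. */
    static final class Table {
        private final Map<String, Integer> revisionById = new ConcurrentHashMap<>();

        void insert(String id) {
            revisionById.put(id, 1); // new rows start at revision 1
        }

        /** Models "update ... set REV_ = REV_ + 1 where ID_ = ? and REV_ = ?"; returns rows affected. */
        int update(String id, int expectedRevision) {
            return revisionById.replace(id, expectedRevision, expectedRevision + 1) ? 1 : 0;
        }

        /** Models "delete ... where ID_ = ? and REV_ = ?"; returns rows affected. */
        int delete(String id, int expectedRevision) {
            return revisionById.remove(id, expectedRevision) ? 1 : 0;
        }
    }

    /** Fails when the row no longer carries the revision the caller read earlier. */
    static void checkedDelete(Table table, String id, int expectedRevision) {
        if (table.delete(id, expectedRevision) == 0) {
            throw new IllegalStateException(id + " was updated by another transaction concurrently");
        }
    }

    public static void main(String[] args) {
        Table jobs = new Table();
        jobs.insert("job-1");
        int revisionSeenByDeleter = 1; // the deleting transaction read the row at revision 1
        jobs.update("job-1", 1);       // meanwhile another transaction bumps it to revision 2
        try {
            checkedDelete(jobs, "job-1", revisionSeenByDeleter);
        } catch (IllegalStateException e) {
            System.out.println("stale delete rejected: " + e.getMessage());
        }
        checkedDelete(jobs, "job-1", 2); // after re-reading the revision, the delete goes through
        System.out.println("deleted with fresh revision");
    }
}
```

The stale delete matches zero rows, which is the signal for the losing transaction to roll back. No lock is held while waiting, which is why this pattern on its own would not be expected to produce a database deadlock.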
‎05-18-2015 11:50 AM
We are using Oracle.
‎05-26-2015 04:04 AM
‎05-26-2015 02:10 PM
*** 2015-04-24 14:27:20.873
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-0023001c-0001e86b 48 7 X 70 1165 S
TX-003e0014-0000ca7b 70 1165 X 48 7 S
session 7: DID 0001-0030-000E4703 session 1165: DID 0001-0046-00142062
session 1165: DID 0001-0046-00142062 session 7: DID 0001-0030-000E4703
Rows waited on:
Session 7: no row
Session 1165: obj - rowid = 000250D9 - AAAlDZAAIACLptaAAA
(dictionary objn - 151769, file - 8, block - 36608858, slot - 0)
----- Information for the OTHER waiting sessions -----
Session 1165:
sid: 1165 ser: 8572 audsid: 3447395 user: 114/CACPPROD flags: 0x45
pid: 70 O/S info: user: oracle, term: UNKNOWN, ospid: 32112754
image: oracle@sez00dlg-718
client details:
O/S info: user: root, term: unknown, ospid: 1234
machine: sef00ivm005 program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
current SQL:
insert into ACT_RU_EXECUTION (ID_, REV_, PROC_INST_ID_, BUSINESS_KEY_, PROC_DEF_ID_, ACT_ID_, IS_ACTIVE_, IS_CONCURRENT_, IS_SCOPE_,IS_EVENT_SCOPE_, PARENT_ID_, SUPER_EXEC_, SUSPENSION_STATE_, CACHED_ENT_STATE_, TENANT_ID_, NAME_)
values (
:1,
1,
:2,
:3,
:4,
:5,
:6,
:7,
:8,
:9,
:10,
:11,
:12,
:13,
:14,
:15
)
----- End of information for the OTHER waiting sessions -----
Information for THIS session:
----- Current SQL Statement for this session (sql_id=5wczrbvrz7ph0) -----
delete from ACT_RU_EXECUTION where ID_ = :1 and REV_ = :2
*** 2015-04-24 15:48:19.933
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-001d000f-0002c856 48 7 X 52 792 S
TX-00350010-00011478 52 792 X 48 7 X
session 7: DID 0001-0030-000EB4A8 session 792: DID 0001-0034-0B6E9F03
session 792: DID 0001-0034-0B6E9F03 session 7: DID 0001-0030-000EB4A8
Rows waited on:
Session 7: obj - rowid = 000250A1 - AAAlChAQAAC3eV7AAC
(dictionary objn - 151713, file - 1024, block - 48096635, slot - 2)
Session 792: no row
----- Information for the OTHER waiting sessions -----
Session 792:
sid: 792 ser: 9048 audsid: 3447342 user: 114/CACPPROD flags: 0x45
pid: 52 O/S info: user: oracle, term: UNKNOWN, ospid: 27984062
image: oracle@sez00dlg-718
client details:
O/S info: user: root, term: unknown, ospid: 1234
machine: sef00ivm004 program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
current SQL:
delete from ACT_RU_EXECUTION where ID_ = :1 and REV_ = :2
----- End of information for the OTHER waiting sessions -----
Information for THIS session:
----- Current SQL Statement for this session (sql_id=brhfrudtj2s2n) -----
update ACT_HI_ACTINST set
EXECUTION_ID_ = :1,
ASSIGNEE_ = :2,
END_TIME_ = :3,
DURATION_ = :4
where ID_ = :5
===================================================
‎06-02-2015 03:03 PM
Still does not explain why it happens on delete.
