Hyland Connect

pollop · ‎09-24-2013

Hi,

I am trying to understand how Activiti handles unexpected restarts.
My company is currently assessing Activiti's ability to recover from an unexpected shutdown of the system.

The context is the following: Activiti will run on a server handling many process instances, consisting mainly of service tasks.
If something happens to the system, a second server will handle the continuation of the application.

What I am really interested in is: what does Activiti consider a transaction? When is the context saved in the database? Any information would be helpful.

To take a more concrete example, let's say the server crashed (for reasons outside the scope of the application).
At the time of crash, hundreds of Service Tasks were being executed.

When the system restarts, those process instances do not seem to run anymore.
I looked in the database and saw that their status is Active but they are not running.

105 2 105 null null processTriple:1:103 null servicetask2 TRUE FALSE TRUE FALSE 1 4

Is it possible for the engine to restart those tasks or do I have to do it manually?

Thanks a lot for your help.

Note: ASYNC execution is of course activated.

trademak · ‎09-24-2013

Are these service tasks configured as asynchronous?
By default Activiti resumes all process execution where it left off.
If you have a synchronous service task that's being executed at the time the server crashes, that transaction is not committed to the database. So the process engine will be in the last wait state before this service task. If that's a user task, for example, the process engine will wait until you complete the user task again.
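
To make the "last wait state" rule concrete, here is a minimal sketch in plain Java. It is a toy model, not Activiti code: the class `WaitStateModel`, its `Step` type, and the methods `linear` and `resumePoint` are names invented for illustration. It assumes a linear process and treats a user task or an async continuation as a point where the engine has committed its state.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "roll back to the last wait state"; not Activiti code. */
class WaitStateModel {

    /** One activity in a linear process; waitState = user task or async continuation. */
    static final class Step {
        final String id;
        final boolean waitState;

        Step(String id, boolean waitState) {
            this.id = id;
            this.waitState = waitState;
        }
    }

    /** Builds a linear process in which every step shares the same waitState flag. */
    static List<Step> linear(boolean waitState, String... ids) {
        List<Step> steps = new ArrayList<>();
        for (String id : ids) {
            steps.add(new Step(id, waitState));
        }
        return steps;
    }

    /**
     * Returns the id of the step the database would still point at if the
     * server died while executing steps.get(crashedAt), or null if no wait
     * state had been reached, meaning nothing was committed yet.
     */
    static String resumePoint(List<Step> steps, int crashedAt) {
        for (int i = crashedAt; i >= 0; i--) {
            if (steps.get(i).waitState) {
                return steps.get(i).id;
            }
        }
        return null;
    }
}
```

With three async service tasks and a crash during the second one, `resumePoint` returns `servicetask2`, which is consistent with the execution row quoted in the first post. With the same tasks configured as synchronous, it returns `null`: the whole run was one uncommitted transaction.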

Best regards,

lehvolk · ‎09-24-2013

In my situation I have a loop of callActivity tasks with isSequential="false". Is there any way to complete all the uncompleted callActivity processes and the parent process?

pollop · ‎09-25-2013

Thank you for your response.
The behaviour you describe is the one I would have expected but it doesn't seem to be the case.
My test workflow is very simple : Start -> Service 1 -> Service 2 -> Service 3 -> End
I made Activiti crash during service 2.

In the database I have a row in the RU_EXECUTION table which looks like this
105 2 105 null null processTriple:1:103 null servicetask2 TRUE FALSE TRUE FALSE 1 4

But when restarting the application, nothing happens. My JavaDelegate class of Service2 is never instantiated again, nor is the one from Service3.
All of my service classes are "async" and jobExecutorActive is set to true.

Am I missing something here?

Thanks again for your help.

lehvolk · ‎09-24-2013

Thanks for this topic.

Today I faced the same problem. While a huge process was running, Tomcat was restarted. After the server started up again, none of the running processes finished; they all stay in the active state.

So far I have found no information about Activiti's behaviour in this situation.

pollop · ‎09-25-2013

Quick update to make it easier.

Content of my ACT_RU_EXECUTION table:
<code>
ID_ 105
REV_ 2
PROC_INST_ID_ 105
BUSINESS_KEY_ null
PARENT_ID_ null
PROC_DEF_ID_ processTriple:1:103
SUPER_EXEC_ null
ACT_ID_ servicetask2
IS_ACTIVE_ TRUE
IS_CONCURRENT_ FALSE
IS_SCOPE_ TRUE
IS_EVENT_SCOPE_ FALSE
SUSPENSION_STATE_ 1
CACHED_ENT_STATE_ 4
</code>
Content of my ACT_RU_JOB:
<code>
ID_ 109
REV_ 6
TYPE_ message
LOCK_EXP_TIME_ null
LOCK_OWNER_ null
EXCLUSIVE_ TRUE
EXECUTION_ID_ 105
PROCESS_INSTANCE_ID_ 105
PROC_DEF_ID_ null
RETRIES_ 0
EXCEPTION_STACK_ID_ 111
EXCEPTION_MSG_ couldn't execute activity <serviceTask id="servicetask2" …>: Someone stepped on the wire.
DUEDATE_ null
REPEAT_ null
HANDLER_TYPE_ async-continuation null
HANDLER_CFG_ null
</code>

jbarrez · ‎09-26-2013

You can see it in your DB:

RETRIES_ 0
EXCEPTION_STACK_ID_ 111
EXCEPTION_MSG_ couldn't execute activity <serviceTask id="servicetask2" …>: Someone stepped on the wire.

The job executor tried 3 times and there are no retries left, so it won't keep trying.

It means that when the system 'crashed', the job executor was actually still running and tried 3 times.
You can now fetch the job and execute it manually.
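
As a rough sketch of what "fetch the job and execute it manually" could look like: to my knowledge Activiti 5's `ManagementService` offers `executeJob(String jobId)` and `setJobRetries(String jobId, int retries)`, but please check the Javadoc of your version. So that the example runs without an engine, it codes against a hypothetical `JobOps` interface that mirrors only those two calls, plus an in-memory fake. `FailedJobRecovery`, `JobOps`, `FakeJobs`, `exhausted`, `executeNow` and `requeue` are all illustrative names, not Activiti API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FailedJobRecovery {

    /** Hypothetical stand-in for the two ManagementService calls this sketch relies on. */
    interface JobOps {
        void setJobRetries(String jobId, int retries);
        void executeJob(String jobId);
    }

    /** In-memory fake of the job table, used only to exercise the sketch. */
    static final class FakeJobs implements JobOps {
        final Map<String, Integer> retries = new LinkedHashMap<>();
        final List<String> executed = new ArrayList<>();

        @Override
        public void setJobRetries(String jobId, int newRetries) {
            retries.put(jobId, newRetries);
        }

        @Override
        public void executeJob(String jobId) {
            executed.add(jobId);
        }
    }

    /** Ids of jobs whose retry counter has reached zero (RETRIES_ = 0). */
    static List<String> exhausted(Map<String, Integer> retriesByJobId) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : retriesByJobId.entrySet()) {
            if (entry.getValue() == 0) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    /** Option 1: run each dead job once, immediately, from the calling code. */
    static void executeNow(JobOps ops, List<String> jobIds) {
        for (String id : jobIds) {
            ops.executeJob(id);
        }
    }

    /** Option 2: give each dead job fresh retries so the job executor can try it again. */
    static void requeue(JobOps ops, List<String> jobIds, int retries) {
        for (String id : jobIds) {
            ops.setJobRetries(id, retries);
        }
    }
}
```

Against a real engine, `JobOps` would be replaced by the `ManagementService` obtained from the process engine, and the list of dead jobs would come from a job query rather than a map. For the data in this thread, job 109 is the one with RETRIES_ 0.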

dan_ · ‎06-03-2016

Hi Joram,

If the engine has locked some jobs for processing and the servers are then restarted or crash, those jobs will never be processed. Typically, LOCK_EXP_TIME_ has some value, LOCK_OWNER_ = null, DUEDATE_ = null, and RETRIES_ > 0.

If I trigger them manually, that is, by setting LOCK_EXP_TIME_ = null and DUEDATE_ = CURRENT_TIMESTAMP, they are picked up and consumed.
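
The condition and the manual reset described here can be written down as a small in-memory model. This is only a sketch: `StuckJobs`, `Job`, `looksStuck` and `release` are made-up names, the fields merely mirror the ACT_RU_JOB columns mentioned above, and nothing here touches the database or the Activiti API.

```java
import java.time.Instant;

class StuckJobs {

    /** Mirrors the four ACT_RU_JOB columns discussed above; not an Activiti class. */
    static final class Job {
        Instant lockExpTime; // LOCK_EXP_TIME_
        String lockOwner;    // LOCK_OWNER_
        Instant dueDate;     // DUEDATE_
        int retries;         // RETRIES_

        Job(Instant lockExpTime, String lockOwner, Instant dueDate, int retries) {
            this.lockExpTime = lockExpTime;
            this.lockOwner = lockOwner;
            this.dueDate = dueDate;
            this.retries = retries;
        }
    }

    /** True for the pattern described above: lock time set, no owner, no due date, retries left. */
    static boolean looksStuck(Job job) {
        return job.lockExpTime != null
                && job.lockOwner == null
                && job.dueDate == null
                && job.retries > 0;
    }

    /**
     * Applies the manual fix (LOCK_EXP_TIME_ = null, DUEDATE_ = now) to a stuck job.
     * Returns false and leaves the job untouched if it does not match the pattern.
     */
    static boolean release(Job job, Instant now) {
        if (!looksStuck(job)) {
            return false;
        }
        job.lockExpTime = null;
        job.dueDate = now;
        return true;
    }
}
```

Guarding `release` with `looksStuck` keeps the reset from touching jobs that are legitimately locked by a live owner or that have no retries left.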

Do you have a suggestion for how to overcome this issue, other than creating a thread that scans for such jobs and tries to unlock them?

Thanks,

Dan

jbarrez · ‎06-20-2016

@dan_: jobs with duedate = null not being picked up is something that was fixed in the recent 5.21 release.

System restart