Hyland Connect

fwachs · 05-14-2013

I have a very simple problem and I'm not sure why it's happening at all.
Maybe someone can enlighten me.

This is what's going on:

- I'm running Activiti 5.11
- I'm running on MySQL
- I have configured Activiti to run with only 1 retry and changed JobExecutor.lockTimeInMillis to 5 hours
- I have a process that makes a "call activity" to another process, which starts with a one-minute timer:


    <intermediateCatchEvent id="timerintermediatecatchevent3" name="TimerCatchEvent">
      <timerEventDefinition>
        <timeDuration>PT1M</timeDuration>
      </timerEventDefinition>
    </intermediateCatchEvent>

When this subprocess starts, it takes a very long time to execute the rest of the process, far longer than the one minute configured in the BPMN file.

Here's an example of how it looks in my DB:


    ID_                  6262467
    PROC_DEF_ID_         EnvioSMSpreguntanueva:18:6013770
    PROC_INST_ID_        6262463
    EXECUTION_ID_        6262465
    ACT_ID_              timerintermediatecatchevent3
    TASK_ID_             NULL
    CALL_PROC_INST_ID_   NULL
    ACT_NAME_            TimerCatchEvent
    ACT_TYPE_            intermediateTimer
    ASSIGNEE_            NULL
    START_TIME_          2013-05-11 20:58:17
    END_TIME_            2013-05-12 16:59:17
    DURATION_            72060320
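The DURATION_ value can be cross-checked against the two timestamps with plain `java.time`. The only inputs are the START_TIME_, END_TIME_ and DURATION_ values from the row above and the PT1M timer duration from the BPMN snippet; nothing here touches Activiti.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Cross-checks the history row: timestamps and DURATION_ are copied from the post,
// PT1M is the <timeDuration> of the intermediate timer.
class TimerRowCheck {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Elapsed time between two timestamps in the format used by the DB dump.
    static Duration elapsed(String start, String end) {
        return Duration.between(LocalDateTime.parse(start, FMT), LocalDateTime.parse(end, FMT));
    }

    public static void main(String[] args) {
        Duration wall = elapsed("2013-05-11 20:58:17", "2013-05-12 16:59:17");
        Duration reported = Duration.ofMillis(72060320L); // DURATION_ column
        Duration timer = Duration.parse("PT1M");          // configured timer
        System.out.println("END_TIME_ - START_TIME_ = " + wall);             // PT20H1M
        System.out.println("DURATION_               = " + reported);         // PT20H1M0.32S
        System.out.println("delay beyond the timer  = " + wall.minus(timer)); // PT20H
    }
}
```

So the timer was due at 20:59:17 and the activity ended exactly 20 hours after that; the extra 320 ms in DURATION_ is presumably because the timestamps are displayed truncated to whole seconds.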

Also, for some reason the entire process (and the subprocess) is being retried until the process ends successfully. This has caused many problems, since the process is meant to run only once because of the tasks it performs. Is there a way to disable retries for good?

Thanks,
Federico

fwachs · 05-14-2013

This way it's more legible. Sorry for posting more than once.
<code>
{
"data":
[
{
   "ID_": "BLOB",
   "PROC_DEF_ID_": "BLOB",
   "PROC_INST_ID_": "BLOB",
   "EXECUTION_ID_": "BLOB",
   "ACT_ID_": "BLOB",
   "TASK_ID_": null,
   "CALL_PROC_INST_ID_": null,
   "ACT_NAME_": "BLOB",
   "ACT_TYPE_": "BLOB",
   "ASSIGNEE_": null,
   "START_TIME_": "2013-05-11 20:58:17",
   "END_TIME_": "2013-05-12 16:59:17",
   "DURATION_": 72060320
}
]
}
</code>

jbarrez · 05-15-2013

Could it be that the job executor simply doesn't poll often enough? I believe the default polling interval is 5 minutes?
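The polling hypothesis can be bounded with a simple model: if the executor wakes up every `pollInterval`, a due timer is picked up at the next poll. This is plain Java under stated assumptions (acquisition and execution time are ignored, polling starts when the subprocess starts); the 5-minute interval is the value guessed in the post, not a verified default, and `PollingLatency`/`firstPickup` are hypothetical names.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Simplified model (an assumption, not Activiti code): the job executor polls every
// pollInterval starting at firstPoll and picks up any job whose due date has passed.
class PollingLatency {
    static LocalDateTime firstPickup(LocalDateTime firstPoll, Duration pollInterval, LocalDateTime due) {
        if (!due.isAfter(firstPoll)) {
            return firstPoll; // already due at the first poll
        }
        long waited = Duration.between(firstPoll, due).toMillis();
        long step = pollInterval.toMillis();
        long polls = (waited + step - 1) / step; // ceiling: first poll at or after the due date
        return firstPoll.plus(Duration.ofMillis(polls * step));
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2013, 5, 11, 20, 58, 17);
        LocalDateTime due = start.plus(Duration.parse("PT1M"));
        LocalDateTime pickup = firstPickup(start, Duration.ofMinutes(5), due);
        System.out.println("lateness = " + Duration.between(due, pickup)); // PT4M
    }
}
```

Under this model a timer is never more than one polling interval late.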

To disable retries for good: you'll need to inject your own job executor class that extends the default one and change it so that it does no retries.
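A job's retry budget works roughly like the counter below (in Activiti it is, I believe, the RETRIES_ column of ACT_RU_JOB): each failed attempt that is reported back decrements it, and the job is no longer picked up once it reaches zero. This is a toy model with hypothetical names (`RetryingJob`, `runToCompletion`), not the Activiti job executor; a real replacement would still have to be wired into the engine configuration.

```java
import java.util.function.BooleanSupplier;

// Toy model of per-job retry bookkeeping; names are hypothetical.
class RetryingJob {
    private int retries;
    private int attempts = 0;

    RetryingJob(int retries) {
        this.retries = retries;
    }

    // Runs the job body until it succeeds or the retry budget is used up.
    // Returns true if some attempt succeeded.
    boolean runToCompletion(BooleanSupplier body) {
        while (retries > 0) {
            attempts++;
            if (body.getAsBoolean()) {
                return true;
            }
            retries--; // only a failure that is reported back consumes a retry
        }
        return false;
    }

    int attempts() {
        return attempts;
    }

    public static void main(String[] args) {
        RetryingJob noRetry = new RetryingJob(1);
        boolean ok = noRetry.runToCompletion(() -> false); // a body that always fails
        System.out.println("succeeded=" + ok + ", attempts=" + noRetry.attempts()); // succeeded=false, attempts=1
    }
}
```

With a budget of 1 the body runs exactly once even though it fails. Note that anything which re-runs a job without first reporting a failure is not limited by this counter at all.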

fwachs · 05-15-2013

Don't you think 72060320 milliseconds (about 20 hours) is a bit much for a
<code>
<intermediateCatchEvent id="timerintermediatecatchevent3" name="TimerCatchEvent">
  <timerEventDefinition>
    <timeDuration>PT1M</timeDuration>
  </timerEventDefinition>
</intermediateCatchEvent>
</code>

How can an interval that defaults to 5 minutes take 20 hours (I've got cases in my DB where the duration is more than 40 hours)?

Thanks for your reply, I really appreciate it.

jbarrez · 05-21-2013

I didn't check the 20 hours, no. So that is definitely wrong. However, if you changed the lock time to 5 hours, that could explain it, right? What happens if you set it back to the default?

fwachs · 05-22-2013

How would changing the lockTime to 5 hours explain it? I'm not following, sorry.

frederikherema1 · 05-23-2013

Seems to me that an activity after the timer event is taking a long time to complete. Since JobExecutor.lockTimeInMillis is set to 5 hours, once the first run of the timer job has been executing for 5 hours, the job executor will think the job (which is locked by the first thread) has failed, and the job is unlocked. Since the first thread hasn't come back yet, the retry count isn't decremented, and the job executor will execute the job again (in another thread).

This carries on 4 times, until one of the runs succeeds, resulting in 20 hours (I guess). What does the process actually do? And what is the state of the ACT_RU_JOB entry a couple of minutes after the timer is due to fire? If it's locked, the job executor is working on it.
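The lock-expiry scenario described above can be replayed as a small timeline model. This is a sketch under stated assumptions, not Activiti code: a new acquisition happens exactly at each lock expiry (polling delay ignored), lock expiry never consumes a retry, and the first attempt to commit wins. The run times in `main()` are hypothetical, chosen only so that the first four attempts outlive their 5-hour lock. Requires Java 16+ (records).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model of lock-expiry re-execution (an assumption, not Activiti source):
//  - attempt k acquires the job at due + k * lockTime, i.e. whenever the previous
//    lock expires before the job has been committed;
//  - lock expiry does not consume a retry, so the retry setting never stops it;
//  - the first attempt to commit wins; the others would fail when they commit.
class LockExpirySimulation {
    record Outcome(int winningAttempt, LocalDateTime winnerStart, int acquisitions) {}

    static Outcome simulate(LocalDateTime due, Duration lockTime, List<Duration> runTimes) {
        LocalDateTime firstCommit = null;
        LocalDateTime winnerStart = null;
        int winner = -1;
        int acquisitions = 0;
        for (int k = 0; k < runTimes.size(); k++) {
            LocalDateTime start = due.plus(lockTime.multipliedBy(k));
            // Once some attempt has committed, the job is gone: no further acquisitions.
            if (firstCommit != null && !start.isBefore(firstCommit)) {
                break;
            }
            acquisitions++;
            LocalDateTime commit = start.plus(runTimes.get(k));
            if (firstCommit == null || commit.isBefore(firstCommit)) {
                firstCommit = commit;
                winnerStart = start;
                winner = k;
            }
        }
        return new Outcome(winner, winnerStart, acquisitions);
    }

    public static void main(String[] args) {
        LocalDateTime due = LocalDateTime.of(2013, 5, 11, 20, 59, 17); // START_TIME_ + PT1M
        // Hypothetical run times: four attempts that outlive their lock, then a quick one.
        List<Duration> runTimes = new ArrayList<>(Collections.nCopies(4, Duration.ofHours(30)));
        runTimes.add(Duration.ofSeconds(1));
        Outcome o = simulate(due, Duration.ofHours(5), runTimes);
        System.out.println(o); // Outcome[winningAttempt=4, winnerStart=2013-05-12T16:59:17, acquisitions=5]
    }
}
```

With those inputs the winning attempt starts at due + 4 × 5 h = 2013-05-12 16:59:17, the END_TIME_ in the history row. That matches the explanation if END_TIME_ is recorded when the committing attempt fires the timer (an assumption), but it does not prove it. If every run finishes within the lock time, the model acquires the job only once.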

jbarrez · 05-23-2013

Because if there is a failure, it gets locked for 5 hours? That would explain why you see such a huge delay.

Hence my suggestion to test it with the default settings, to rule out that it is due to your changes to the job executor.

mmaker1234 · 05-23-2013

Hello Federico,

In my experience, changing the default values of an external system should be the last resort for coping with your own problems. Please explain why you changed the value of JobExecutor.lockTimeInMillis; there may well be another solution.

I think the reason the whole process is retried is … far from "some". You didn't publish your process design, but let me remind you that "Activiti is going to advance in the process [in a single transaction], until it reaches wait states on each active path of execution" [Activiti User Guide, Transactions and Concurrency]. In other words, on a transaction rollback (when an error arises during process instance execution) the Activiti engine retries the sequence of activities from the last persisted process state. This means that if you want to retry only one activity (the most recent one, in your case) or a couple of them, you need to control where the state of your process instance is persisted. I would suggest marking (some of) your activities asynchronous: the process state is then persisted at the start of each "asynchronous" activity, and a retry (in case of an error) starts from the last persisted activity.
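The effect of asynchronous activities on what gets re-executed can be sketched as a toy model. Assumptions: the steps after a wait state run in one transaction; a step marked async (`activiti:async="true"` on the BPMN element) starts a new transaction, so the state just before it has been persisted; the step names are hypothetical. Requires Java 16+.

```java
import java.util.List;

// Toy model of transaction boundaries: which steps run again after a rollback.
class AsyncBoundaries {
    record Step(String name, boolean async) {}

    // Index of the step a retry restarts from if step `failedAt` throws:
    // the nearest async step at or before it, else the first step after the wait state.
    static int restartIndex(List<Step> steps, int failedAt) {
        for (int i = failedAt; i > 0; i--) {
            if (steps.get(i).async()) {
                return i;
            }
        }
        return 0;
    }

    // Names of the steps that are executed again by the retry.
    static List<String> reExecuted(List<Step> steps, int failedAt) {
        return steps.subList(restartIndex(steps, failedAt), failedAt + 1)
                .stream().map(Step::name).toList();
    }

    public static void main(String[] args) {
        List<Step> plain = List.of(new Step("sendSms", false), new Step("updateDb", false), new Step("callService", false));
        List<Step> split = List.of(new Step("sendSms", false), new Step("updateDb", true), new Step("callService", true));
        System.out.println(reExecuted(plain, 2)); // [sendSms, updateDb, callService]
        System.out.println(reExecuted(split, 2)); // [callService]
    }
}
```

Without async boundaries a failure in the last step re-sends the SMS; with them, only the failing step runs again.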

Additionally, although some workflow engines tolerate long transactions, it is not advisable for your activities to run longer than a couple of minutes. If you need some tasks to take longer, consider making them asynchronous (this is not related to the activity property): use one activity to invoke the task (JMS is one of your friends here) and let the process proceed immediately to the next activity, where it waits for a signal that the task was completed. You can even build complex logic around such an approach, with timeouts, notifications on the task result (failure in particular), and (manually) managed task repetition.
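The "invoke, then wait for a signal" pattern can be sketched in plain Java. This is an in-memory stand-in, not Activiti or JMS code: a thread pool plays the role of the message queue and a map of execution states plays the role of the engine's wait state. In Activiti 5 the waiting side would typically be something like a receive task that is later resumed with `RuntimeService.signal(executionId)`. All class and method names below are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// In-memory stand-in for "invoke the task, then wait for a completion signal".
class DispatchAndWait {
    enum State { WAITING, COMPLETED, FAILED }

    private final Map<String, State> executions = new ConcurrentHashMap<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    // Invoking activity: hand the work off and return at once, leaving the
    // execution parked in a wait state instead of holding a transaction open.
    void dispatch(String executionId, Runnable longTask) {
        executions.put(executionId, State.WAITING);
        CompletableFuture.runAsync(longTask, workers)
                .whenComplete((ignored, error) -> signal(executionId, error == null));
    }

    // Completion callback: plays the role of the "task completed" signal.
    void signal(String executionId, boolean success) {
        executions.replace(executionId, State.WAITING, success ? State.COMPLETED : State.FAILED);
    }

    State state(String executionId) {
        return executions.get(executionId);
    }

    // Stops accepting work and waits (up to 10 s) for running tasks to finish.
    void shutdown() {
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        DispatchAndWait engine = new DispatchAndWait();
        CountDownLatch workDone = new CountDownLatch(1);
        engine.dispatch("exec-1", () -> awaitQuietly(workDone)); // "long" task blocks until released
        System.out.println("after dispatch: " + engine.state("exec-1")); // WAITING
        workDone.countDown();
        engine.shutdown();
        System.out.println("after the task: " + engine.state("exec-1")); // COMPLETED
    }
}
```

Because the dispatching step returns immediately, no engine transaction (and no job lock) has to stay open while the long task runs; a failure shows up as a FAILED state that the process can react to explicitly.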

Hope this helps,
Monique

fwachs · 05-25-2013

Thank you all for your replies, I truly appreciate them.
The main problem for me is that there are some jobs that, for me, shouldn't be retried even if the transaction fails to commit due to locking exceptions. For example, when sending SMSs or emails, I can't have my users getting multiple emails or SMSs because of an optimistic lock. If it did send the email or SMS, that's all I wanted, and I don't really care whether the revision is OK or not. Does this make sense to you?
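One common way to get "send at most once" no matter how often the job runs is an idempotency guard keyed on something stable, for example the process instance id plus the activity id (an assumption; any key that is unique per logical message works). The sketch keeps the keys in memory; in practice they would have to live in a table with a unique constraint, written in a transaction separate from the engine's, otherwise the engine's rollback would undo the record too. `IdempotentNotifier` and `sendOnce` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Idempotency guard: a message is sent at most once per key, however many times
// the surrounding job is re-executed.
class IdempotentNotifier {
    // Stand-in for a table with a unique constraint on the key.
    private final Set<String> sent = ConcurrentHashMap.newKeySet();
    private final Consumer<String> gateway; // e.g. the SMS or mail client

    IdempotentNotifier(Consumer<String> gateway) {
        this.gateway = gateway;
    }

    // Returns true if this call actually sent the message.
    boolean sendOnce(String processInstanceId, String activityId, String message) {
        String key = processInstanceId + ":" + activityId;
        if (!sent.add(key)) {
            return false; // an earlier attempt already sent it
        }
        gateway.accept(message);
        return true;
    }

    public static void main(String[] args) {
        List<String> outbox = new ArrayList<>();
        IdempotentNotifier notifier = new IdempotentNotifier(outbox::add);
        notifier.sendOnce("6262463", "sendSms", "Hello");
        notifier.sendOnce("6262463", "sendSms", "Hello"); // the job runs again
        System.out.println(outbox.size()); // 1
    }
}
```

Recording the key before calling the gateway gives at-most-once delivery: if the gateway call itself fails, the message is not resent. Recording it after the call would give at-least-once instead.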

Concerning the lock time: I had a process with a subprocess that was a for-each sending 10k emails. The lock time was really short, and since the process couldn't finish sending the emails before the lock expired, it restarted… and then it restarted again because it never managed to send all the emails, and so on until I killed the app.
How could I avoid this kind of situation?
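One way to keep every unit of work well inside the lock time is to split the 10k recipients into batches, each sent by its own short job or transaction. The sketch below only does the arithmetic and the partitioning; the 5-minute lock time and the 200 ms per email are assumed figures, the "half the lock time" safety margin is an arbitrary choice, and `EmailBatches` is a hypothetical name.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Splits a large recipient list into batches small enough that each batch,
// handled by its own job, should finish well before its lock expires.
class EmailBatches {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            batches.add(items.subList(from, Math.min(from + batchSize, items.size())));
        }
        return batches;
    }

    // Largest batch whose estimated send time stays below half the lock time.
    static int maxBatchSize(Duration lockTime, Duration perEmail) {
        long size = lockTime.toMillis() / 2 / perEmail.toMillis();
        return (int) Math.max(1, size);
    }

    public static void main(String[] args) {
        List<String> recipients = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            recipients.add("user" + i + "@example.com");
        }
        // Assumed figures: 5-minute lock time, 200 ms to send one email.
        int size = maxBatchSize(Duration.ofMinutes(5), Duration.ofMillis(200));
        List<List<String>> batches = partition(recipients, size);
        System.out.println(size + " emails per batch, " + batches.size() + " batches");
    }
}
```

With those figures it yields 750 emails per batch and 14 batches, so a failure or lock expiry only ever repeats one small batch rather than the whole run.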


Problem with intermediate catch event and retries