Hyland Connect

mcajvar · 06-28-2016

I have a process that consists of two subprocesses. When the first subprocess finishes, the process should immediately continue to the next subprocess; there are no conditions to prevent that. This has worked as desired for several hundred process instances. However, there is now one process instance that reached the end event of the first subprocess but did not finish it and continue to the next subprocess. The process now does nothing. I have checked the logs, and there were no errors during execution. The only difference between this and other processes is that I had temporarily set the logging level to DEBUG, but this should have nothing to do with the problem.

Is there a way I could "poke" the process to continue? Also, how would I go about figuring out what went wrong? Thanks to the DEBUG setting, I have a log with all the queries, if that would help.

martin_grofcik · 06-28-2016

Hi,

> Is there a way I could "poke" the process to continue?

You can use org.activiti.engine.RuntimeService#signal(java.lang.String, java.util.Map<java.lang.String,java.lang.Object>).

> Also, how would I go about figuring out what went wrong? I have a log with all the queries due to the DEBUG setting, if this would help any?

I would expect some exception to have been thrown during the logging. You can check for that (or create a JUnit test).

Regards
Martin

mcajvar · 06-28-2016

Hi Martin,

There is no exception in the logs. That, together with the fact that this is the first and only time it has happened, is why I don't know how to go about reproducing it.

I will try the RuntimeService.signal() method, thank you.

mcajvar · 07-01-2016

Hi,

So I've been digging further into this issue, querying the database before and after trying the RuntimeService.signal() method. Here are my findings so far.

Here is the image of my process: https://postimg.org/image/3yxy0l94x/

Before I did anything, this was the status in the database:

<code>select id_, act_id_, task_id_, start_time_, end_time_ from ACT_HI_ACTINST where PROC_INST_ID_ = '229134' order by START_TIME_ desc;</code>

<blockcode>ID_     ACT_ID_                       TASK_ID_  START_TIME_                  END_TIME_
236669  endevent5                               22.06.16 09:51:57,271000000  22.06.16 09:51:57,271000000
236668  inclusivegateway3                       22.06.16 09:51:57,252000000  22.06.16 09:51:57,271000000
230607  inclusivegateway3                       20.06.16 14:40:40,493000000  22.06.16 09:51:57,264000000
230606  mailtask2                               20.06.16 14:40:40,457000000  20.06.16 14:40:40,493000000
230604  scripttask1                             20.06.16 14:40:40,449000000  20.06.16 14:40:40,457000000
230597  servicetask17                           20.06.16 14:40:40,412000000  20.06.16 14:40:40,449000000
230594  usertask30                    230595    20.06.16 14:40:40,342000000  22.06.16 09:51:57,252000000
230592  inclusivegateway2                       20.06.16 14:40:40,339000000  20.06.16 14:40:40,342000000
230591  exclusivegateway17                      20.06.16 14:40:40,339000000  20.06.16 14:40:40,339000000
230587  servicetask11                           20.06.16 14:40:39,032000000  20.06.16 14:40:40,339000000
230132  usertask7                     230133    20.06.16 13:50:25,145000000  20.06.16 14:40:39,032000000
230130  scripttask6                             20.06.16 13:50:25,135000000  20.06.16 13:50:25,145000000
230129  inclusivegateway1                       20.06.16 13:50:25,132000000  20.06.16 13:50:25,135000000
230126  servicetask5                            20.06.16 13:50:24,961000000  20.06.16 13:50:25,132000000
230123  servicetask15                           20.06.16 13:50:24,960000000  20.06.16 13:50:24,961000000
230122  exclusivegateway27                      20.06.16 13:50:24,960000000  20.06.16 13:50:24,960000000
230118  exclusivegateway1                       20.06.16 13:50:24,945000000  20.06.16 13:50:24,945000000
230119  servicetask14                           20.06.16 13:50:24,945000000  20.06.16 13:50:24,960000000
229621  usertask10                    229633    20.06.16 13:37:04,091000000  20.06.16 13:50:24,919000000
229616  scripttask11                            20.06.16 13:37:04,071000000  20.06.16 13:37:04,071000000
229617  scripttask3                             20.06.16 13:37:04,071000000  20.06.16 13:37:04,084000000
229612  exclusivegateway15                      20.06.16 13:37:04,014000000  20.06.16 13:37:04,014000000
229611  exclusivegateway26                      20.06.16 13:37:04,014000000  20.06.16 13:37:04,014000000
229613  servicetask4                            20.06.16 13:37:04,014000000  20.06.16 13:37:04,071000000
229558  usertask29                    229570    20.06.16 13:36:39,952000000  20.06.16 13:37:04,012000000
229556  scripttask10                            20.06.16 13:36:39,950000000  20.06.16 13:36:39,951000000
229555  exclusivegateway24                      20.06.16 13:36:39,925000000  20.06.16 13:36:39,950000000
229554  exclusivegateway23                      20.06.16 13:36:39,924000000  20.06.16 13:36:39,924000000
229553  exclusivegateway25                      20.06.16 13:36:39,924000000  20.06.16 13:36:39,924000000
229229  usertask3                     229230    20.06.16 13:11:14,444000000  20.06.16 13:36:39,924000000
229227  scripttask2                             20.06.16 13:11:14,436000000  20.06.16 13:11:14,444000000
229226  exclusivegateway22                      20.06.16 13:11:14,435000000  20.06.16 13:11:14,435000000
229215  servicetask8                            20.06.16 13:11:14,408000000  20.06.16 13:11:14,435000000
229213  travelOrderRequestSubprocess            20.06.16 13:11:14,407000000  20.06.16 14:40:40,412000000
229214  startevent1                             20.06.16 13:11:14,407000000  20.06.16 13:11:14,408000000
229136  msgStartEvent                           20.06.16 13:11:14,406000000  20.06.16 13:11:14,407000000</blockcode>

So the end event has been reached, but there is no record of the subprocess finishing and the process continuing to the next one. Next, I checked the executions:

<code>select id_, parent_id_, act_id_, is_active_ from ACT_RU_EXECUTION where PROC_INST_ID_ = '229134';</code>

<blockcode>ID_     PARENT_ID_  ACT_ID_     IS_ACTIVE_
229134                          0
229212  229134                  0
229557  229212      usertask29  0</blockcode>

This tells me that the execution of the multi-instance user task (usertask29; see the query output above and the diagram) is still present. Since the process had already continued past that point, this execution should have been ended, but for some reason it was not.
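The reasoning in the previous paragraph (a runtime execution still parked on an activity whose historic instances have all ended) can be turned into a quick consistency check. This is a minimal sketch in plain Java, not an Activiti API: it assumes the rows from the two queries above have already been loaded into memory, and the class, record, and method names are hypothetical. It is only a heuristic; an activity that legitimately runs again in a loop would need a smarter comparison, for example by timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical diagnostic helper (not part of Activiti): flags runtime executions
// that still point at an activity whose historic instances have all ended.
class StaleExecutionCheck {

    // One row of: select id_, parent_id_, act_id_ from ACT_RU_EXECUTION ...
    record RuntimeExecution(String id, String parentId, String actId) {}

    // One row of: select id_, act_id_, end_time_ from ACT_HI_ACTINST ...
    // endTime is null while the activity instance is still open.
    record HistoricActivity(String id, String actId, String endTime) {}

    static List<RuntimeExecution> findStale(List<RuntimeExecution> executions,
                                            List<HistoricActivity> history) {
        List<RuntimeExecution> stale = new ArrayList<>();
        for (RuntimeExecution exec : executions) {
            if (exec.actId() == null) {
                continue; // scope executions (process instance, subprocess) have no act_id_
            }
            boolean seenInHistory = false;
            boolean stillOpen = false;
            for (HistoricActivity h : history) {
                if (Objects.equals(h.actId(), exec.actId())) {
                    seenInHistory = true;
                    if (h.endTime() == null) {
                        stillOpen = true;
                    }
                }
            }
            // History says the activity ran and every instance of it ended,
            // yet a runtime execution still sits on it: a likely leftover.
            if (seenInHistory && !stillOpen) {
                stale.add(exec);
            }
        }
        return stale;
    }
}
```

With the rows shown above, the check flags only execution 229557 on usertask29; the two scope executions (229134 and 229212) are skipped because they have no act_id_.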

What I would like to achieve is to just "get rid" of the execution, so that the process continues to the next subprocess. Next, I tried RuntimeService.signal("execution id") with id 229557, the execution whose activity id is usertask29.

However, this had an undesired effect: the process now behaves as if it were continuing from the affected multi-instance user task. This means the users would get their tasks again, and any steps that communicate with external applications would run again, which must not happen. Querying the ACT_HI_ACTINST table again returns the same content plus these EXTRA records:

<blockcode>ID_     ACT_ID_                       TASK_ID_  START_TIME_                  END_TIME_
275013  usertask10                    275025    01.07.16 15:53:45,887000000
275007  scripttask3                             01.07.16 15:53:45,122000000  01.07.16 15:53:45,887000000
275006  scripttask11                            01.07.16 15:53:45,111000000  01.07.16 15:53:45,122000000
275004  exclusivegateway15                      01.07.16 15:53:45,098000000  01.07.16 15:53:45,098000000
275003  exclusivegateway26                      01.07.16 15:53:45,098000000  01.07.16 15:53:45,098000000
275005  servicetask4                            01.07.16 15:53:45,098000000  01.07.16 15:53:45,111000000</blockcode>

I cannot figure out why and when this bug triggers, but it appears that under some circumstances the execution of the multi-instance user task is not ended and is left behind in ACT_RU_EXECUTION.

Thinking back to an earlier problem I had, I am guessing that the same thing happened there: the executions were not ended, and that might have caused the foreign key violation?

I should also add that until the start of this month we were using Activiti 5.16.1; we have now upgraded to 5.20.0.2. Before this upgrade, we had no such problems in production, and every instance of this problem has appeared since the upgrade. Since my first post in this thread, users have reported 2 more stuck processes to me, and querying the database shows yet more; I am currently at 15. Not all processes get stuck, but most do.

I am sorry I cannot provide a unit test. If I knew how to reproduce the problem reliably, I would, but I don't. I would, however, gladly provide any other information that helps track down this issue.

So my question now is: until the issue is resolved, what can I do to get rid of the stale executions and have the processes continue WITHOUT repeating most of the subprocess?

mcajvar · 07-25-2016

Hi,

I am still wondering whether there is a way to move forward these processes that seem to be hanging in the air. Could I do some database magic and send a signal to the subprocess execution so that it finishes and the process moves on to the next subprocess? Or, if I send a signal to the process, is there at least a way to have Activiti skip all the tasks (they have already been performed and must not be performed again) and reach the end of the subprocess that way?

Any help would be greatly appreciated, as these processes are steadily piling up.

jbarrez · 07-27-2016

> until the issue is resolved, what can I do to get rid of the stale executions and have the processes continue WITHOUT repeating most of the subprocess?

You give a lot of details, which is good, but it also makes this very hard to reproduce.
Looking at the process and database, I can't immediately say 'this is wrong'.

There is the 'skipExpression' in Activiti, which allows you to skip steps in a process (not documented yet, but see the test here: https://github.com/Activiti/Activiti/blob/master/modules/activiti-engine/src/test/java/org/activiti/...).
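If you want to try the skipExpression route together with the RuntimeService.signal call suggested earlier in the thread, a rough sketch follows. It rests on assumptions you should verify against the linked test for your Activiti version: that each task you do not want to re-run carries a skip expression in the BPMN (for example activiti:skipExpression="${skip}", where skip is a hypothetical variable name), and that the engine evaluates skip expressions only when the process variable _ACTIVITI_SKIP_EXPRESSION_ENABLED is true. The SkipVariables helper is hypothetical and only builds the variable map; it does not call the engine.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds the variables you might pass to
// runtimeService.signal(executionId, variables) so that tasks guarded by a
// skip expression such as ${skip} are skipped instead of re-executed.
class SkipVariables {

    // Flag that, as far as we know, switches on skip-expression evaluation in
    // Activiti 5.x; this is an assumption, check it against your engine version.
    static final String SKIP_ENABLED_FLAG = "_ACTIVITI_SKIP_EXPRESSION_ENABLED";

    static Map<String, Object> forSkipping(String skipVariableName, boolean skip) {
        Map<String, Object> variables = new HashMap<>();
        variables.put(SKIP_ENABLED_FLAG, Boolean.TRUE);
        // The variable referenced by each task's skip expression (name is up to you).
        variables.put(skipVariableName, skip);
        return variables;
    }
}
```

The map would then be passed as the second argument, e.g. runtimeService.signal(executionId, SkipVariables.forSkipping("skip", true)). Presumably the skip variable should be set back to false once the process has reached the next subprocess; otherwise later tasks guarded by the same expression would be skipped as well.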

So am I right that you have a multi-instance that doesn't finish properly, but the process continues anyway?
It would be really helpful to know why this happens only in some cases. If everything else is the same, it must be variable-related?

mcajvar · 07-27-2016

Hi,

I have tried putting the exact same process into a test case (using both H2 and Oracle) and setting some variables during the test, but I have been unable to reproduce the issue. I have not yet experimented with the variables, though I still plan to.

Thank you for the information on skip expressions; I will give it a try.

> So am I getting it right that you have a multi instance that doesn't finish properly, but it continues in the process?
> It would be really helpful to know why this is happening in some cases … it must be variable related, if all the rest is the same?

Yes, that is what I suspect is happening. The multi-instance user task behaves as if it had finished and the process continues, but the execution record remains in the database, which prevents the parent execution from continuing.

If it would help, I can set a more detailed log level in our application, filter out the relevant entries, and attach them here. I didn't do that yet because I have no idea what to look for or what I should expect in the log files.

Or, if you have other suggestions for how I could provide more details about this case, I'd be happy to follow them.

jbarrez · 07-28-2016

> I didn't do that yet because I have no idea what to look out for, or what I should expect in the log files.

What is always interesting (to us, as engine coders) is adding

log4j.logger.org.apache.ibatis=DEBUG

This will give you the raw SQL that goes to the database.

Anything logged under org.activiti is interesting to see as well.

matej1 · 08-26-2016

Hi, mcajvar has moved on to greener pastures, and I'm his successor.

I've narrowed the issue down to multi-instance user task handling. Our usertask29 does something peculiar: the value of its activiti:collection="${expr}" expression changes after each task completion. That might be incorrect usage, although the docs don't make this clear. I'm going to assume it's illegal use on our part, but it definitely worked in version 5.16; we're on 5.20 now.

Looking at this commit, most of the method is skipped when the list length hits 0, which is probably what causes our issue. I can't simply revert it to test, because there were other changes in between, and execution.getActivity() is now also null in this case, so an exception is thrown. However, our process works if I make sure the list doesn't change.

We "need" a mutating list because we have a list of entries that different users must confirm. We don't want users to get a task if their entries are already confirmed, and entries become confirmed when the user completes the task, so the list changes.

Can you recommend a nice pattern or solution for this? I can only come up with an extra service task that sets a list variable right before the task. Or maybe this is actually a regression, and we're allowed to mutate the list?
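The extra service task proposed above boils down to computing the list once and never mutating it afterwards. Below is a minimal sketch of that computation in plain Java, assuming a hypothetical Entry type with an owner and a confirmed flag. In the process, this logic would sit in the service task placed right before usertask29, and its result would be stored under a separate, never-modified variable (hypothetically named pendingApprovers) that the activiti:collection expression references, while the confirmations keep updating the original entries.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the "users confirm their own entries" use case.
class PendingApprovers {

    // An entry that a given user must confirm.
    record Entry(String owner, boolean confirmed) {}

    // Returns each user who still has at least one unconfirmed entry, once,
    // in first-seen order. The result is an unmodifiable copy, so later
    // confirmations cannot change the list the multi-instance task iterates over.
    static List<String> snapshot(List<Entry> entries) {
        Set<String> owners = new LinkedHashSet<>();
        for (Entry e : entries) {
            if (!e.confirmed()) {
                owners.add(e.owner());
            }
        }
        return List.copyOf(owners);
    }
}
```

Deduplicating with a LinkedHashSet gives each user at most one task, in the order their first unconfirmed entry appears; whether you want one task per user or one per entry is a design choice for your process.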

Thanks, and sorry for the delay!

trademak · 08-29-2016

The multi-instance construct evaluates the number of instances when the multi-instance activity is created. So if you use a mutating list, it will still use the original values that were evaluated at creation time, which can result in unexpected behavior.
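This point can be illustrated with a toy model in plain Java; it is a simplification written for this thread, not Activiti's implementation. The count (mirroring the engine's nrOfInstances variable) is read from the collection once, at creation, and completion is judged against that fixed count, so shrinking the source list afterwards changes nothing the activity tracks.

```java
import java.util.List;

// Toy model (not Activiti's code) of the rule described above: the number of
// instances is taken from the collection once, when the activity is created.
class MultiInstanceModel {

    private final int nrOfInstances;        // fixed at creation time
    private int nrOfCompletedInstances = 0;

    MultiInstanceModel(List<?> collection) {
        this.nrOfInstances = collection.size();
    }

    int nrOfInstances() {
        return nrOfInstances;
    }

    // Called once per completed instance; returns true when the whole
    // multi-instance activity is done according to the creation-time count.
    boolean completeOne() {
        nrOfCompletedInstances++;
        return nrOfCompletedInstances >= nrOfInstances;
    }
}
```

For example, if the collection holds alice, bob and carol when the activity is created, and alice and bob are removed from it as they complete their tasks, the model still reports three instances and only finishes on the third completion, while code that reads the live list at that point would see a single element.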

Best regards,


A process did not continue from one subprocess to another