Hi barrez
Thank you for replying. My thoughts go in the following direction:
1. Achieving high performance by parallelizing the processing of certain steps, where the human participant marks a task as one to be executed in parallel by the system (e.g. a Parallel_Business_Rules_task or sending mails). The BPM engine takes care of processing the same task on multiple nodes of the BPM cluster and joins back to the originator node on completion. An HDFS-like infrastructure takes care of file replication/consolidation across these nodes. These tasks generally work on files or a DB (relational or column-based).
2. Essentially, in the above approach the user decides which business step can run in parallel (say, sending mail to 100,000 users), while the BPM engine decides what part of the data each of these parallel branches works on (say, 10,000 mails per BPM node) based on some configured threshold; a rough sketch of this partitioning follows the list below. By virtue of running the BPM engine across multiple nodes, we could possibly shift execution of a task to several BPM engines in the cluster, provided the required data is available on those nodes.
3. This would mean having some kind of job monitor and failure-recovery mechanism. Here we could think of an Activiti GRID engine (the new software piece) taking care of such tasks; we could possibly borrow from a Hadoop-like platform for this capability (not sure though).
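To make the partitioning idea in point 2 a bit more concrete, here is a minimal sketch in plain Java (the class name, the mail example and the 10,000 threshold are just placeholders on my side, nothing Activiti-specific): the engine splits one large work item into chunks no bigger than a threshold, and each chunk would then be dispatched to a different node of the cluster.

```java
import java.util.ArrayList;
import java.util.List;

public class WorkPartitioner {

    // Split the full recipient list into sub-lists of at most chunkSize entries.
    static List<List<String>> partition(List<String> recipients, int chunkSize) {
        List<List<String>> chunks = new ArrayList<List<String>>();
        for (int i = 0; i < recipients.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, recipients.size());
            chunks.add(new ArrayList<String>(recipients.subList(i, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Example from point 2: 100,000 recipients, threshold of 10,000 mails per node.
        List<String> recipients = new ArrayList<String>();
        for (int i = 0; i < 100000; i++) {
            recipients.add("user" + i + "@example.com");
        }
        List<List<String>> chunks = partition(recipients, 10000);
        System.out.println(chunks.size() + " chunks to dispatch, "
                + chunks.get(0).size() + " recipients each");
        // Each chunk would be handed to one node of the cluster; the join back
        // to the originator node happens once every chunk reports completion.
    }
}
```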
One might argue that these are batch-oriented tasks and hence better kept outside the traditional BPM paradigm, but personally I see a lot of value in a BPM engine that can model and take care of such tasks (at a very high level to start with: say, what the task does is written in plain Java, and the platform takes care of the rest of the management work). A sketch of what such a plain-Java task might look like is below.
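As a rough illustration of that last point (only the JavaDelegate interface is Activiti's; the "mailChunk" variable name and the MailGateway helper are made up by me), the per-chunk business logic could stay an ordinary plain-Java service-task delegate, while the grid piece decides where and how many copies of it run:

```java
import java.util.List;

import org.activiti.engine.delegate.DelegateExecution;
import org.activiti.engine.delegate.JavaDelegate;

public class SendMailChunkDelegate implements JavaDelegate {

    @SuppressWarnings("unchecked")
    public void execute(DelegateExecution execution) {
        // Each parallel branch receives only its own slice of the data,
        // e.g. the 10,000 recipients assigned to this node.
        List<String> mailChunk = (List<String>) execution.getVariable("mailChunk");

        for (String recipient : mailChunk) {
            // Plain-Java business logic; MailGateway is only a placeholder here.
            MailGateway.send(recipient, "Hello from the BPM grid");
        }
    }
}

// Placeholder mail sender so the sketch is self-contained.
class MailGateway {
    static void send(String recipient, String body) {
        System.out.println("sending mail to " + recipient);
    }
}
```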
Thank you
Best Regards
Krishnendu