
HPC for BPM / Business Rules -- Support in Activiti

k_kunti
Champ in-the-making
Dear All

I have been thinking about the implications of high performance computing in BPM, specifically in the area of business rules, where data size and rule complexity might be significant. I feel the capability to run complex business rules on significant amounts of data (in GB/TB) would drive adoption of BPM engines in a wide variety of enterprise applications.

I would like to know your thoughts in this area, and whether this kind of capability could potentially be added to Activiti.

I have shared my thoughts at the following link:

http://kkrishnendu.blogspot.com/2012/10/high-performance-computing-for-bpm.html

I am including the contents of the link here:

High Performance Computing for BPM / Business Rules

In today's business environment, speed and agility are key needs for a decision maker. If we look at applications within an enterprise, almost all of them either support some kind of business process (though they might not have been implemented on a BPM platform) or support core operations like ERP, SCM, etc. (which might be seen as a combination of forecasting, optimization, and complex business rules processing).

Looking at the emerging trends in high performance computing, the shift is towards platform-managed distributed high performance computing (Hadoop/HDFS) and column-based databases (Cassandra, HBase, etc.). Hadoop essentially takes business logic execution to the distributed data (instead of moving the data, which might be significant in size, to the processing points) and manages fault tolerance and replication transparently. At the heart of Hadoop are HDFS and MapReduce; in short, HDFS manages data files in a fault-tolerant manner across nodes, and MapReduce is a programming paradigm which breaks a large data set into smaller ones, with the processing steps carried out in parallel.
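
To make the paradigm concrete, here is a minimal, purely illustrative in-memory sketch of the map and reduce phases (the class, the toy discount rule, and the data are all invented for this example; this is not Hadoop code):

```java
import java.util.*;

public class MapReduceSketch {

    // Map phase: apply a business rule to each record and emit (key, value) pairs.
    static Map<String, List<Double>> map(List<String[]> records) {
        Map<String, List<Double>> emitted = new HashMap<>();
        for (String[] record : records) {
            String region = record[0];                            // grouping key
            double amount = Double.parseDouble(record[1]);
            double ruled = amount > 100 ? amount * 0.9 : amount;  // toy rule
            emitted.computeIfAbsent(region, k -> new ArrayList<>()).add(ruled);
        }
        return emitted;
    }

    // Reduce phase: fold each key's values into a single aggregate.
    static Map<String, Double> reduce(Map<String, List<Double>> emitted) {
        Map<String, Double> totals = new HashMap<>();
        for (Map.Entry<String, List<Double>> e : emitted.entrySet()) {
            double sum = 0;
            for (double v : e.getValue()) sum += v;
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"EMEA", "120"},
                new String[] {"EMEA", "80"},
                new String[] {"APAC", "200"});
        System.out.println(reduce(map(records))); // totals per region
    }
}
```

In Hadoop the same two phases run across many nodes, with HDFS holding the input splits next to the mappers.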

However, looking closely, the MapReduce paradigm in its pure form is not very well suited to business processes which comprise complex sets of business rules. Essentially, codifying a business process which contains both parallel and sequential steps, where some of the steps contain complex business rules that handle significant amounts of data, requires a business user's expertise to determine the nature of execution of these steps. The idea is to use a business user's intelligence to determine:

- Which steps should be run in a sequential manner, and which sequential steps would gain a performance benefit if processed in a parallel manner (using MapReduce).
- Which steps should be run in a parallel manner without using MapReduce (on the same server or on multiple servers).
- Which parallel steps should in turn be broken down further to process data in parallel, either using MapReduce or custom multi-threading.

Taking the thought one step further, a sequential step which processes a significant amount of data (say in gigabytes, not terabytes) can be handled on the same hardware using custom multi-threading logic and an in-memory cache, and need not take on the complexity of a MapReduce architecture, which introduces programming complexity and platform-related latencies.
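
As a rough sketch of that "same hardware" option, the snippet below partitions an in-memory data set and evaluates a toy rule on each chunk with a plain thread pool; all names and the rule itself are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InProcessParallelRules {

    // Count the records matching a (toy) rule, fanning the work out over
    // a fixed thread pool instead of a MapReduce cluster.
    public static long countMatches(List<Integer> data, int chunkSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        List<Future<Long>> partials = new ArrayList<>();

        // Fan out: one task per chunk of the in-memory data set.
        for (int i = 0; i < data.size(); i += chunkSize) {
            final List<Integer> chunk =
                    data.subList(i, Math.min(i + chunkSize, data.size()));
            Callable<Long> task = () -> chunk.stream().filter(v -> v % 2 == 0).count();
            partials.add(pool.submit(task));
        }

        // Join: combine the partial results.
        long total = 0;
        for (Future<Long> f : partials) total += f.get();
        pool.shutdown();
        return total;
    }
}
```

The same shape (partition, fan out, join) is what a BPM engine would have to manage if it offered this as a modeling construct.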

Hence the requirement for a BPM platform that could use a similar paradigm to MapReduce/HDFS alongside traditional techniques of high-volume data processing (like in-memory data processing). One could visualize these as a new set of controls (e.g. MapReduce, in-memory processing, column DB adaptors) which can be modeled in a BPM process.
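
As a first approximation, such a control could start life as an ordinary Activiti service task backed by a JavaDelegate. In the sketch below, JavaDelegate and DelegateExecution are the real Activiti interfaces, but the class name, variable names, and the dispatch stub are all hypothetical:

```java
import org.activiti.engine.delegate.DelegateExecution;
import org.activiti.engine.delegate.JavaDelegate;

// Hypothetical "parallel rules" control, sketched as a service-task delegate.
public class ParallelRulesDelegate implements JavaDelegate {

    @Override
    public void execute(DelegateExecution execution) {
        // Inputs set earlier in the process: where the data set lives and
        // how to chunk it (variable names are invented for this sketch).
        String datasetUri = (String) execution.getVariable("datasetUri");
        Integer chunkSize = (Integer) execution.getVariable("chunkSize");

        // A real implementation would dispatch here to a MapReduce job, an
        // in-memory processor, or a column-DB adaptor, and wait for the result.
        Object result = runOnChosenBackend(datasetUri, chunkSize);

        execution.setVariable("rulesResult", result);
    }

    private Object runOnChosenBackend(String datasetUri, Integer chunkSize) {
        return null; // placeholder: hand off to the chosen processing backend
    }
}
```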


Please note that above I have used MapReduce to mean a programming paradigm which enables parallel processing of data into smaller sets at each subsequent step; we might choose to call it something else in this context. It is my humble request to invite comments from both the BPM and HPC communities. I will be sharing further thoughts in subsequent posts.
4 REPLIES

jbarrez
Star Contributor
What exactly are you looking for in a discussion?

I mean, of course we have already thought about MapReduce and big data and how they relate to BPM … but so far the use cases for BPM don't quite match those of big data. Business processes need to be strict and just-in-time, not 'eventually consistent'.

Maybe I'm not seeing the picture you're trying to sketch … so feel free to elaborate.

k_kunti
Champ in-the-making
Hi barrez

Thank you for replying. My thoughts are along the following lines:

1. Achieving high performance by parallelizing the processing of certain steps, where a human participant marks a system task to be executed in parallel (e.g. a Parallel_Business_Rules_task, or sending mails). The BPM engine takes care of processing the same task on multiple nodes of the BPM cluster and joins back to the originating node on completion, with HDFS-like infrastructure taking care of file replication/consolidation across these nodes. These tasks generally work on files or on a DB (relational or column-based).

2. Essentially, in the above approach the user decides which business step can run in parallel (say, sending mail to 100,000 users), while what part of the data each of these parallel branches works on (say 10,000 mails per BPM node) is decided by the BPM engine based on some provided threshold (see the partitioning sketch after this list). By virtue of running the BPM engine across multiple nodes, we could possibly shift execution of a task to multiple BPM engines in the cluster, provided we have the required data on those nodes.


3. This would mean having some kind of job monitor and failure recovery mechanism, wherein we can think of an Activiti GRID engine (a new software piece) taking care of such tasks; we could possibly borrow from a Hadoop-like platform for such capability (not sure though).
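
To illustrate the partitioning decision from point 2, here is a small sketch; the helper class, the threshold, and the dispatch comment are illustrative, not an existing engine feature:

```java
import java.util.ArrayList;
import java.util.List;

public class WorkPartitioner {

    // Split a recipient list into fixed-size chunks (e.g. 10,000 per BPM node).
    public static List<List<String>> partition(List<String> recipients, int perNode) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < recipients.size(); i += perNode) {
            chunks.add(recipients.subList(i, Math.min(i + perNode, recipients.size())));
        }
        return chunks;
    }
    // Each chunk would then be dispatched to a cluster node; joining the
    // results back on the originating node is what the engine would manage.
}
```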

One might reason that these are batch-oriented tasks and hence better kept outside the traditional BPM paradigm, but personally I see a lot of value if a BPM engine can model and take care of such tasks (at a very high level to start with: say, what the task does is written in plain Java, and the platform takes care of the rest of the management).

Thank you
Best Regards
Krishnendu

ronald_van_kuij
Champ on-the-rise
"One might reason that these are batch-oriented tasks and hence better kept outside the traditional BPM paradigm"
Yep…

"but personally I see a lot of value if a BPM engine can model and take care of such tasks"
I don't… since I think it overcomplicates the BPM engine to achieve things like this. But then, that's just my €0.02.

jbarrez
Star Contributor
I agree with Ronald here: it shouldn't be part of the engine.
There are specialized open source frameworks to achieve this: I'm thinking of Spring Batch, Spring Integration, Apache Camel, JBoss Infinispan, etc. …

One thing I plan to look into in the near future is making the integration with these frameworks better in Activiti. As Krishnendu says, it should be easier to fire off tasks that require heavy computing. What we need to improve there first, of course, is correlation in the engine. Right now we only have the execution id, but it should be possible to correlate based on process data instead of these execution ids.
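
As a rough sketch of what correlating on process data could look like: the query and signal below use the existing RuntimeService API, while the callback class, the "jobCorrelationKey" variable, and the other variable names are invented for this example:

```java
import org.activiti.engine.ProcessEngine;
import org.activiti.engine.ProcessEngines;
import org.activiti.engine.RuntimeService;
import org.activiti.engine.runtime.Execution;

public class HeavyJobCallback {

    // Hypothetical callback invoked when an external compute job finishes.
    public void onJobFinished(String correlationKey, Object result) {
        ProcessEngine engine = ProcessEngines.getDefaultProcessEngine();
        RuntimeService runtimeService = engine.getRuntimeService();

        // Correlate on business data carried as a process variable
        // ("jobCorrelationKey" is assumed to be set before the process
        // reached its wait state), instead of remembering a raw execution id.
        Execution waiting = runtimeService.createExecutionQuery()
                .processVariableValueEquals("jobCorrelationKey", correlationKey)
                .singleResult();

        if (waiting != null) {
            runtimeService.setVariable(waiting.getId(), "jobResult", result);
            runtimeService.signal(waiting.getId()); // resume the waiting process
        }
    }
}
```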