BPMN 2.0, NoSQL & POJO questions

activiti-admin
Champ in-the-making
Guys,

I love what you have done here so far. I have a few questions that are mostly necessitated by using a NoSQL backend (Mongo) rather than a traditional SQL repository. I have a number of business processes I want to implement and support, and BPMN 2.0 seems a natural way to do so. That means I basically have three integration options:

1) Use the POJO interface to dynamically create my activities based on (BPMN 2.0?) documents in my document store.
a) Is it possible to keep track of state, reinitialize an activity, and tell the ObjectProcessInstance that a given process instance should be started somewhere other than the start node?
b) Is it possible to automatically create a ProcessDefinition from BPMN 2.0 XML without using the deployment scenario? Obviously, if I am using a NoSQL database, I would prefer to simply grab a document and build it rather than go through the SQL deployment.

2) Use BPMN as a standalone component.
a) Doable, but ugly, since I have to translate between a document and a map, merge the changes back into the document when the process finishes, and keep a separate reference to the document?

3) Port Activiti on top of MongoDB.
a) Applications architected for SQL data stores are typically not a good fit for document-oriented stores?
b) I would love to take a whack at this some time, but not on the timeline needed for this project, and not while it is so in flux?
4 REPLIES

tombaeyens
Champ in-the-making
Cool! Really interesting for us.

We do aim to ensure that our architecture allows for NoSQL databases as well. Though that work still needs to start, and your input is very welcome.

Here's our idea on the topic: the PersistenceSession should be refactored (if necessary) so that e.g. a MongoDbPersistenceSession can be implemented. The first step is to implement a JSON serialization of the runtime process instance data structures. There is some work started on that in the package org.activiti.json. The MongoDB persistence session can then serialize a whole process instance as a text string.
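As a rough illustration of that idea (the collection and field names here are mine, not anything that exists in Activiti), the persistence session could store each serialized instance as a single document:

// hypothetical sketch: persist the whole JSON-serialized process instance as one document
db.processInstances.save({
  _id: "procInst-42",  // the process instance id doubles as the document key
  revision: 1,         // a counter that optimistic locking could check and increment
  state: {}            // the JSON-serialized runtime data structures would go here
});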

To what extent can MongoDB give concurrency control? Currently we delegate concurrency control to the DB as well. All runtime updates are guarded with optimistic locking. That way, processes executed on Activiti can be seen as specifying transactional control flow. The process executes in discrete steps (= transactions) from one state to another.

So we need to figure out how we want to deal with situations like this:
When a job needs to be executed, we currently use optimistic locking to prevent multiple job executors from starting to execute the same job.
Another example: if two operations occur on different nodes in the cloud (potentially in concurrent paths of execution), how would that conflict be resolved when those updates meet each other in the cloud?

I don't know what options we have in that respect on MongoDB.
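One option might be an atomic test-and-set update. Purely a sketch in the mongo shell, with the jobs collection and its field names invented for illustration:

// hypothetical sketch: atomically claim a job so only one executor runs it
db.jobs.update(
  { _id: "job-7", lockOwner: null },  // match the job only if nobody holds it yet
  { $set: { lockOwner: "executor-1",
            lockExpiration: new Date(new Date().getTime() + 30000) } }  // 30 second lease
);
// db.getLastErrorObj().n == 1 means this executor won the claim

Since the update matches on lockOwner being unset, only one of several concurrent executors can see n == 1.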

We have also been thinking about an in-memory locking server/cluster. Its function would be to obtain a global lock on a certain process instance. Because all locked process instances are kept in memory, it could be fast and still scale. Making it a cluster of e.g. 3 nodes could provide failover. All commands in the Activiti API that operate on a process instance could obtain a lock for the duration of the command. This could be combined with storing the JSON-serialized process instances in a BigTable-like cloud persistence solution. That way users would still get transactional control flow semantics from Activiti.
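In pseudo-JavaScript the idea is roughly this (the lockCluster API is purely hypothetical):

// hypothetical sketch only: every Activiti command runs inside a global per-instance lock
function executeWithLock(processInstanceId, command) {
  var lease = lockCluster.acquire(processInstanceId);  // blocks until this node holds the lock
  try {
    return command();                                  // run the command while the lock is held
  } finally {
    lockCluster.release(lease);                        // always release so other nodes can proceed
  }
}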

WDYT?

joshuap
Champ in-the-making
Here's our idea on the topic: the PersistenceSession should be refactored (if necessary) so that e.g. a MongoDbPersistenceSession can be implemented. The first step is to implement a JSON serialization of the runtime process instance data structures. There is some work started on that in the package org.activiti.json. The MongoDB persistence session can then serialize a whole process instance as a text string.

I like the thinking here. In general, with NoSQL, you want single complex objects rather than normalized tables. I do think most of your thinking here is targeted at CouchDB rather than MongoDB. CouchDB uses JSON; Mongo communicates via Java annotations and simple objects (although it can also serialize objects to JSON, de-serializing is dicier).

I think both should be supported, but I will also be honest and say that the performance of CouchDB is extremely poor, and it may not be a good choice for this technology.

To what extent can MongoDB give concurrency control? Currently we delegate concurrency control to the DB as well. All runtime updates are guarded with optimistic locking. That way, processes executed on Activiti can be seen as specifying transactional control flow. The process executes in discrete steps (= transactions) from one state to another.

In a single-node configuration, concurrency control is fairly solid if you think about it beforehand. You can also use optimistic locking on fields, and there are a ton of atomic operations you can use to append or upsert data. However, once you start to scale out, eventual consistency is the mortal enemy of strict SQL-style locking. Mongo only supports master-slave replicas, so there isn't an issue where you could overwrite by accident, but it's always possible that a replicated slave could read an incomplete (not yet updated) object. The newest version of Mongo permits you to block on a write until it has propagated out to the boonies.
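In the mongo shell, that blocking looks roughly like this (the w value, i.e. how many nodes must confirm the write, and the timeout are made-up numbers):

// hypothetical sketch: wait until the last write reached at least 2 replicas, or give up after 5 seconds
db.sessions.update({ sessionKey: 15 }, { $set: { status: "start" } });
db.runCommand({ getlasterror: 1, w: 2, wtimeout: 5000 });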

The problem is a bit different with Couch, since Couch uses MVCC and all nodes can be master or slave.

In general, I think an approach of appending to the workflow may work better than changing state on an object. Basically you would have something like:

{
  "Session": "SessionKey",
  "Flow": "requestNewFeature",
  "Roles": [
    { "user": "Josh", "Role": "requestFeature",
      "activitiesCompleted": [
        { "start": "infoOnStart" },
        { "ThinkAboutANewFeature": "NoSQL rocks" },
        { "registerAccountOnForum": "account: JoshuaP, post: Let's do cool new things" },
        { "postOnBoard": "think about mongo" }
      ] },
    { "user": "tombaeyens", "Role": "CreatorOfActiviti",
      "activitiesCompleted": [
        { "postOnBoard": "whatAboutLocking?" }
      ] }
  ]
}
In this scenario, you would need a globally meaningful session key that encapsulates all information regarding a particular session. It might be as simple as "user-process", but it might need to be more complex depending on what scenarios you want to support.

Why use this type of structure?
1) You get everything you need to know about a session in a single I/O call.
2) You can append to nested structures (for example, Session.Roles.Josh.activitiesCompleted[]) dynamically; see the sketch below.
3) Append-only is lock-friendly.
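For point 2, a nested append in the mongo shell might look roughly like this (it relies on the positional $ operator; the activity payload is invented for illustration):

// hypothetical sketch: append a completed activity to one user's role entry in place
db.sessions.update(
  { "Session": "SessionKey", "Roles.user": "Josh" },  // match the session and the role entry
  { $push: { "Roles.$.activitiesCompleted":           // $ targets the matched array element
             { "postOnBoard": "appended atomically" } } }
);

No read-modify-write cycle is needed, which is what makes the append lock-friendly.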

So we need to figure out how we want to deal with situations like this:
When a job needs to be executed, we currently use optimistic locking to prevent multiple job executors from starting to execute the same job.
Another example: if two operations occur on different nodes in the cloud (potentially in concurrent paths of execution), how would that conflict be resolved when those updates meet each other in the cloud?

With Mongo, you only have a single master. You can do a check to make sure that a similar process doesn't exist prior to running:


db.sessions.update(
  { sessionKey: 15, status: { $exists: false } },  // match only if no status has been set yet
  { $set: { status: "start" } }
);

Again, you can also do an append to an existing object, which I think is a bit cleaner.

We have also been thinking about an in-memory locking server/cluster. Its function would be to obtain a global lock on a certain process instance. Because all locked process instances are kept in memory, it could be fast and still scale. Making it a cluster of e.g. 3 nodes could provide failover. All commands in the Activiti API that operate on a process instance could obtain a lock for the duration of the command. This could be combined with storing the JSON-serialized process instances in a BigTable-like cloud persistence solution. That way users would still get transactional control flow semantics from Activiti.
WDYT?

Yeah, we took this approach on a very large-scale (10k+ TPS) system that we built. We didn't actually hard-lock; instead we shared a sessionKey structure whose locking prevented us from making changes to the same key at the same time. I would pull a Knuth here and say that locks are a bit symptomatic of a failure to figure out a good algorithm.
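For a shared sessionKey structure, one way to arbitrate without a dedicated lock server is Mongo's unique _id index. A sketch in the mongo shell, with the collection and key names invented:

// hypothetical sketch: claim a sessionKey by inserting it; the unique _id index arbitrates
db.sessionLocks.insert({ _id: "user-process-15", owner: "node-1", since: new Date() });
// a duplicate-key error means another node already holds this key;
// remove the document to release the claim when the work is done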

ronaldbrinkerin
Champ in-the-making
@Joshua,

We are looking at implementing Activiti with a MongoDB backend. Did you get any further with this since last year?

@Tom Baeyens,

Is there any statement of direction since last year from your point of view?

Greatly appreciated,
Ronald

trademak
Star Contributor
Hi,

I'm not aware of any work being done in this area recently.
We don't have plans to support Activiti on a MongoDB backend.
Within Alfresco we're doing stuff with MongoDB and we like it a lot.

Best regards,