Hi,
It sounds like you may be hitting a concurrency issue. Alfresco uses optimistic locking: an action runs and attempts to commit, but if another process has changed any of the nodes the action modifies in the interim, the transaction rolls back. (You may still see log messages from that execution even though it never committed.) Alfresco handles this by wrapping the transaction in a RetryingTransactionHelper, which catches concurrency-related rollbacks and retries the action.
Unfortunately, there is a limit on the number of retries (20 by default, I think). So if you have a large number of processes running in parallel in separate threads (such as synchronous rule executions), all making changes to the same node or a common set of nodes, some of them may exceed the retry limit, and those rule executions will then fail altogether.
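To make the retry behaviour concrete, here is a rough, self-contained Java sketch of the general pattern. It is not Alfresco's code: VersionedNode, ConcurrentUpdateException and runWithRetries are hypothetical names I made up for illustration, and the real RetryingTransactionHelper works on database transactions rather than an in-memory version counter.

```java
import java.util.function.Supplier;

// Illustrative stand-in only; none of these names come from Alfresco.
class OptimisticRetrySketch {

    /** Hypothetical exception: the node changed between our read and our commit. */
    public static class ConcurrentUpdateException extends RuntimeException {
        public ConcurrentUpdateException(String message) { super(message); }
    }

    /** Hypothetical node: a value guarded by a version number. */
    public static class VersionedNode {
        private int version = 0;
        private String value = "";

        public synchronized int readVersion() { return version; }
        public synchronized String value() { return value; }

        /** Commits only if nobody else has committed since expectedVersion was read. */
        public synchronized void commit(int expectedVersion, String newValue) {
            if (version != expectedVersion) {
                throw new ConcurrentUpdateException("stale version " + expectedVersion);
            }
            value = newValue;
            version++;
        }
    }

    /** Runs work once, then retries on conflict up to maxRetries more times before giving up. */
    public static <T> T runWithRetries(int maxRetries, Supplier<T> work) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        ConcurrentUpdateException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return work.get();
            } catch (ConcurrentUpdateException e) {
                // The attempt is discarded, but side effects such as logging already happened.
                last = e;
            }
        }
        throw last; // retry limit exceeded: the action fails altogether
    }

    public static void main(String[] args) {
        VersionedNode node = new VersionedNode();
        int[] attempts = {0};
        String result = runWithRetries(20, () -> {
            int seen = node.readVersion();
            if (++attempts[0] == 1) {
                node.commit(seen, "written by another thread"); // simulated interference
            }
            node.commit(seen, "written by our action");
            return node.value();
        });
        System.out.println(result + " after " + attempts[0] + " attempt(s)");
    }
}
```

If every attempt collides with another writer, runWithRetries gives up and rethrows the last conflict, which corresponds to the rule execution failing outright once the limit is passed.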
One option that may improve things is to make the rules asynchronous. This effectively pushes the action executions into a queue, which is then processed at a lower level of parallelism (depending on the size of the asynchronous action thread pool). Obviously, some logic won't lend itself to running asynchronously. Another thing to bear in mind is that Alfresco by default uses a single queue and thread pool for asynchronous actions across the entire system, so if your rule actions fill it up and create a large backlog, other things that happen as part of normal system usage (e.g. email notifications, invites, even thumbnail generation in some cases) can be significantly delayed.
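The backlog effect can be illustrated with a plain java.util.concurrent thread pool. This is only a sketch of the queueing behaviour, assuming a single shared FIFO pool; the class and method names are hypothetical and it does not use Alfresco's actual action executor or its configuration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative stand-in only; not Alfresco's asynchronous action service.
class SharedAsyncQueueSketch {

    /**
     * Queues ruleCount rule actions followed by one notification on a single
     * shared pool, and returns the names of the tasks in the order they completed.
     */
    public static List<String> runSharedPool(int poolSize, int ruleCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<String> completed = Collections.synchronizedList(new ArrayList<>());

        for (int i = 0; i < ruleCount; i++) {
            final String name = "rule-" + i;
            pool.execute(() -> completed.add(name));
        }
        // Unrelated system work ends up behind the rule backlog in the same queue.
        pool.execute(() -> completed.add("email-notification"));

        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed;
    }

    public static void main(String[] args) {
        // With one worker thread, the notification only runs once every rule action is done.
        System.out.println(runSharedPool(1, 5));
    }
}
```

The smaller the pool relative to the number of queued rule actions, the longer anything submitted afterwards has to wait, which is the trade-off to weigh before moving rules to asynchronous execution.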
Regards
Steven