I am writing a simple application that runs 8 executors, each with a maximum thread pool size of 2.
I am using Activiti 5.19.0.1.
The total number of process instances is 10k. Completing all 10k process instances takes 10 to 16 minutes on a Mac Pro with 16 GB of RAM. The database is Postgres.
All tasks in the process are set to lazy and exclusive. 
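For reference, this is my understanding of "exclusive" (not verified against the engine source). Before an exclusive job runs, the engine marks its process instance as locked for a limited time, so that two jobs of the same instance never run concurrently, and it clears that mark afterwards. The updateProcessInstanceLockTime and clearProcessInstanceLockTime methods of ExecutionEntityManager appear to implement this. Below is a simplified in-memory model of such a lease-style lock. ProcessInstanceLockModel, tryLock and clear are hypothetical names, not Activiti API. In the real engine the lock time is stored in a database row, so each call is an UPDATE that can wait on row locks held by other transactions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified in-memory model of an exclusive-job lock.
// Not Activiti code: the engine keeps the lock time in a database row.
class ProcessInstanceLockModel {
    // processInstanceId -> time at which the current lock expires
    private final Map<String, Instant> lockExpiry = new HashMap<>();
    private final Duration lockDuration;

    ProcessInstanceLockModel(Duration lockDuration) {
        this.lockDuration = lockDuration;
    }

    // Loosely analogous to updateProcessInstanceLockTime: succeeds only if
    // the instance is unlocked or its previous lock has expired.
    synchronized boolean tryLock(String processInstanceId, Instant now) {
        Instant expiry = lockExpiry.get(processInstanceId);
        if (expiry != null && expiry.isAfter(now)) {
            return false; // another exclusive job of this instance holds the lock
        }
        lockExpiry.put(processInstanceId, now.plus(lockDuration));
        return true;
    }

    // Loosely analogous to clearProcessInstanceLockTime.
    synchronized void clear(String processInstanceId) {
        lockExpiry.remove(processInstanceId);
    }
}
```

In this model, tryLock returning false means a second job of the same instance has to wait until the first one calls clear or its lock expires.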
Attached is a zip file containing the process definition as well as the Java code used.
Observations:
1. In almost every run, tens of jobs end up with retries_ < 1 and have to be reset with an SQL statement before the engine completes them. Question: if a job fails, does it fail before the "execute(DelegateExecution execution)" method is even called, or does the engine rely on a transaction rollback to revert any changes? In other words, if a job sends an email but fails the first time, and is then retried and completes successfully, will the email be sent only once?
2. Often, the engine stops processing instances after completing about 9.5k of the 10k instances. Sometimes a deadlock is detected by the application; at other times it is detected by the database, as shown in the screenshot in the attached doc file. Worst of all, the engine sometimes blocks forever (nothing happens). From analyzing the process thread dumps (again, see the attached doc file), it seems the engine hit an undetected deadlock. Resetting the retries_ count for jobs with retries_ < 1 does complete some jobs, but the engine then blocks again. As the thread dumps show, many threads are in the RUNNABLE state waiting on the database, and too few remain available for further processing. This might explain the slowness that sometimes occurs, and why the engine then needs to be restarted in order to continue.
Most of the runnable threads have one of these two methods in their stack:
org.activiti.engine.impl.persistence.entity.ExecutionEntityManager.updateProcessInstanceLockTime line: 205 
org.activiti.engine.impl.persistence.entity.ExecutionEntityManager.clearProcessInstanceLockTime line: 212 
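To watch how many threads are sitting in those two methods while the engine runs, without taking dumps by hand, a small helper based on the JDK's ThreadMXBean could be used. This is a sketch; StuckThreadCounter, count and liveStacks are hypothetical names. Note that ThreadMXBean.findDeadlockedThreads() only reports cycles on Java monitors and ownable synchronizers. A thread waiting on a Postgres row lock shows up as RUNNABLE inside a socket read, which may be why the JVM never flags the deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts threads whose stack contains given methods.
class StuckThreadCounter {
    static final String EEM =
        "org.activiti.engine.impl.persistence.entity.ExecutionEntityManager";

    // For each "fully.qualified.Class.method" target, counts the stacks
    // (one per thread) that contain at least one matching frame.
    static Map<String, Integer> count(List<StackTraceElement[]> stacks, List<String> targets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String target : targets) {
            counts.put(target, 0);
        }
        for (StackTraceElement[] stack : stacks) {
            for (String target : targets) {
                for (StackTraceElement frame : stack) {
                    if (target.equals(frame.getClassName() + "." + frame.getMethodName())) {
                        counts.merge(target, 1, Integer::sum);
                        break; // count each thread at most once per target
                    }
                }
            }
        }
        return counts;
    }

    // Collects the stack of every live thread in the current JVM.
    static List<StackTraceElement[]> liveStacks() {
        List<StackTraceElement[]> stacks = new ArrayList<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            stacks.add(info.getStackTrace());
        }
        return stacks;
    }
}
```

For example, calling count(liveStacks(), Arrays.asList(EEM + ".updateProcessInstanceLockTime", EEM + ".clearProcessInstanceLockTime")) from a scheduled task every few seconds would show whether these counts keep growing before the engine stalls.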
By the way, I have encountered similar situations with fewer executors and more threads. In case you wonder why I chose these specific numbers of executors and threads: this is the configuration with which I can reproduce the issue most frequently.
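Regarding the email question in the first observation: if a retry can re-run execute(...) after the email has already gone out (the case I am worried about), one common safeguard is an idempotency check keyed by something that stays the same across retries, such as the process instance id plus the activity id. A minimal sketch under that assumption; IdempotentMailer and sendOnce are hypothetical names, not Activiti API, and the in-memory set stands in for what would have to be a durable store.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical guard that makes a non-transactional side effect
// (sending an email) safe to repeat when a job is retried.
class IdempotentMailer {
    // Keys already used; a real system would persist these durably.
    private final Set<String> sentKeys = ConcurrentHashMap.newKeySet();
    // Records what was "sent"; stands in for the real SMTP call.
    final List<String> outbox = new CopyOnWriteArrayList<>();

    // Sends at most once per key; returns true only for the call that sent.
    boolean sendOnce(String key, String recipient) {
        if (!sentKeys.add(key)) {
            return false; // an earlier attempt of the same job already sent it
        }
        outbox.add(recipient);
        return true;
    }
}
```

This version marks the key before sending, which gives at-most-once delivery: if the process crashes between marking and sending, the email is lost. Marking after a successful send gives at-least-once delivery instead, so the choice depends on which failure is more acceptable.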
Thank you!
Dan