
Thread/Transaction-safe action rules for massive concurrency

iru
Confirmed Champ

Hi there,

We have a task that seems easy but is giving us a lot of headaches.
Functionally speaking, we need a custom rule on a folder that takes each file being uploaded and moves it into another folder based on its creation date.

  • File enters: Inbox
  • File is moved to: Folderized / yyyy / MM / dd / ... (in production, down to the minute, for example)


The rule must implement two subtasks:

  1. create the folderized structure if not existing
  2. move the file into the folderized structure
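
The target path in the layout above can be derived from the creation date alone. A minimal sketch, assuming the date is available as a java.time.Instant and that folders go down to the minute; the class and method names are hypothetical, and this is plain Java rather than Alfresco API:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

final class FolderizedPath {
    // Assumed layout: one path segment per date part, down to the minute.
    private static final DateTimeFormatter SEGMENTS =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH/mm");

    /** Returns the folder names to walk (or create) below the Folderized root. */
    static List<String> segmentsFor(Instant created, ZoneId zone) {
        // An Instant has no calendar fields, so a zone must be applied before formatting.
        String path = SEGMENTS.withZone(zone).format(created);
        return Arrays.asList(path.split("/"));
    }
}
```

Fixing the time zone explicitly matters here: two cluster nodes with different default zones would otherwise file the same document under different folders.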

____


Because this must be thread-safe, the problems start when a concurrent batch of files is uploaded and multiple threads enter the rule at the same time.

  • We started by using the synchronized keyword on the top-level method (ActionExecuterAbstractBase#executeImpl), on inner methods, and on blocks of code. This does not seem to work: even when we synchronize, the transaction may not have been committed yet, so the output of task 1 is not visible to other threads. We added a sleep, but that cannot guarantee the commit has happened either, and it still fails sometimes.

  • We also tried the RetryingTransactionHelper (RTH) workaround presented by Alexey Vasyukov in the lightning talk 'Alfresco repo under concurrent write load' at this year's BeeCon, but ran into several problems there too.

I've already asked how this helper is supposed to work when the 'requiresNew' flag is set to false, because it does not seem to retry, or we may be misunderstanding something.
https://community.alfresco.com/message/815007-re-retryingtransactionhelper-learnings?commentID=81500...

The problem with this utility is that it behaves differently on a Postgres-based server than on a MySQL-based one (we're using the latest alfresco520, MySQL server-core 5.7 with the 5.1.32 driver, on Ubuntu 16. Updated: we checked that all tables use InnoDB for transactional management).

Updated: tested on SQL Server too, and it behaves differently from both of the previous ones.

The approach that works fine on Postgres is to run task 1 under a RetryingTransactionHelper (in a new transaction) and not to start task 2 until task 1 has completed successfully.
Task 2 is not executed under an RTH.

However, when we try this with MySQL, it fails. We guess the error is raised because the output of task 1 is still not fully usable by the parent transaction (since it was created inside another transaction).

Updated: MySQL (first task with RTH, second task without - NonRootNodeWithoutParentsException)
- src: https://pastebin.com/raw/DD0bWjW2
- action context: https://pastebin.com/CVWS9U9q
- error: https://pastebin.com/raw/gx2TkWxn

Updated: SQL Server (DataIntegrityViolationException)

- error: https://pastebin.com/raw/BZSTKDUX 

We also tried running the second task inside an RTH.
With the requiresNew flag set to true, the node seems not to be visible to the code within the transaction, and when we try to move it, the node is not found.
We guess this is because it's a new transaction that does not know about the "work in progress" node (it seems to be some AOP implementation).

Updated: to clarify, this happens regardless of the database provider, so at least it's consistent.

Updated: both tasks with RTH, with requiresNew (InvalidNodeRefException: Node does not exist)
- src: https://pastebin.com/CvrRSt3X
- action context: same as before

- error: https://pastebin.com/raw/yVNc8YWf

If we run it with the requiresNew flag set to false, we get the previous NonRootNodeWithoutParentsException error.

______

We've tried other approaches and combinations and got several kinds of errors, but we have tried to summarize everything as much as possible.

The only solution that has seemed to work (with any database provider, concurrently) is executing the rule in the background. However, this is not what we want; we would like it to run synchronously, so the end user can know whether the action was performed correctly.

We could also have a cron-like task that pre-populates the folderized structure, but that would lead to lots of empty folders when no data is uploaded, and we would have to purge them afterwards.

Any suggestions on how to approach this?


We would provide code, but it's something quite simple that uses fileFolderService both to create and to move. We've analysed the code of the default "MoveActionExecuter" action provided in the Alfresco core, and it's quite similar to ours.

Thanks in advance.

12 REPLIES

iblanco
Confirmed Champ

Really interesting post.

We had exactly the same issues some time ago while trying to do exactly the same thing. It was a long time ago, so I might be mistaken about some details.

During our many attempts to solve the issue, I remember that we reached a point where creating a folder threw a FileExistsException, yet at the same time we could not get and use the folder itself.

I thought that Alfresco always set the READ_COMMITTED isolation level on all connections, so I said "that must be a non-repeatable read". But I have just rechecked, and the documentation states clearly that the database's default isolation level is used by default, so for MySQL this is REPEATABLE_READ, and non-repeatable reads could not happen. So now I'm a bit clueless about how this happened.

In this case retrying the transaction could in fact solve the issue: if the new folder already exists when the transaction is retried, the logic that checks for the folder will see it and won't try to create it, so the exception won't happen again.

But of course that is very specific to this use case, and a developer can't expect RTH to "guess" which different execution path the logic will follow when retrying.

I think that we tried to catch the FileExistsException and wrap it in some kind of retryable transaction so that the RTH would just retry it, but I think that this path had other issues.

Maybe doing the whole operation in an RTH with a new transaction, catching the FileExistsException in your code, and restarting the operation yourself (again in a new transaction) will do the trick. But to be honest, I think we tried this and a ton of other things, and in the end we just gave up.
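
The catch-and-restart idea can be sketched outside Alfresco to show its control flow. This is only an illustration under loud assumptions: FolderStore is a hypothetical in-memory stand-in (not FileFolderService), AlreadyExistsException stands in for FileExistsException, and each loop iteration plays the role of one fresh transaction. In a real repository the re-read only succeeds once the winning transaction has committed, which this sketch does not model.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class GetOrCreateSketch {
    /** Stand-in for the error raised when a sibling with the same name exists. */
    static final class AlreadyExistsException extends RuntimeException {
        AlreadyExistsException(String path) { super(path); }
    }

    /** Hypothetical in-memory folder store; NOT the Alfresco FileFolderService. */
    static final class FolderStore {
        private final Map<String, Integer> idsByPath = new ConcurrentHashMap<>();
        private final AtomicInteger nextId = new AtomicInteger();

        Integer find(String path) { return idsByPath.get(path); }

        int create(String path) {
            int id = nextId.incrementAndGet();
            // putIfAbsent is atomic: exactly one concurrent creator wins.
            if (idsByPath.putIfAbsent(path, id) != null) {
                throw new AlreadyExistsException(path);
            }
            return id;
        }

        int size() { return idsByPath.size(); }
    }

    /** Looks the folder up, creates it if missing, and restarts if the race is lost. */
    static int getOrCreate(FolderStore store, String path, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Integer existing = store.find(path);
            if (existing != null) {
                return existing;
            }
            try {
                return store.create(path);
            } catch (AlreadyExistsException lostRace) {
                // Another thread created it first; loop again and read it back.
            }
        }
        throw new IllegalStateException("Could not get or create " + path);
    }
}
```

The point of the loop is that the retry takes a different path from the first attempt: the second pass finds the folder and never calls create again.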

tfrith
Champ on-the-rise

We had a similar issue (although we weren't creating folders as fine-grained as to the minute).
We found that you'd get two threads that try to create the same folder because before commit they can't see each other's new folders.

We ended up having another process create the folders ahead of time.

As you mentioned, this can leave a lot of empty folders which would have to be cleaned up later.
Would this be so bad if a cron task did it fairly often?

What if you changed the approach a bit:
1. don't trigger your action on an inbound rule for every document
2. use a cron task to handle documents in batches
3. for each batch:
      a. make a pass through all the documents in the batch to determine required folders (to cover all the creation dates)
      b. create all the folders
      c. then move all the documents

That way you should never have one thread trying to create the same folder as another thread.

You can experiment with optimal batch size and frequency for your needs.
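
The three passes per batch might look roughly like this. It is a sketch under stated assumptions: Doc and BatchFolderizer are hypothetical names, folders are per day in UTC, and the "create" and "move" steps are modelled as entries in an in-memory map rather than repository calls.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BatchFolderizer {
    /** Hypothetical document descriptor: a name plus its creation date. */
    static final class Doc {
        final String name;
        final Instant created;

        Doc(String name, Instant created) {
            this.name = name;
            this.created = created;
        }
    }

    // Assumed layout for the sketch: one folder per day, in UTC.
    private static final DateTimeFormatter DAY_PATH =
            DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    /** Returns folder path -> names of the documents moved into it. */
    static Map<String, List<String>> process(List<Doc> batch) {
        Map<String, List<String>> folders = new LinkedHashMap<>();
        // Passes (a) and (b): determine each required folder and create it once.
        for (Doc doc : batch) {
            folders.putIfAbsent(DAY_PATH.format(doc.created), new ArrayList<>());
        }
        // Pass (c): every folder exists by now, so moving never has to create one.
        for (Doc doc : batch) {
            folders.get(DAY_PATH.format(doc.created)).add(doc.name);
        }
        return folders;
    }
}
```

Because a single thread owns the whole batch, no two creators can race for the same folder name.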

iru
Confirmed Champ

Been a while, just wanted to update status.

We ended up developing the desired functionality in the client's Alfresco API wrapper, where all operations from different projects join together.
Just before the upload methods, the desired folder structure is created, and documents are uploaded directly to their destination.

This goes through the OpenCMIS library, with no concurrency or thread-safety issues.
However, the solution is not generic enough for others to benefit from it.
__

@iblanco thanks for the insight. It seems we're all working in the same direction and tripping over the same rocks on the way. I guess we need more advanced knowledge, or more time, to provide a solution compatible with the current architecture design.

@tfrith, thanks for the idea, but one of our requirements is that the whole upload operation, all the way to the destination folder, happens synchronously, so batch moves afterwards are not an option.

stay hungry!