
Performance, large transaction

buurd
Champ in-the-making
Hi!

I'm running a large import job that is going to put about 100,000 documents into our Alfresco installation, but there are some problems that make me a little worried.

1) We add all the files into import folders (spaces); they are named with unique names from the old system
2) We add metadata files from the old system
3) We have written a small JavaScript that reads the metadata file, finds the file with the unique id, copies it to its correct location, and adds metadata according to the metadata file (a rough sketch of such a script is shown below)
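To give an idea of the approach, here is a rough sketch of what such a script could look like; the folder paths, metadata format, and property names are invented for this example and are not our real script:

// Rough sketch only - paths, metadata format and property names are made up.
var importSpace  = companyhome.childByNamePath("Import/Batch01");      // space with the original files
var targetSpace  = companyhome.childByNamePath("Documents/Archive");   // final destination
var metadataFile = companyhome.childByNamePath("Import/Batch01/metadata.txt");

// one line per document: <uniqueId>;<title>;<description>
var lines = metadataFile.content.split("\n");
for (var i = 0; i < lines.length; i++) {
    var fields = lines[i].split(";");
    if (fields.length < 3) continue;

    // the files are named with the unique id from the old system
    var source = importSpace.childByNamePath(fields[0]);
    if (source === null) {
        logger.log("Missing file for id " + fields[0]);
        continue;
    }

    // copy to the right location and add metadata from the metadata file
    var copy = source.copy(targetSpace);
    copy.properties["cm:title"] = fields[1];
    copy.properties["cm:description"] = fields[2];
    copy.save();
}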

The problem we see is that when we run a batch of about 1,500-2,000 files that are to be copied and given metadata, the system locks up for a while. And if the batch is large enough we get warnings in the log about the transaction cache being full and have to restart Alfresco (although perhaps we should have waited longer than an hour).

So we worked around the problem by running smaller batches that don't cause such long lockups (1-2 minutes instead of 15-20 minutes). Boring work, but acceptable.

Now I have tried to remove some of the spaces holding the original files. Each such space contains between 1,000 and 2,500 files, and when I delete one, the warning about the transaction cache appears again.

For a system that is said to handle 100,000,000 documents this seems very strange. I can understand that some tasks might take time, but I really don't understand why I get lockups.

I hope someone will tell me that this is a common configuration error that can easily be fixed; otherwise I'm going to have a tough time explaining this problem.

Thanks in advance
Roland
3 REPLIES

mrogers
Star Contributor
The issue is in the title. It's due to the size of your transactions. If you try to write 100K documents in a single transaction you are going to have problems. As you have already found, if you choose smaller transaction boundaries then things will work better and there will be less locking.
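With a JavaScript-driven import, one way to get smaller transaction boundaries is to let each script execution handle only a limited number of files and invoke the script repeatedly, since each execution normally runs in its own transaction. A rough sketch, with invented paths and an invented batch size:

// Sketch: process at most batchSize files per script execution, so each run
// (and thus each transaction) stays small. Paths and the limit are invented.
var batchSize   = 200;
var importSpace = companyhome.childByNamePath("Import/Batch01");
var doneSpace   = companyhome.childByNamePath("Import/Batch01-done");
var targetSpace = companyhome.childByNamePath("Documents/Archive");

var files = importSpace.children;
var processed = 0;

for (var i = 0; i < files.length && processed < batchSize; i++) {
    var source = files[i];
    if (!source.isDocument) continue;

    source.copy(targetSpace);
    // move the original out of the way so the next run picks up the rest
    source.move(doneSpace);
    processed++;
}

logger.log("Processed " + processed + " files in this run");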

The transaction locking that happens is probably at the database level, so if you post details of your database and configuration there may be some advice to offer.

buurd
Champ in-the-making
The problem isn't in the batch job; we handle it by limiting the size of the transaction, even though it would be nice to be able to force a commit in the JavaScript for each file imported.

The problem is an action we can expect a user to perform: deleting a space. It also causes lockups. How do we work around that? People are going to import zip files with a lot of documents, discover that it was the wrong version or the wrong location, and then delete them. That operation shouldn't block other users from writing files, right?
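One workaround we might try is to empty such a space a limited number of nodes at a time rather than deleting the whole thing in one go, so that no single transaction has to remove thousands of nodes. A rough sketch (the path and chunk size are invented):

// Sketch: remove children a chunk at a time and call the script repeatedly,
// instead of deleting the whole space in one transaction.
var chunkSize = 200;
var space = companyhome.childByNamePath("Import/Batch01");

var children = space.children;
var removed = 0;

for (var i = 0; i < children.length && removed < chunkSize; i++) {
    children[i].remove();
    removed++;
}

logger.log("Removed " + removed + " nodes; " + (children.length - removed) + " remaining");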

We have a support case open on the issue now, so I'll report back when we have a solution.

buurd
Champ in-the-making
Some more information about the issue. It seems like versioning is the problem. We have a default rule that makes sure everything gets versioned, except for those spaces where we manually remove the rule.

If we run the batch to copy files into a directory where the rule adds the versionable aspect directly (in the foreground) we get a long delay, but not if the rule runs in the background. The same thing happens with files that are already versioned: there the copy of the file gets versioned immediately and thereby causes the delay.

If there is no versioning there is no delay, and the number of files needed to cause a delay seems to be larger than what we tried with.
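If inline version creation really is the cost, one thing we might experiment with is adding the versionable aspect ourselves during the import but with the automatic and initial versioning switched off, so that the version store work does not happen inside the big import transaction. A rough sketch (the paths are invented and we have not verified that this actually avoids the delay):

// Sketch: add cm:versionable at import time but suppress immediate version
// creation. Paths are invented; effect on the delay is untested.
var importSpace = companyhome.childByNamePath("Import/Batch01");
var targetSpace = companyhome.childByNamePath("Documents/Archive");

var files = importSpace.children;
for (var i = 0; i < files.length; i++) {
    if (!files[i].isDocument) continue;

    var copy = files[i].copy(targetSpace);

    var versionProps = new Array();
    versionProps["cm:autoVersion"] = false;
    versionProps["cm:initialVersion"] = false;
    copy.addAspect("cm:versionable", versionProps);
}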