
Out Of Memory during Bulk Upload via CMIS: Memory Leak?

ad-int-en
Champ in-the-making
Hi,

we are running Alfresco 3.4.d 64-bit on different platforms (CentOS 5.6 and Windows 2008 Server R2) and are performing some mass-import tests to check the stability of the Alfresco platform. So far the results do not look very promising.
On CentOS we ran into Out Of Memory exceptions either after transferring about 5,000 files via CMIS with 5-10 threads (including setting some aspects and creating folders via CMIS), or after about 10,000 files via FTP with 10 parallel upload threads.

Transferring via FTP looks more promising so far: on the CentOS platform we were able to transfer about 10k files, but then the FTP service of Alfresco seemed to quit and did not react anymore; only a restart of Alfresco could fix the problem.
On Windows with the standard configuration of Alfresco 3.4.d Community we have so far been able to transfer 30k+ files via 10 FTP threads without any issues (the test is still running, though).
A test with CMIS on Windows also failed with an Out Of Memory exception, just like on CentOS.

When creating a heap dump we saw that Lucene was eating up most of the memory while working on the different indexes, which in general is OK. But the fact that we always run into OOM no matter how much memory we give to the JVM suggests there is a memory leak somewhere in the implementation (especially, for some reason, when using the CMIS interface). The heap memory graph always looks pretty much the same: a sawtooth pattern, but with the amount of memory freed by the GC getting less and less, finally resulting in the GC running all the time to free up just a minimal amount of memory, which then leads to the OOM death of the JVM.

At the moment we are running a 260,000-file upload via FTP, which is the biggest batch we have to be able to import into the DMS. If this works via FTP we have at least found a way to import the data, but in production we have to use the CMIS interface 24/7 for smaller but steady batches, and with the current configuration and the possible memory leak this is probably not going to work.

On CentOS we set the number of open files to 32k (using ulimit -n 32000) and tried different memory settings (from the standard configuration with Xmx768m up to tweaked memory and GC settings with up to Xmx5G).
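For reference, a sketch of the kind of settings we experimented with (file locations and exact flags are illustrative and depend on your installation):

```shell
# Raise the open-file limit for the user running Alfresco (illustrative):
ulimit -n 32000

# JVM options for Tomcat, e.g. via JAVA_OPTS in alfresco.sh or setenv.sh
# (example values from our tests, not a recommendation):
export JAVA_OPTS="-Xms1G -Xmx5G -XX:MaxPermSize=256m -server"
```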

Any hints on how to configure Alfresco properly, especially for bulk imports via CMIS, would be highly appreciated. We have 8 GB RAM, 8 CPU cores and a 64-bit system (at the moment running on Windows 2008 Server since it looked more stable in the tests so far).
We followed the hints from the Alfresco presentation "Scale Your Alfresco Solutions" by Alfresco Product Manager Mike Farman, as well as all other hints we could find on the net, but without success. If there is in fact a memory leak, all the JVM settings will not help at all.

Thanks for your help in advance
Paul
9 REPLIES

alcibiade
Champ in-the-making
Hi Paul,

I'm running an instance on which we injected 3.5M documents remotely using OpenCMIS.
The idea here is to switch off index refresh using this configuration during your parallel injection:


index.tracking.disableInTransactionIndexing=true

This way you'll get maximal throughput, and the indexes will be rebuilt when you disable this and restart. We have a production system on which we automated this switch to be able to inject documents efficiently every night and have the indices rebuilt by 6-7 AM.
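Concretely, the two states look something like this in alfresco-global.properties (index.recovery.mode=FULL is the switch that forces a full index rebuild on restart; verify the exact behaviour against your version's documentation):

```properties
# alfresco-global.properties during the nightly injection window:
index.tracking.disableInTransactionIndexing=true

# ...and for the restart that rebuilds the indexes afterwards
# (FULL forces a complete rebuild, AUTO only catches up):
index.tracking.disableInTransactionIndexing=false
index.recovery.mode=FULL
```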

Alas, we tend to have random out-of-memory occurrences in production that we still can't really explain, but we are on a 32-bit system so we can only increase the heap size up to 1600m.

Hope it helped !

gyro_gearless
Champ in-the-making
As a side note, these "OOMs" may or may not indicate a real problem - there is a known bug in the Sun JDK which causes OutOfMemory exceptions even when there is plenty of memory available; see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
Lucene is a prominent victim of this bug, and there are workarounds in Alfresco to catch these OutOfMemoryExceptions. It might be that these exceptions show up in the log, so you should examine the stacktrace to see whether they originate from Lucene.

And we should give Sun the "golden cucumber award" for not fixing that bug for YEARS  :shock:

Cheers
Gyro

ad-int-en
Champ in-the-making
Thanks for the tip with the index.tracking.disableInTransactionIndexing=true switch. We tried it and it helps, but it mostly breaks the Alfresco Share functionality, so this mode doesn't really work for us.

We did some further heap dump and source analysis and found that most of the memory is consumed by objects in the Lucene RAMDirectory / RAMFile structure, so we first changed maxDocsForInMemoryMerge to 0 (as mentioned in similar issues) and later also set maxDocsForInMemoryIndex to 0. With both settings at 0, the time it took to completely consume the old gen memory increased, but the problem still existed.
Analyzing a new heap dump showed that the majority of the heap is consumed by org.apache.lucene.index.TermInfosReader$ThreadResources instances (about 60k of them).
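For completeness, the two settings we zeroed out, expressed as properties (the names are taken from repository.properties in our 3.4.d installation; double-check them against your version):

```properties
# alfresco-global.properties: force Lucene merges and index builds to go
# to disk instead of the in-memory RAMDirectory:
lucene.indexer.maxDocsForInMemoryMerge=0
lucene.indexer.maxDocsForInMemoryIndex=0
```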

Digging a little deeper revealed that a single IndexInfo instance contains about 10k instances of ReferenceCountingReadOnlyIndexReader in its deletableReaders queue.
All of the 10k readers have a refCount > 0 and will therefore not be deleted by the IndexInfo cleaning task.
I could not yet find out where those refCounts are incremented and/or decremented.
But looking at the roughly 11k open file handles (via JMX), there seems to be a connection to open file handles.

ad-int-en
Champ in-the-making
@Gyro.Gearless:
Thank you for your input, but blaming Sun doesn't help us here, since from what we have found so far a bug in the JDK is not the cause of our issue. I am not saying it can be excluded from the list of suspects, but from my point of view it is very unlikely. When observing the heap space in jConsole, we clearly see that the old gen gets filled up with the above-mentioned ReferenceCountingReadOnlyIndexReader instances in the deletableReaders queue. These instances seem to hold references that are never released and therefore fill up the old gen until a GC run cannot free any more memory; the GC then ends up running permanently, causing high CPU load, and the JVM finally dies with an OOM exception.

@alcibiade
Could you tell us a bit more about your configuration? I see on your profile that you are running Alfresco 3.3 on a 32-bit system. What OS do you use, which database, which Java version, and what configuration settings within Alfresco did you tweak besides disabling the in-transaction indexing?

As far as I understood, you are using the CMIS interface - do you use the AtomPub binding or Web Services?
We found that the AtomPub binding implementation is quite poor, since it cannot stream big files and easily causes an OOM when the file size exceeds the JVM memory settings, so we are using the Web Services interface with our CMIS client.

You wrote that you are also getting OOM exceptions - did you analyze the heap dumps to find out where all the memory goes? Is it also some Lucene classes?

Thank you both for your feedback - always happy to get any feedback at all here on this forum 🙂

rmacian
Champ in-the-making
Consider using the bulk filesystem import module from Peter Monks. It is a LOT faster than FTP - I've got speeds twice as fast as FTP.

http://code.google.com/p/alfresco-bulk-filesystem-import/

ad-int-en
Champ in-the-making
Thx rmacian, we tried the bulk filesystem import and it is indeed fast as hell (we got an import rate of around 30 files/second). Interestingly, even with InTransactionIndexing turned on, the memory leak does not appear. So for an initial bulk import it seems to be the tool/interface to go with… BUT unfortunately we need a reliable interface to pump data into the system constantly (at a lower files/second rate) but with the possibility to set aspects and metadata. So CMIS seems to be the only way to go for us.

So up to now, even with InTransactionIndexing disabled, we are not able to import more than approx. 140k files into the DMS without running into an OOM.

Thank you again to all for your input on this topic so far; any more hints are highly appreciated!

rmacian
Champ in-the-making
The point of the bulk import is that it permits setting metadata on the fly via the metadata.properties files. In fact, you can fill in metadata only, for documents that are already uploaded, by just writing the .metadata.properties files.
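For example, a sketch of a shadow file based on my understanding of the importer's format - in the version I used these are XML Java property files named <filename>.metadata.properties.xml; check the project wiki for the exact convention, and the property names here are just illustrations:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<!-- mydoc.pdf.metadata.properties.xml, placed next to mydoc.pdf -->
<properties>
  <entry key="type">cm:content</entry>
  <entry key="aspects">cm:titled</entry>
  <entry key="cm:title">Example title</entry>
  <entry key="cm:description">Set on the fly during bulk import</entry>
</properties>
```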

What is not working for you ?

I have done some tests on my blog (in Spanish):

http://alfrescoadmin.blogspot.com/2011/06/importacion-masiva-con-alfresco-bulk.html

Raúl

ad-int-en
Champ in-the-making
Raúl, thanks a lot for your quick reply. The bulk importer is really cool, but it is not a standardized interface to the outside world.
Since we are running a SaaS solution we need to stick to standard interfaces that can be accessed programmatically; plus, we have already invested a lot of development work to integrate the CMIS interface into our existing infrastructure, so a switch to a different interface is not possible for us.
Thanks anyway for your replies.
Also, this thread is about a (possible) memory leak in the CMIS integration of Alfresco (and probably also in the FTP interface), so the bulk importer is a bit off-topic. But it is still very interesting to see that with the bulk importer the memory leak does not seem to occur, even with the standard Lucene settings in Alfresco.

rmacian
Champ in-the-making
Ok,

Just to see if this is a bug that may already be fixed, you could download a nightly build of Alfresco (current is 4.0) and check whether the error is still thrown. At least then you can expect to have it solved in the future.

http://dev.alfresco.com/downloads/nightly/dist/