Given those results with the FTP vs. Import methods, I tried 4.2.e. With Alfresco far less coupled to Lucene thanks to SOLR, things improved.
However, as we will have about 1k+ files uploaded per day, we are really concerned about server resources and Alfresco's memory consumption. So we would like to scale the solution and the server's allocated resources, and be sure we won't run out of heap memory. After a few tests, I've figured out that the following classes have a high impact on heap memory:
- org.alfresco.repo.cache.DefaultSimpleCache
- org.alfresco.repo.cache.lookup.CacheRegionKey
- org.alfresco.repo.cache.lookup.CacheRegionKeyValue
- org.alfresco.repo.domain.node.AbstractNodeDAOImpl$ParentAssocsCache
- org.alfresco.repo.domain.node.AuditablePropertiesEntity
- org.alfresco.repo.domain.node.ChildAssocEntity
- org.alfresco.repo.domain.node.NodeEntity
- org.alfresco.repo.domain.node.NodeUpdateEntity
- org.alfresco.repo.domain.node.ParentAssocsInfo
- org.alfresco.repo.domain.node.StoreEntity
- org.alfresco.repo.domain.node.TransactionEntity
- org.alfresco.repo.lock.mem.LockState
By lowering all the max cache item values in caches.properties (see the sketch after this list), I've been able to reduce the memory footprint of some classes to zero (or almost), but the following ones still consume memory:
- org.alfresco.repo.domain.node.AbstractNodeDAOImpl$ParentAssocsCache
- org.alfresco.repo.domain.node.AuditablePropertiesEntity
- org.alfresco.repo.domain.node.ChildAssocEntity
- org.alfresco.repo.domain.node.NodeEntity
- org.alfresco.repo.domain.node.ParentAssocsInfo
- org.alfresco.repo.domain.node.TransactionEntity
- org.alfresco.repo.lock.mem.LockState
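For reference, these cache limits can be overridden in alfresco-global.properties (or edited in caches.properties itself). A minimal sketch, assuming a stock 4.2 configuration; the property names should be double-checked against your version, and the values are purely illustrative, not recommendations:

    # alfresco-global.properties
    # shared node caches behind NodeEntity and the related entity classes
    cache.node.nodesSharedCache.maxItems=50000
    cache.node.propertiesSharedCache.maxItems=25000
    cache.node.aspectsSharedCache.maxItems=25000
    # appears to back AbstractNodeDAOImpl$ParentAssocsCache / ParentAssocsInfo
    cache.node.parentAssocsSharedCache.maxItems=25000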
ServerEntity and StoreEntity still show an instance count proportional to the number of uploaded documents, but the memory they allocate is not a real threat. Lowering the cache max item values obviously hurt performance, but the point was only to identify which value drives which class's instantiation.
What else can be tuned to "control" Alfresco's memory consumption, so that we can determine the JVM heap size? Or which values can be lowered at the price of a managed performance loss?
You can also reduce the number of threads and the stack sizes, but that, too, reduces Alfresco's performance. Normally, tuning means increasing these various settings, not lowering them.
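For example, the stack size is set per thread with -Xss, so lowering it recovers some memory across Tomcat's worker threads, at the risk of StackOverflowError on deep call chains. A sketch only; the setenv.sh location depends on your bundle, and the values are illustrative:

    # tomcat/bin/setenv.sh
    # fixed 3GB heap, 512KB per-thread stacks instead of the usual 1MB default
    JAVA_OPTS="$JAVA_OPTS -Xms3g -Xmx3g -Xss512k"
    # dump the heap on OOM so the offending classes can be identified
    JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
    export JAVA_OPTS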
But what is the concern here? If you give Alfresco memory, it will use it, and that shouldn't be a problem; that's just normal operation.
And 1000 docs per day is a fairly trivial import load that shouldn't cause any problems.
Hi, thanks for pointing out thread count and stack size. We know that reducing Alfresco's memory usage will reduce its overall performance; the point is to find some levers in case we cannot allocate all the needed resources with the default parameters. Our 3 production host servers have about 20GB of memory, and each is already running around 5 to 8 VMs… So we won't really be able to allocate more than 4GB to the OS running Alfresco. I'm currently running a test, importing 1 million files via FTP. After 350k files things are looking good: with 3GB allocated to the Java heap, Alfresco doesn't seem to need more for the moment.
1k files a day won't be stressful, but as each uploaded file is cached, we want to be sure that Java won't go OOM after, say, 60 days because of the cache size in memory. That's all.
Hi, I've configured the JVM to run with a 3GB heap and successfully uploaded 450k files. Max heap usage was around 2.5GB, a level it reached after around 150k files uploaded. Given that other operations will also allocate memory (searching, putting files in cache), yes, 4GB seems good.
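In case it helps anyone reproducing this kind of test: the heap figures and per-class instance counts above can be watched with the standard JDK tools while the import runs (the PID is the Alfresco/Tomcat Java process; the interval is illustrative):

    # GC activity and heap occupancy, sampled every 10 seconds
    jstat -gcutil <alfresco-pid> 10000

    # live instance counts per class (note: forces a full GC)
    jmap -histo:live <alfresco-pid> | head -40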