Hyland Connect

marnad · 06-04-2019

Hi,

I installed ACS 6.0 as a 2-node cluster on a two-node Windows Server 2016 failover cluster (in VMware). The servers didn't handle the load (user sessions), although I gave them a lot of RAM (96 GB) and CPU (16 CPUs). The Tomcat JVM has 64 GB. Has anyone installed an ACS 6 cluster that runs fine and supports about 200-300 users? What is your solution?

Would Ubuntu be a better choice?

Thanks

marnad · 06-05-2019

BTW, Heiko, which versions of Alfresco and Solr are you running, and on which OS?

Thanks

Marc

heiko_robert · 06-05-2019

Marc,

Most of our customers run Alfresco 5.1/5.2 with Solr 4, but in my experience versions 5 and 6 differ only marginally in terms of performance and scalability.

Solr is a memory beast and requires special attention. Solr 6 has more features and better tools to avoid OOM exceptions, but you still need to handle them. Some recommend splitting your index with sharding to get around hardware limits.

90% of the unplanned downtime we have seen in the last 10 years was related to an effect we call thread escalation: if the Alfresco repository waits on, or gets exceptions from, other components (database, transformation, Solr query), it creates new threads until the whole system has eaten all its resources. The retrying-transaction mechanism even accelerates the escalation.

So the main rule for scaling Alfresco is to keep the thread count from growing. Monitor the number of threads, and if it is increasing, find the root cause. On heavily used production systems you have only about 30 minutes to prevent a thread escalation and the resulting downtime.
- Are there enough free DB connections (expect 1 connection per active client + 20%), and is the database fast enough?
- If you give the JVM a lot of memory, check and monitor garbage collection.
- Are response times and the number of threads coming from Share increasing?
- Is the system waiting on Solr responses (mostly caused by OOM)? This can be addressed by log and JVM monitoring combined with automated kills/restarts. Check the new tools shipped with Solr 6 or write your own.
- Is the system waiting on transformations (mostly requested by Share and Solr) and therefore creating more (waiting) threads?
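Two of these checks lend themselves to simple automation. The sketch below is a hypothetical helper, not an Alfresco tool: `required_db_connections` applies the "one connection per active client + 20%" rule of thumb, and `is_escalating` flags a sustained climb in thread counts sampled at a fixed interval. The samples could come from the JVM's standard `java.lang:type=Threading` MBean (attribute `ThreadCount`) through whatever JMX monitoring you already use; the 10-sample window and the 25% growth threshold are illustrative assumptions to tune for your system.

```python
from typing import Sequence


def required_db_connections(active_clients: int, headroom_percent: int = 20) -> int:
    """Rule of thumb: one DB connection per active client plus headroom.

    Uses integer arithmetic and rounds up, so 7 clients need 9 connections, not 8.4.
    """
    return (active_clients * (100 + headroom_percent) + 99) // 100


def is_escalating(samples: Sequence[int], window: int = 10, min_growth_percent: int = 25) -> bool:
    """Flag a possible thread escalation in periodic JVM thread-count samples.

    Fires when the last `window` samples never decrease and the newest sample is
    at least `min_growth_percent` above the oldest sample in that window.
    """
    if len(samples) < window:
        return False  # not enough history yet
    recent = samples[-window:]
    never_drops = all(later >= earlier for earlier, later in zip(recent, recent[1:]))
    grew_enough = recent[-1] * 100 >= recent[0] * (100 + min_growth_percent)
    return never_drops and grew_enough


if __name__ == "__main__":
    print(required_db_connections(300))  # 360
    climbing = [180, 185, 190, 198, 205, 214, 220, 231, 240, 252]
    print(is_escalating(climbing))  # True
```

With one sample per minute, this alarm fires about 10 minutes into a steady climb, which still leaves part of the ~30-minute reaction window. For 300 active users the rule of thumb gives 360 connections, a number worth comparing with the repository's `db.pool.max` setting.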

marnad · 06-27-2019

After many tests, it looks like my problem is a combination of the cluster feature and a high document volume. When I test a 2-node cluster (2.5 million docs) with 10 concurrent sessions, it takes Tomcat about 30 minutes to max out its memory and CPU usage. At that point Alfresco becomes very slow, almost unresponsive. Most importantly, we can have more than 40 active connections to Oracle. With the same test on the same server and the cluster feature disabled, the server runs fine and the number of database connections stays low (1-3). On the other hand, I have another setup (identical to the previous cluster) for dev, with far fewer documents (170,000). When I ran the same test there in cluster mode, Alfresco stayed very responsive, with no memory ballooning in Tomcat and few active connections to Oracle. So it looks like the volume has an influence in cluster mode. Has anyone experienced this kind of situation?

Thanks

cesarista · 06-29-2019

Hi Marc, data volume always has a big influence (and not only in a cluster); that is exactly what sizing is about. An Alfresco alf_node_properties table with thousands of records is not the same as one with several hundred million. A 10 GB database is not the same as one 100 times bigger. The SOLR index for a few thousand documents is not the same size as the index for several million. The number of documents, metadata properties, and ACLs has a big impact on the resources needed by Alfresco and its components, such as SOLR and the relational database. For example, the JVM memory needed by a SOLR instance depends directly on the number of documents in the repository.

https://docs.alfresco.com/5.2/concepts/solrnodes-memory.html
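To make the dependency on document count concrete, here is one Solr structure whose size grows directly with the index: the filterCache. When a cached filter is stored as a bitset, each entry costs roughly one bit per document in the index (Solr's `maxDoc`, which is at least the number of indexed nodes), i.e. about `maxDoc / 8` bytes. The sketch below computes that worst case. The entry count of 512 is a hypothetical value, not an Alfresco default; read the real `filterCache` size from your core's solrconfig.xml. This is only one contributor to Solr memory; the linked page covers the full calculation.

```python
def filter_cache_worst_case_bytes(max_doc: int, entries: int) -> int:
    """Upper-bound estimate for a Solr filterCache whose entries are all bitsets.

    Each bitset entry needs one bit per document in the index (maxDoc), rounded
    up here to whole bytes. Object headers and cache bookkeeping are ignored, and
    small result sets that Solr keeps as sorted doc-id arrays would cost less.
    """
    bytes_per_entry = (max_doc + 7) // 8
    return bytes_per_entry * entries


if __name__ == "__main__":
    ENTRIES = 512  # hypothetical filterCache size; check solrconfig.xml
    for label, docs in [("dev", 170_000), ("prod", 2_500_000)]:
        total = filter_cache_worst_case_bytes(docs, ENTRIES)
        print(f"{label}: {docs:>9,} docs -> {total / 2**20:6.1f} MiB")
```

For 170,000 versus 2.5 million documents this gives about 10.4 MiB versus 152.6 MiB for the same cache configuration, a factor of roughly 14.7, which is why settings that are harmless on a small dev index can exhaust the heap on production volumes.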

In your cluster, is SOLR on a dedicated machine, or is it deployed on both nodes?

Kind regards.

--C.

marnad · 07-02-2019

Hi Cesar,

I did the calculation for SOLR and validated it with Alfresco, so I hope it is OK. My alf_node_properties table has about 50 million records. The Solr engine is deployed on both nodes. It really looks like there is a problem when I activate the cluster feature on my server with a high-volume database. Without the cluster activated, the same server behaves normally. What overhead is the cluster causing?

Regards,

Marc

cesarista · 07-04-2019

Hi:

I can't say what is causing this overhead.

But I would look at SOLR first. One of the recommended best practices for reducing performance problems (in a cluster or not) is to separate SOLR from the repository. The memory calculation you did covers SOLR only. SOLR and the Alfresco repository compete for machine resources at every moment: not only CPU and JVM memory, but also IO (SOLR indexing/writing and searching/reading) while Alfresco is ingesting data into the content store. So you can try the following:

- Configure a third, dedicated machine with Alfresco and SOLR, similar to your current nodes, but with Alfresco used only for indexing. This Alfresco node should be outside the cluster and could even run in read-only mode.
- Configure your current Alfresco nodes in the cluster (let's call them the user nodes) so that they do not run SOLR locally and instead point to the new dedicated machine for searching.
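A minimal sketch of how the two roles might differ in alfresco-global.properties, written as a small Python helper that renders both fragments. It assumes ACS 6.0 with the Solr 6 subsystem, that the dedicated index machine shares the same database and content store as the user nodes, and that Solr listens on its default port 8983; the host name `solr-index.example.local` is hypothetical. The property names (`alfresco.cluster.enabled`, `server.allowWrite`, `index.subsystem.name`, `solr.host`, `solr.port`) are standard repository settings, but check them against the documentation for your exact version. Whether Solr tracking still works with `server.allowWrite=false` is worth testing before relying on it. The Solr side (the `alfresco.host` setting in each core's solrcore.properties, pointing at the index node's own repository) is not shown.

```python
from typing import Dict

SOLR_PORT = 8983  # assumed Solr 6 default port; adjust to your install


def user_node_properties(index_host: str, solr_port: int = SOLR_PORT) -> Dict[str, str]:
    """Clustered repository node that sends searches to the remote index machine."""
    return {
        "alfresco.cluster.enabled": "true",
        "index.subsystem.name": "solr6",
        "solr.host": index_host,  # no local Solr on the user nodes
        "solr.port": str(solr_port),
    }


def index_node_properties(solr_port: int = SOLR_PORT) -> Dict[str, str]:
    """Stand-alone repository used only by the co-located Solr for tracking."""
    return {
        "alfresco.cluster.enabled": "false",  # kept outside the cluster
        "server.allowWrite": "false",  # read-only; verify tracking still works
        "index.subsystem.name": "solr6",
        "solr.host": "localhost",
        "solr.port": str(solr_port),
    }


def render(props: Dict[str, str]) -> str:
    """Format a property map as alfresco-global.properties lines, sorted by key."""
    return "".join(f"{key}={value}\n" for key, value in sorted(props.items()))


if __name__ == "__main__":
    print("# user nodes")
    print(render(user_node_properties("solr-index.example.local")), end="")
    print("# index node")
    print(render(index_node_properties()), end="")
```

Keeping both roles in one place makes it easy to see that the only differences are cluster membership, write access, and where searches are sent.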

Kind regards.

--C.


Suggestions for ACS 6.0 in cluster