Server alfresco

hunterbit
Champ in-the-making
I am not sure whether this is the right place to ask.
I need to set up a server for Alfresco and would like to know roughly what the minimum requirements are, and whether it is better to have more RAM or a more powerful CPU.
Thank you.
4 REPLIES

pmonks
Star Contributor
<disclaimer>
Performance tuning is highly dependent on any number of factors, and those factors are highly sensitive to each specific environment.  In other words there are no magic bullets for tuning application performance, whether the application is Alfresco or something else.
</disclaimer>

With that out of the way ;-), what I can say is that Alfresco tends to be I/O bound, so it's best to have a high bandwidth, low latency disk subsystem for Alfresco to use (e.g. RAID or a SAN) as well as a high bandwidth, low latency connection to the relational database that Alfresco is configured to use.  More RAM can help here, to give the OS more room for caches, but most modern CPUs (dual core, 2+ GHz etc.) are more than enough for Alfresco (Alfresco doesn't perform very many computationally expensive operations; most operations are disk and database I/O intensive instead).

Cheers,
Peter

hunterbit
Champ in-the-making
Thank you,
your reply is very important to me.

jharrop
Champ in-the-making
Peter said:
Alfresco tends to be I/O bound, so it's best to have a high bandwidth, low latency disk subsystem for Alfresco to use (eg. RAID or a SAN) as well as a high bandwidth, low latency connection to the relational database that Alfresco is configured to use. Larger RAM can help here, to give the OS more room for caches, but most modern CPUs (dual core, 2+Ghz etc.) are more than enough for Alfresco (Alfresco doesn't perform very many computationally expensive operations - most operations are disk and database I/O intensive instead).

whereas Kevin (at [1]) said:
Generally Alfresco is much heavier on the CPU than the DB - adding a faster CPU/more cores will generally give you a lot better performance increase than changing the DB.

I have assumed that Alfresco performance comes down to Hibernate + database performance.

If this is true, then the question becomes:

1.  Is there anything special about the way Alfresco uses Hibernate which affects general principles applicable to Hibernate-based apps?

2.  Which database performs best (MySQL, Postgres etc.) with Alfresco and/or Hibernate?

3.  If you start with an affordable system (e.g. quad core Xeon, 4 GB RAM, a couple of 15K SAS drives), where is the bottleneck likely to be?

If you are using MySQL, the MySQL Enterprise Monitor looks like it might help to answer Q3.

cheers

Jason


[1] http://forums.alfresco.com/en/viewtopic.php?p=29038&sid=b0cd7dc8c2455458bd81e8adae1f0a7a#p29038
[2] http://lstigile.wordpress.com/2007/12/27/cms_mysql/

pmonks
Star Contributor
"I have assumed that Alfresco performance comes down to Hibernate + database performance"
That is my experience, but as I mentioned earlier performance is intricately tied to any number of factors.

"1. Is there anything special about the way Alfresco uses Hibernate which affects general principles applicable to Hibernate-based apps?"
Alfresco does not do anything particularly special with Hibernate, other than programmatically forcing flushes during certain large operations (to avoid Hibernate running out of memory in the session cache - Hibernate places no upper bound on the session cache, so for large transactions it can consume all heap then fail with out of memory errors).  I believe this primarily occurs in some of the WCM related operations (eg. promote-to-staging), but may also happen within some large DM operations as well (although I'm not 100% sure of that - it should be fairly evident from the code exactly where this flushing occurs).
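To illustrate why periodic flushing keeps memory bounded, here is a toy simulation. It is not Hibernate's API and not Alfresco's code: `SessionCacheSim`, `bulk_save` and the batch size of 50 are made-up names and values, used only to show the general "flush and clear every N objects" idea.

```python
class SessionCacheSim:
    """Toy stand-in for a persistence session cache (hypothetical, not Hibernate)."""

    def __init__(self):
        self.cached = []   # objects held in memory until flushed
        self.written = 0   # objects pushed to the pretend database
        self.peak = 0      # largest cache size seen so far

    def save(self, obj):
        self.cached.append(obj)
        self.peak = max(self.peak, len(self.cached))

    def flush_and_clear(self):
        # Write out everything held so far, then release it from memory.
        self.written += len(self.cached)
        self.cached.clear()


def bulk_save(items, batch_size=None):
    """Save all items; if batch_size is given, flush every batch_size saves."""
    session = SessionCacheSim()
    for count, item in enumerate(items, start=1):
        session.save(item)
        if batch_size and count % batch_size == 0:
            session.flush_and_clear()
    session.flush_and_clear()  # final flush, as at commit time
    return session


unbounded = bulk_save(range(10_000))
batched = bulk_save(range(10_000), batch_size=50)
print(unbounded.peak, batched.peak)
```

Both runs write all 10,000 objects, but the unflushed run holds every object in memory at once while the batched run never holds more than 50.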

"2. Which database performs best (MySQL, Postgres etc.) with Alfresco and/or Hibernate?"
I don't know if anyone has performed an empirical comparison of performance with Alfresco running on different database servers, but such an exercise would be extremely valuable.  What I can say is that MySQL is probably the most widely deployed database for Alfresco, so it's likely to be the most highly tuned.  FWIW Alfresco engineering went through a performance tuning exercise for the non-MySQL databases in the 2.0 timeframe, so it's not like they haven't been tuned at all.

"3. If you start with an affordable system (e.g. quad core Xeon, 4 GB RAM, a couple of 15K SAS drives), where is the bottleneck likely to be?"
To echo my earlier disclaimer, it's impossible to predict with any real certainty what the performance characteristics of any given configuration / system architecture will be.  In other words I would suggest empirically determining the bottleneck, rather than trying to guess what it is and risk architecting a solution based on a false assumption.
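As one rough way to start measuring rather than guessing, the sketch below compares CPU time with wall-clock time for a piece of work. A share near 1 suggests the work is CPU bound; a share near 0 suggests it is mostly waiting (on disk, database or network). The function names and the two sample workloads are illustrative only, and it only sees work done inside the Python process itself, so for a real Alfresco install you would rely on OS-level and database monitoring tools instead.

```python
import time


def cpu_share(workload):
    """Return the fraction of wall-clock time the workload spent on the CPU."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def cpu_heavy():
    # Pure computation: should keep the CPU busy for most of its run time.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def wait_heavy():
    # Sleeping stands in for time spent waiting on disk or database I/O.
    time.sleep(0.2)


print("cpu_heavy share: ", round(cpu_share(cpu_heavy), 2))
print("wait_heavy share:", round(cpu_share(wait_heavy), 2))
```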

That said, I wouldn't be surprised to find that disk I/O is the bottleneck in this configuration, assuming that the database and Alfresco are both on the same server (both the database and Alfresco will tend to be disk I/O bound in this scenario).  There are some architectural choices that can be made to mitigate this: using a RAID level that maximises performance, dividing up the drives so that different types of files (e.g. OS and application binaries, temp files, database data and transaction log files, Alfresco contentstore and Lucene indexes) are on independent I/O channels, and so on.  Note that this may mean buying a greater number of smaller drives than is normally the case, and then configuring them as sets of independent RAID arrays.
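A quick sanity check for the "independent I/O channels" idea is to see which directories share a filesystem. The sketch below groups paths by the device ID reported by `os.stat`. The labels and the temporary directories are placeholders for illustration, not real Alfresco paths. Note the limits: a device ID identifies a mounted filesystem, not a physical disk or RAID array, so two partitions on one drive will look independent here even though they are not.

```python
import os
import tempfile
from collections import defaultdict


def group_by_device(paths):
    """Map each filesystem device ID to the labels whose paths live on it."""
    groups = defaultdict(list)
    for label, path in paths.items():
        groups[os.stat(path).st_dev].append(label)
    return dict(groups)


def shared_devices(paths):
    """Return the groups of labels that sit on the same filesystem."""
    return [sorted(labels)
            for labels in group_by_device(paths).values()
            if len(labels) > 1]


# Demo with two placeholder directories created side by side, so they are
# expected to share a device. Substitute your own directories to check a server.
with tempfile.TemporaryDirectory() as root:
    layout = {}
    for label in ("contentstore", "lucene-indexes"):
        path = os.path.join(root, label)
        os.mkdir(path)
        layout[label] = path
    demo_shared = shared_devices(layout)

print(demo_shared)
```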

You might also consider separating Alfresco and the database onto separate servers (so they're not in contention for disk I/O), in which case they should ideally be on the same subnet (since Alfresco can quickly become database I/O bound).  This also gives greater flexibility for further tuning (should that be necessary) - the database and Alfresco servers can be tuned independently, based on their individual workload characteristics.
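If you do split the two onto separate servers, it is worth checking the network latency between them. The sketch below times TCP connections to a host and port and reports the median in milliseconds. `connect_latency_ms` is a made-up helper, and connection setup time only approximates one network round trip; it says nothing about query execution time. The demo connects to a throwaway local listener so it can run anywhere; on a real setup you would point it at your database host and port.

```python
import socket
import statistics
import time


def connect_latency_ms(host, port, samples=5, timeout=2.0):
    """Return the median time, in milliseconds, to open a TCP connection."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it straight away
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Demo against a temporary local listener (stands in for the database server).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
listener.listen(16)
demo_port = listener.getsockname()[1]
try:
    demo_latency_ms = connect_latency_ms("127.0.0.1", demo_port)
finally:
    listener.close()

print(f"median connect time: {demo_latency_ms:.3f} ms")
```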

Cheers,
Peter