cancel
Showing results for 
Search instead for 
Did you mean: 

Scary log messages with clustering

marco_altieri
Star Contributor
Star Contributor
Hi all,

I have a cluster with two alfresco (3.2r2) running in a standard configuration (as described in the wiki pages).
Both servers share the database and the file system and the cache is configured as suggested in ehcache-custom.xml.sample.cluster

When both servers are running the log file (on both servers) starts to get full of terrifying messages:


18:44:15,745  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: workspace://SpacesStore/a9a9f477-5c57-11dc-ad6c-5136d620963c
18:44:15,747  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: workspace://SpacesStore
18:44:15,750  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: workspace://SpacesStore/c317f789-5c57-11dc-ad6c-5136d620963c
18:44:15,753  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: system://system/5f5a740f-2589-4629-bebe-2b63e7176b93
18:44:15,754  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key:
18:44:15,760  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: http://www.alfresco.org/model/system/1.0
18:44:15,762  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 2
18:44:15,762  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: http://www.alfresco.org/model/application/1.0
18:44:15,763  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 14
18:44:15,764  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: http://www.alfresco.org/model/rule/1.0
18:44:15,765  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 16
18:44:15,767  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: en_US_
18:44:15,774  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 4
18:44:15,775  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: text/plain
18:44:15,776  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 2
18:44:15,777  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: utf8
18:44:15,778  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 2
18:44:15,779  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 4
18:44:15,781  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 24
18:44:15,782  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: {http://www.alfresco.org/model/system/1.0}descriptor
18:44:15,784  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: {http://www.alfresco.org/model/system/1.0}children
18:44:15,786  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 26
18:44:15,787  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: {http://www.alfresco.org/model/system/1.0}versionSchema
18:44:15,789  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: {http://www.alfresco.org/model/system/1.0}store_root
18:44:15,790  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: {http://www.alfresco.org/model/content/1.0}folder
18:44:15,792  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 8
18:44:15,793  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: {http://www.alfresco.org/model/content/1.0}contains
18:44:15,794  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 50
18:44:15,795  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 68
18:44:15,798  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: {http://www.alfresco.org/model/rule/1.0}ruleFolder
18:44:15,799  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 122
18:44:15,801  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: {http://www.alfresco.org/model/content/1.0}systemfolder
18:44:15,802  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 124
18:44:15,803  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 28
18:44:15,804  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: {http://www.alfresco.org/model/system/1.0}versionMajor
18:44:17,306  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 30
18:44:17,312  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: {http://www.alfresco.org/model/system/1.0}versionLabel
18:44:17,313  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 32
18:44:17,314  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: {http://www.alfresco.org/model/system/1.0}versionRevision
18:44:17,315  DEBUG [ehcache.distribution.RMICachePeer] Remote remove received for key: 34

Each key is printed more than one time, and the system seems blocked in an infinite loop (I waited for more than 15 minutes).
Worse than that, in the meantime, no alfresco is able to serve requests.
Luckily when I restart only one server, it works and no documents have been deleted.

Do you have any suggestions?

Thanks
Marco
8 REPLIES 8

pmonks
Star Contributor
Star Contributor
The issue here is that you've turned on debug logging for EHCache, which is inadvisable since it is used a lot in a cluster and so will generate a lot of log messages.  If debug logging is enabled for more than just EHCache then that would also explain the slowness of the system - with debug logging enabled system wide, Alfresco will be generating a tremendous amount of logging output, which slows down the system considerably.

Cheers,
Peter

marco_altieri
Star Contributor
Star Contributor
Hi pmonks, thank you very much for your response.

It would be great if the problem is just a logging issue. I'll do a check asap.

I'm a bit pessimistic about that.
I used log levels suggested in alfresco wiki pages and the amount of log messages does not seem too much.
I saw much more messages when I used DEBUG for hibernate level and these messages didn't lock the server.
It's also difficult to understand why the servers share so many "remove operations" and for such a long period of time (more than 15 minutes).

Marco

pmonks
Star Contributor
Star Contributor
Those remove operations are cache invalidation events, not content removal events.  EHCache on one node in the cluster is informing the other node(s) in the cluster that certain cache entries are stale and should therefore be dropped from their caches.

Cheers,
Peter

joe_l3
Star Contributor
Star Contributor
Hi all,
I had the same problem. I also configured a cluster with two Alfresco nodes (both servers share the database and the file system). When I enabled  ehcache with automatic peerDiscovery property on both nodes, the performance of two application servers collapsed and log files were filled with "Remote remove received for key:…." messages !

Here my system configuration:
- Alfresco Comunity Ed. 3.2r2
- Mysql 5.0.77
- Java 1.6.0_16
- Apache Tomcat Server 6.0.18

Possible solution:
In ehcache-custom.xml I tried to disable the automatic peerDiscovery property for cacheManagerPeerProviderFactory. I set "peerDiscovery=manual" and a limited number of rmiUrls. Here an example:

<cacheManagerPeerProviderFactory
            class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
            properties="peerDiscovery=manual,
                        rmiUrls=
         //node2_IP:40001/org.hibernate.cache.StandardQueryCache|
         //node2_IP:40001/org.hibernate.cache.UpdateTimestampsCache|
         //node2_IP:40001/org.alfresco.repo.domain.hibernate.NodeImpl|
         //node2_IP:40001/org.alfresco.repo.domain.hibernate.QNameEntityImpl|
         //node2_IP:40001/org.alfresco.repo.domain.hibernate.NodeImpl.aspects|
         //node2_IP:40001/org.alfresco.repo.domain.hibernate.NodeImpl.properties|
         //node2_IP:40001/org.alfresco.cache.compiledModelsCache"
       />
    <cacheManagerPeerListenerFactory
            class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
            properties="port=40001, socketTimeoutMillis=5000"
    />
…………….

Obviously you have to do a study to determine what ehcache items to delete from rmiUrls without regression.  :idea:

marco_altieri
Star Contributor
Star Contributor
Hi Peter,
I tried disabling log masseges, but the clustering was still totally freezed. Obviously I can't see any message…
The CPUs were 99% allocated to tomcat processes on both servers.

Those remove operations are cache invalidation events, not content removal events.  EHCache on one node in the cluster is informing the other node(s) in the cluster that certain cache entries are stale and should therefore be dropped from their caches.

Cheers,
Peter

marco_altieri
Star Contributor
Star Contributor
It doesn't seem a problem related to the database because I imported it on a cluster configured on my PC and the it worked fine.

marco_altieri
Star Contributor
Star Contributor
I realized that the disk used  by the cache org.alfresco.cache.ticketsCache grows constantly at a rate of about 10MB/hour.
Is it possible that this cache is the cause of the traffic that freezes the cluster?

Is it still possile to disable clustering for this cache ?

Thanks!
Marco

sam13
Champ in-the-making
Champ in-the-making
You two has a good  and full of value  conversation,,,.
Thanks…