<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unable to rolling restart my cluster due to Hazelcast timeouts in Alfresco Forum</title>
    <link>https://connect.hyland.com/t5/alfresco-forum/unable-to-rolling-restart-my-cluster-due-to-hazelcast-timeouts/m-p/35750#M15087</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Have you tried disabling multicast and instead listing the members of the cluster individually in the hazelcast config?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It looks something like:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;hz:join&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;hz:multicast enabled="false"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; multicast-group="224.2.2.5"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; multicast-port="54327"/&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;hz:tcp-ip enabled="true"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;hz:members&amp;gt;10.84.1.151,10.84.1.152&amp;lt;/hz:members&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/hz:tcp-ip&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/hz:join&amp;gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 20 Oct 2017 20:52:46 GMT</pubDate>
    <dc:creator>jpotts</dc:creator>
    <dc:date>2017-10-20T20:52:46Z</dc:date>
    <item>
      <title>Unable to rolling restart my cluster due to Hazelcast timeouts</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/unable-to-rolling-restart-my-cluster-due-to-hazelcast-timeouts/m-p/35748#M15085</link>
      <description>I am running 5.1.1 on an environment and ran into an issue yesterday under peak load. A couple of servers got into a bad state, so we tried to do a rolling restart of Alfresco. The servers wouldn't start up because of a Hazelcast timeout, probably because the cluster was so busy. We had to stop</description>
      <pubDate>Thu, 19 Oct 2017 13:30:25 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/unable-to-rolling-restart-my-cluster-due-to-hazelcast-timeouts/m-p/35748#M15085</guid>
      <dc:creator>josh_barrett</dc:creator>
      <dc:date>2017-10-19T13:30:25Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to rolling restart my cluster due to Hazelcast timeouts</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/unable-to-rolling-restart-my-cluster-due-to-hazelcast-timeouts/m-p/35749#M15086</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hazelcast timeouts can be caused by many things, ranging from actual networking issues, through CPU overload, to memory / garbage collection issues on the other cluster node. The issue I have seen most often is the latter: a system that is poorly configured and very close to garbage collection hell, where only a slight change in circumstances would bring down the entire cluster. You need to investigate which issue you were actually suffering from. I'd advise running some JVM monitoring, e.g. with jvisualvm, during startup (on all cluster nodes) to get a picture of what's going on.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In some circumstances you might even be able to avoid a full restart of your entire cluster, e.g. if only the communication / cluster state is affected. Using the JavaScript Console you can &lt;A href="https://gist.githubusercontent.com/AFaust/beaa309837397abf961f/raw/3c2c2a469b8b8abe0300a196ffb193e8b087ce12/restartHazelcastClusterNode.js" rel="nofollow noopener noreferrer"&gt;restart only the Hazelcast layer&lt;/A&gt;, and using the &lt;A href="https://github.com/OrderOfTheBee/ootbee-support-tools/wiki/Caches" rel="nofollow noopener noreferrer"&gt;Caches tool of the OOTBee Support Tools&lt;/A&gt; addon you can &lt;A href="https://github.com/OrderOfTheBee/ootbee-support-tools/issues/70" rel="nofollow noopener noreferrer"&gt;purge data caches&lt;/A&gt; to remove potentially stale data.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 20 Oct 2017 12:00:40 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/unable-to-rolling-restart-my-cluster-due-to-hazelcast-timeouts/m-p/35749#M15086</guid>
      <dc:creator>afaust</dc:creator>
      <dc:date>2017-10-20T12:00:40Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to rolling restart my cluster due to Hazelcast timeouts</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/unable-to-rolling-restart-my-cluster-due-to-hazelcast-timeouts/m-p/35750#M15087</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Have you tried disabling multicast and instead listing the members of the cluster individually in the hazelcast config?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It looks something like:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;hz:join&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;hz:multicast enabled="false"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; multicast-group="224.2.2.5"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; multicast-port="54327"/&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;hz:tcp-ip enabled="true"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;hz:members&amp;gt;10.84.1.151,10.84.1.152&amp;lt;/hz:members&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/hz:tcp-ip&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/hz:join&amp;gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 20 Oct 2017 20:52:46 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/unable-to-rolling-restart-my-cluster-due-to-hazelcast-timeouts/m-p/35750#M15087</guid>
      <dc:creator>jpotts</dc:creator>
      <dc:date>2017-10-20T20:52:46Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to rolling restart my cluster due to Hazelcast timeouts</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/unable-to-rolling-restart-my-cluster-due-to-hazelcast-timeouts/m-p/35751#M15088</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;On the Repository tier, Hazelcast multicast is disabled by default. The config example from Jeff applies only to the Share tier, where the Hazelcast config is embedded in Spring. For Share, the Alfresco documentation provides the configuration with multicast enabled. The error messages in the logs point to Repository-tier issues, though.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 23 Oct 2017 08:57:36 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/unable-to-rolling-restart-my-cluster-due-to-hazelcast-timeouts/m-p/35751#M15088</guid>
      <dc:creator>afaust</dc:creator>
      <dc:date>2017-10-23T08:57:36Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to rolling restart my cluster due to Hazelcast timeouts</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/unable-to-rolling-restart-my-cluster-due-to-hazelcast-timeouts/m-p/35752#M15089</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks for the replies, &lt;B&gt;Axel Faust&lt;/B&gt; and &lt;B&gt;Jeff Potts&lt;/B&gt;. The actual root problem was that all of the Alfresco servers in our cluster were close to being maxed out on CPU.&lt;/P&gt;&lt;P&gt;Under peak load, a few custom background processes kicked off, which pushed the servers over the edge.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In the heat of the moment we removed all of the servers from the cluster and simply had our API layer talk to Alfresco via CMIS through a load balancer, unclustered. We thought we were all good: the servers seemed healthy in terms of CPU, JVM, and the number of requests we were handling. But after looking into the logs, we found that a majority of the document update calls were failing with messages like the following in our custom API logs:&lt;BR /&gt;&lt;EM&gt;Expected xxxx bytes but retrieved 0 bytes!&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We reproduced this issue in our Performance environment and resolved it by adding the servers back into the cluster. The weird thing was that only updates caused this issue. New document adds didn't have any problems, only binary updates. I wonder if this is a bug in the CMIS implementation.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 27 Oct 2017 20:42:59 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/unable-to-rolling-restart-my-cluster-due-to-hazelcast-timeouts/m-p/35752#M15089</guid>
      <dc:creator>josh_barrett</dc:creator>
      <dc:date>2017-10-27T20:42:59Z</dc:date>
    </item>
  </channel>
</rss>

