ActiveMQ broker becomes unresponsive after sometime


ActiveMQ broker becomes unresponsive after sometime

Shobhana
We use ActiveMQ 5.14.1 with KahaDB for persistence.
For the last 2 days, we have observed that the broker becomes unresponsive after running for around 4 hours! None of the operations work after this, including establishing new connections, publishing messages to topics, subscribing to topics, unsubscribing, etc.

We have disabled JMX for performance reasons, so I cannot check the status/health of the broker via JMX. I tried to take a thread dump to see what's happening, but it fails with the message: "Unable to open socket file: target process not responding or HotSpot VM not loaded"!

I get a similar error when I try to take a heap dump! But I can see that the broker process is running using ps -ef | grep java.

I tried to take the thread dump forcefully using the -F option, but this also fails with "java.lang.RuntimeException: Unable to deduce type of thread from address 0x00007f9288012800 (expected type JavaThread, CompilerThread, ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)"

A forced heap dump fails with "Expecting GenCollectedHeap, G1CollectedHeap, or ParallelScavengeHeap, but got sun.jvm.hotspot.gc_interface.CollectedHeap"

We have just one broker running on an AWS EC2 Ubuntu instance. The broker is started with an Xmx of 12 GB. Our server and Android applications together create thousands of topics and exchange MQTT messages (both persistent and non-persistent). In the last run, around 20 GB of journal files were created within 4 hours before the broker became unresponsive! The only way to overcome this problem is to stop the broker, delete all files in KahaDB and restart the broker!

Any hints as to what could be going wrong are highly appreciated!

The broker configuration is given below for reference:

<broker xmlns="http://activemq.apache.org/schema/core" useJmx="false" brokerName="PrimaryBroker" deleteAllMessagesOnStartup="false" advisorySupport="false" schedulePeriodForDestinationPurge="600000" offlineDurableSubscriberTimeout="54000000" offlineDurableSubscriberTaskSchedule="3600000" dataDirectory="${activemq.data}">

        <destinationPolicy>
            <policyMap>
              <policyEntries>
                <policyEntry topic=">" gcInactiveDestinations="true" inactiveTimoutBeforeGC="3600000">
                  <pendingMessageLimitStrategy>
                    <constantPendingMessageLimitStrategy limit="1000"/>
                  </pendingMessageLimitStrategy>
                  <deadLetterStrategy>
                    <sharedDeadLetterStrategy processExpired="false" />
                  </deadLetterStrategy>
                </policyEntry>
                <policyEntry queue=">" optimizedDispatch="true" reduceMemoryFootprint="true">
                  <deadLetterStrategy>
                    <sharedDeadLetterStrategy processExpired="false" />
                  </deadLetterStrategy>
                </policyEntry>
              </policyEntries>
            </policyMap>
        </destinationPolicy>

        <persistenceAdapter>
            <kahaDB directory="${activemq.data}/kahadb" indexCacheSize="20000" enableJournalDiskSyncs="false" ignoreMissingJournalfiles="true"/>
        </persistenceAdapter>

        <systemUsage>
            <systemUsage sendFailIfNoSpaceAfterTimeout="10000">
                <memoryUsage>
                    <memoryUsage percentOfJvmHeap="70" />
                </memoryUsage>
                <storeUsage>
                    <storeUsage limit="100 gb"/>
                </storeUsage>
                <tempUsage>
                    <tempUsage limit="50 gb"/>
                </tempUsage>
            </systemUsage>
        </systemUsage>

        <transportConnectors>
            <transportConnector name="openwire" uri="nio://0.0.0.0:61616?maximumConnections=1000&amp;wireFormat.maxInactivityDuration=180000&amp;wireFormat.maxFrameSize=104857600"/>
            <transportConnector name="mqtt+nio" uri="mqtt+nio://0.0.0.0:1883?maximumConnections=50000&amp;wireFormat.maxInactivityDuration=180000&amp;wireFormat.maxFrameSize=104857600"/>
        </transportConnectors>

        <plugins>
            <discardingDLQBrokerPlugin dropAll="true" dropTemporaryTopics="true" dropTemporaryQueues="true" />
            <timeStampingBrokerPlugin ttlCeiling="43200000" zeroExpirationOverride="43200000"/>
        </plugins>

</broker>

TIA,
Shobhana

Re: ActiveMQ broker becomes unresponsive after sometime

Tim Bain
Can you do all those things before the broker becomes unresponsive (e.g.
after 2 or 3 hours)? If so, that might tell us something useful, and maybe
you can script it to happen periodically (once a minute, for example) so
you can see what things look like just before it becomes unresponsive.

What do system monitoring tools like top tell you about which system
resources (CPU, memory, disk IO, network IO) you are or are not using
heavily?

To me, some of these symptoms sound like maybe you're filling memory to the
point that you're spending so much time GCing that the JVM doesn't have
cycles to do much else. There are JVM arguments that will let you log the
details of your GC activity to try to confirm or refute that theory (Google
for them, or check the Oracle documentation). Some of the other symptoms just
sound really strange, and I don't have any guesses about what they mean; I
assume you Googled each one and found no useful results?
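
For reference, on a pre-Java-9 HotSpot JVM the GC logging flags would be
something along these lines, appended to ACTIVEMQ_OPTS (a sketch, assuming the
stock ActiveMQ 5.x start scripts that honor that variable; the log path is just
a placeholder, and -XX:+PrintGCApplicationStoppedTime is optional but handy for
spotting long pauses):

-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:/path/to/gc.log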

Tim


Re: ActiveMQ broker becomes unresponsive after sometime

Tim Bain
Also, it strikes me that if you're cycling the broker every four hours
because of this behavior, you can probably afford the negligible
performance hit that comes with enabling JMX for the period of time before
you figure out what's going on.

And you should probably try to prove that turning JMX off actually yields a
measurable performance improvement, because you're paying a high price right
now for not having it enabled, and I'm very skeptical that you'll be able to
tell the difference performance-wise. So I think this might be a premature
optimization.
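
If you do turn it back on, a minimal sketch (assuming the default management
context is otherwise fine and you want to attach remotely on port 1099) is to
flip useJmx and expose a connector:

<broker xmlns="http://activemq.apache.org/schema/core" useJmx="true" ...>
    <managementContext>
        <managementContext createConnector="true" connectorPort="1099"/>
    </managementContext>
    ...
</broker>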

Tim


Re: ActiveMQ broker becomes unresponsive after sometime

Shobhana
In reply to this post by Tim Bain
Thanks for your quick response Tim!

The top command yesterday showed the following after the broker became unresponsive:
a) CPU usage: 15-min average was 1.2 cores
b) Memory %: ~35%
c) KiB Mem: 32948228 total, 25744576 used, 7203652 free, 51704 buffers

Today I'll capture the necessary data periodically and observe for any differences. I'll also enable GC logging and check for any hints there.

A while back we had enabled JMX, but we observed some scalability issues; when we checked a heap dump we found many JMX-related objects, so we disabled JMX.

If I cannot identify anything useful from the data collected periodically or from the GC logs, I'll enable JMX for some time to check the health of the broker.

Re: ActiveMQ broker becomes unresponsive after sometime

Tim Bain
I've never observed the JMX-related memory issues you described in the
brokers I've worked with (though they were all older: 5.10.x and before).
Maybe a bug was introduced in a recent version, or maybe something during
your testing looked more problematic than it actually was.

Your CPU usage doesn't look bad, and if you were stuck doing so much GCing
that the broker stopped responding, I think the CPU would be much higher
than that.  But hopefully you'll have more info from what you see in the GC
logs.

Unresponsiveness with low CPU utilization could indicate lock contention
(which would show up in a thread dump) or disk or network I/O (which you
can detect via system monitoring tools), so I'd look there if GC doesn't
appear to be the culprit once you get those logs in place.  The fact that
deleting the KahaDB files clears things up temporarily makes disk I/O a
prime candidate, but it could be any of the three (or something else
entirely).

Tim


Re: ActiveMQ broker becomes unresponsive after sometime

Shobhana
I observed closely for the last 3 days, but couldn't identify anything wrong with CPU, memory, network I/O, disk I/O, etc.
a) CPU usage on this EC2 instance (it has 8 vCPUs) has never crossed 35%
b) Memory usage varies between 1 GB and 18 GB (total available is 32 GB and the Xmx assigned to this broker process is 26 GB)
c) Thread dumps don't show any blocked threads
d) Logs (enabled at INFO level) don't show any errors, except for an occasional "Failed to remove inactive destination. Destination still has an active subscription". Is there any log message to indicate that producer flow control has kicked in?

In this morning's run, there was one full GC (we use the CMS collector) before the issue popped up and another full GC just 1 second after the issue. The first full GC took about 8.17 secs. Does this indicate any trouble?

I enabled JMX to check the number of messages pending. The JMX console just shows message statistics for each destination; however, in my setup there are thousands of topics and thousands of durable subscribers, and the JMX console couldn't even load all of them, so I couldn't tell how many messages were pending. Is there any other way to get the total number of messages pending delivery?
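
One avenue I haven't actually tried: the broker-level MBean appears to expose a TotalMessageCount attribute, so a small JMX client might be able to read an aggregate count without walking every destination. A rough sketch (assuming remote JMX is reachable on port 1099 without authentication, and the brokerName "PrimaryBroker" from the config above):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PendingMessageCount {
    public static void main(String[] args) throws Exception {
        // Assumes the broker's JMX connector is reachable at localhost:1099 with no authentication.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Broker MBean name format used by ActiveMQ 5.8+; brokerName matches the XML config above.
            ObjectName broker = new ObjectName(
                    "org.apache.activemq:type=Broker,brokerName=PrimaryBroker");
            // TotalMessageCount gives a broker-wide pending-message figure rather than per-destination stats.
            Object total = mbs.getAttribute(broker, "TotalMessageCount");
            System.out.println("Messages pending delivery (broker-wide): " + total);
        }
    }
}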

Is our usage of ActiveMQ following any known anti-patterns? We create thousands of connections and hundreds of thousands of topics and durable subscribers to exchange MQTT messages. Is this usage not recommended?

Re: ActiveMQ broker becomes unresponsive after sometime

Shobhana
In reply to this post by Tim Bain
From the last 3 days' GC logs I observe something consistent:
When a full GC happens (it takes 8-odd seconds), our servers lose their connection to the broker 15 minutes later; all further attempts to reconnect fail with a "Timeout" exception. New connection attempts also fail with the same exception!

Why is the broker unable to recover after a full GC?

Re: ActiveMQ broker becomes unresponsive after sometime

Morgan
In reply to this post by Shobhana
We have experienced this twice before. A complete reinstall of ActiveMQ (fresh KahaDB) resolved it. We never found the root cause.

Re: ActiveMQ broker becomes unresponsive after sometime

Shobhana
Is "Fresh KahaDB" equivalent to deleting the entire contents of KahaDB folder? If yes, we do this every early morning to prevent the issue, but it has never helped us!

Re: ActiveMQ broker becomes unresponsive after sometime

Tim Bain-2
In reply to this post by Shobhana
For now, let's focus on the garbage collection behavior. CMS has a
well-documented failure mode where fragmentation of the Old Gen space can
result in the JVM being unable to allocate a new large object even though
the heap isn't full. That might be what's going on here, or it might not be
related at all. But the fact that you have an 8-minute (!!!) full GC seems
like there might be something here.

JConsole will give you good details about the GC history and the breakdown
of the heap. Can you see if it shows heavy GC activity leading up to and
after the broker becomes unresponsive?

Also, you may want to try switching to G1GC, since it doesn't have CMS's
failure mode.
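
If you try that, the switch itself is just a matter of JVM flags, e.g. something like the following appended to ACTIVEMQ_OPTS in place of whatever flag currently selects CMS (such as -XX:+UseConcMarkSweepGC). This is a sketch; the pause target is optional and only a goal, not a guarantee:

-XX:+UseG1GC -XX:MaxGCPauseMillis=200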

Tim


Re: ActiveMQ broker becomes unresponsive after sometime

Shobhana
Tim, full GC takes 8 seconds, not 8 minutes.

Also, after the full GC, a large amount of memory is reclaimed (13 GB to <2 GB):

2017-05-17T14:01:46.179+0530: 34205.488: [Full GC2017-05-17T14:01:46.179+0530: 34205.488: [CMS: 13039360K->1895795K(26214400K), 8.5260340 secs] 13578056K->1895795K(27158144K), [CMS Perm : 33865K->33826K(131072K)], 8.5261390 secs] [Times: user=8.87 sys=0.00, real=8.52 secs]

Considering that so much memory is freed up, do you think fragmentation would still be a cause for concern? I checked the sizes of a few objects in the histogram report but couldn't find any object greater than 1.5 KB.

I haven't used JConsole, but from the GC logs I see just one or two full GCs before the broker becomes unresponsive.

I'll also try switching to G1GC and see if it makes any difference.

Re: ActiveMQ broker becomes unresponsive after sometime

Tim Bain
My brain swapped the units in your description, to an 8 minute GC with
unresponsiveness 15 seconds later. Sorry for the confusion.

8 seconds is long, but not unheard of, especially if the heap is large. G1
aims to shorten the Old Gen GC pause time by doing multiple small GCs of
Old Gen rather than a full GC (though it'll fall back to a full GC if the
incremental Old Gen collections don't work for some reason), so you might
be happier with it if you care about pause times. But I'd have expected
lots and lots of full GCs if you were hitting CMS's failure mode, so GC no
longer sounds like our problem.

Can you use JVisualVM's CPU sampler (attach via JMX) to see what the broker
is spending its time on when you're in the unresponsive state? That might
give us more information.

Tim


Re: ActiveMQ broker becomes unresponsive after sometime

Shobhana
Hahah .. that's okay :-)

I have never used JVisualVM; I'll figure out how to use this tool and report my observations. Thanks for all the inputs so far!

Re: ActiveMQ broker becomes unresponsive after sometime

nigro_franz
Hi!!!

Lately I've been enjoying this: http://gceasy.io/

You need to pass the following JVM arguments: -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:[file-path]

where [file-path] is the location where the garbage collection log should be saved.
Then run your tests and upload the log to http://gceasy.io/: you'll get tons of graphs that are useful for gaining insight into the problem.

About G1: on Artemis we found that with a very big old generation, G1 wasn't fast enough on full GCs (they are single-threaded in G1!) while -XX:+UseParallelOldGC was, so it's worth a try.

If you have a Fedora machine with OpenJDK 8 or something similar, you also have the option of using -XX:+UseShenandoahGC to find out whether the problem really is GC-related (do not use it in production yet!).
http://openjdk.java.net/jeps/189
If Shenandoah solves the issue, then it is very likely a GC problem :)


Re: ActiveMQ broker becomes unresponsive after sometime

Tim Bain
Shenandoah is aimed at large JVMs (the page that link points to says 20GB
or bigger), so I wouldn't just go trying it without understanding what its
strengths and weaknesses are.

GCEasy looks interesting, but I'm curious how they make it free and whether
there are catches. Are you associated with the company?

Can you share more about your experience with G1? In particular, was the
slow behavior during incremental Old Gen collections, or during an actual
full GC? Theoretically, G1 is supposed to avoid full GCs by doing
incremental Old Gen GCs; are you saying that you weren't able to get it to
do that?

Tim



Re: ActiveMQ broker becomes unresponsive after sometime

nigro_franz
Hi Tim!

Answers inlined

Tim Bain wrote
Shenandoah is aimed at large JVMs (the page that link points to says 20GB
or bigger), so I wouldn't just go trying it without understanding what its
strengths and weaknesses are.
I partially disagree; that link was only a quick overview of what Shenandoah is. As you rightly said, there are strengths and weaknesses beyond what a single-page link can explain about a GC algorithm.
A better comparison of G1 vs Shenandoah is here: https://www.infoq.com/presentations/G1-Shenandoah

TLDR:
1) "mostly" G1 (region-based, SATB, uses write barriers, etc.) BUT with no remembered sets
2) not generational anymore: the young won't die first!!
3) concurrent relocation using Brooks pointers (https://rkennke.wordpress.com/2013/10/23/shenandoah-gc-brooks-pointers/) -> more memory overhead and lower throughput (e.g. read barriers, multiple copies)
4) won't pause while evacuating

IMHO the most important point that could make it a good match for running a Java broker is 2: if you have an application in which you already know that queues/messages won't die soon (from a tenuring-age point of view), then regardless of the planned heap size (i.e. it need not be a huge heap) it makes sense to give it a try, taking into account the memory overhead involved.

The common idea that Shenandoah is well suited to managing huge heaps is just an added benefit of no longer being bounded by object copy/marking performance in minor GCs, as G1 is.

The reason I advised giving it a try is mostly that I'm assuming a very "busy" (from the copy-operations point of view) young generation or a very big old generation: like the Parallel GC, Shenandoah performs full GCs using multiple threads (G1 does not, AFAIK), so in the worst case where a full GC has to happen, I expect it to be shorter than with G1.

Tim Bain wrote
Are you associated with the company?
No, I'm not; I've used it just to have a simple GC viewer instead of the ones provided by other products.

Tim Bain wrote
Can you share more about your experience with G1? In particular, was the
slow behavior during incremental Old Gen collections, or during an actual
full GC? Theoretically, G1 is supposed to avoid full GCs by doing
incremental Old Gen GCs; are you saying that you weren't able to get it to
do that?
 
Yes, I will for sure :)
It's just a matter of writing something up on this forum, but I need time to make the results more than a simple collection of GC logs :)
As a preview: with Artemis, if you have a fast message arrival rate (with no consumers), no expiration queues, and you configure the broker to retain the messages so that they occupy most of your old generation without leaving enough space for the young generations (i.e. the freshly arrived messages) to age in peace (i.e. enough contiguous free space), G1 will likely run a lot of full GCs.
In this very uncommon case (it is full of anti-patterns), a Parallel GC (or Shenandoah) is very likely to perform at least shorter full GCs, mostly due to the use of multiple threads while collecting.
Makes sense?

Regards,
Francesco