
Artemis: ParallelGC vs CMS


Artemis: ParallelGC vs CMS

abhijith
Hi,

Has there been any testing done for Artemis using the CMS GC?

Context:  We were using the default parallel GC with 6G of RAM and were facing an issue where the old gen was filling up without a full GC being run.  We use the Netty transport configuration and have one master/slave pair with static configuration.  Our monitoring alerts for total memory usage were firing, so we ran some tests and found that the old gen was growing very slowly with CMS but very rapidly with parallel GC.

We switched to CMS, but after the change we started seeing connection errors when the session factory tries to create a new connection.  This was happening consistently, which forced us to roll back.  After rolling back to parallel GC we do not see this issue at all.  We are confused about how the choice of GC can affect creating new connections.  Any insight would be really helpful.

GC parameters:  "-XX:+UseFastAccessorMethods -Xms8G -Xmx8G -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSIncrementalDutyCycleMin=10 -XX:CMSIncrementalDutyCycle=20 -XX:+CMSIncrementalPacing -XX:+CMSClassUnloadingEnabled"

Caused by: org.apache.activemq.artemis.api.core.ActiveMQIllegalStateException: AMQ119024: Could not select a TransportConfiguration to create SessionFactory
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:773) ~[artemis-core-client-1.4.0.jar:1.4.0]
        at org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory.createConnectionInternal(ActiveMQConnectionFactory.java:756) ~[artemis-jms-client-1.4.0.jar:1.4.0]
        ... 122 common frames omitted

Thanks
Abhi

Re: Artemis: ParallelGC vs CMS

nigro_franz
Hi!

Just out of curiosity, have you tried G1 too?


Re: Artemis: ParallelGC vs CMS

abhijith
Yes, we tested with G1 settings in a load test environment. The old gen was still increasing with it, albeit a bit slower than with parallel GC, so we never moved ahead with it.

Re: Artemis: ParallelGC vs CMS

nigro_franz
Just a few ideas...
CMS is not a compacting collector, which can lead to old-generation fragmentation: if a memory leak in the connection's resources fills the old generation with fragmented garbage, a full GC is triggered to compact the live tenured objects (with no effect, due to the leak!) and a "promotion failed" message is logged.
Hence, GC logs and jmap (or any profiler) heap histograms taken at different moments during the application's lifetime (before and after a GC would be great) could help to tell whether the memory-leak theory is valid.

Setting the memory-leak theory aside, maybe the CMS collector is not "fast" enough to keep pace; hence it could make sense to help it be more "proactive" about starting a concurrent collection (by default the threshold to start a concurrent collection is 70%!): try providing more headroom (1 GB more!) in the old gen (using -XX:NewRatio) and lower the threshold to start a concurrent collection with
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50
With this configuration CMS should be able to collect all the garbage in time to allow further promotions (and new connections too :) ).
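To see whether the old gen ever approaches that initiating threshold, you can watch its occupancy from inside the JVM with the standard java.lang.management API. This is a minimal sketch, not anything from Artemis; the pool-name matching is a HotSpot naming convention ("CMS Old Gen", "PS Old Gen", "G1 Old Gen", "Tenured Gen") and should be treated as an assumption:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class OldGenOccupancy {

    /** Current old-gen occupancy as a fraction of the pool's max, or -1 if not found. */
    static double oldGenOccupancy() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // HotSpot names the tenured pool "CMS Old Gen", "PS Old Gen",
            // "G1 Old Gen" or "Tenured Gen" depending on the collector.
            String name = pool.getName();
            if (pool.getType() == MemoryType.HEAP
                    && (name.contains("Old") || name.contains("Tenured"))) {
                MemoryUsage usage = pool.getUsage();
                if (usage.getMax() > 0) {
                    return (double) usage.getUsed() / usage.getMax();
                }
            }
        }
        return -1.0;
    }

    public static void main(String[] args) {
        // With -XX:CMSInitiatingOccupancyFraction=50, a concurrent cycle
        // should begin once this fraction approaches 0.5.
        System.out.printf("old-gen occupancy: %.1f%%%n", oldGenOccupancy() * 100);
    }
}
```

Logging this fraction periodically gives a quick sanity check on whether the concurrent cycle is starting early enough, without needing a full profiler attached.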

About the throughput collector (the default parallel GC):
"old gen was filling up without full GC being run"
Do you mean that the old gen is full and no full GC happens?
AFAIK, when the old gen is filled with "real" resources (i.e. ones that can't be collected) and a new promotion/allocation occurs, you would get a GC log similar to this:
[GC (Allocation Failure)  2661502K->2638903K(2715648K), 0.0127080 secs]
[Full GC (Ergonomics)  2638903K->2638545K(2715648K), 0.0996147 secs]
[GC (Allocation Failure)  2638545K->2638545K(2715648K), 0.0158627 secs]
[Full GC (Allocation Failure) Exception in thread "main"  2638545K->2638180K(2715648K), 0.0329988 secs]
java.lang.OutOfMemoryError: Java heap space
It's strange that the GC doesn't happen at all; or have I missed something?

Anyway, a test case or heap/GC stats would be really useful to help understand what happened; there are too many cases to cover!



RE: Artemis: ParallelGC vs CMS

Richard Wagg
In reply to this post by abhijith
CMS should not be used in incremental mode on any modern hardware; see https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/cms.html.


Try the flags below and take a look at what the GC log output shows:
-Xms8G -Xmx8G -XX:+UseConcMarkSweepGC  -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:MaxTenuringThreshold=8 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/apps/activemq/

Extra args for GC logging
-XX:+PrintGCCause -XX:+PrintGCDetails  -XX:+PrintTenuringDistribution  -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/var/log/apps/activemq/gclog.log



RE: Artemis: ParallelGC vs CMS

nigro_franz

I didn't know "incremental mode" has been deprecated. Great!

RE: Artemis: ParallelGC vs CMS

Tim Bain
I second the suggestion about G1GC; if you would consider running CMS
(which is a hand grenade, and should be used only once you understand the
risk of Old Gen fragmentation and how it could result in an OOM even though
there is lots of RAM available), you should be considering G1 as well (and
since G1 doesn't have CMS's failure risk, you can safely try G1 without
needing to understand and accept scary risks).

However, based on the quick description you gave, it sounds like you might
have invalid monitoring that is producing false positives. Filling
memory (or even coming close) is not an error condition; what is an error
condition is memory still being full immediately after a full GC. The latter
indicates that memory is full of *live objects* and therefore there is
little or nothing available for use by the JVM; nothing else matters. So if
your monitoring considers memory usage without considering when the
last full GC happened, it's not a valid check. (Unless your check threshold
is higher than the full GC threshold, of course.)
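One way to implement the check described above is MemoryPoolMXBean.getCollectionUsage(), which reports a pool's usage as measured by the JVM at the end of its most recent collection rather than the live value. A minimal sketch; the 90% threshold and the pool-name matching are illustrative assumptions, not anything Artemis ships:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class PostGcAlert {

    /**
     * True if old-gen usage, measured at the end of its last collection,
     * exceeded the given fraction of the pool's max. False when no
     * collection of the pool has happened yet (nothing to alert on).
     */
    static boolean oldGenFullAfterGc(double threshold) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Old") || name.contains("Tenured")) {
                // Snapshot taken at the end of the last GC of this pool,
                // i.e. live objects only -- the number a valid alert needs.
                MemoryUsage afterGc = pool.getCollectionUsage();
                if (afterGc != null && afterGc.getMax() > 0) {
                    return (double) afterGc.getUsed() / afterGc.getMax() > threshold;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.gc(); // request a collection so getCollectionUsage() has data
        System.out.println("alert: " + oldGenFullAfterGc(0.9));
    }
}
```

Wiring a metric like this into the monitoring system, instead of raw used-heap, avoids the false positives described above.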

Tim


Re: Artemis: ParallelGC vs CMS

Tim Bain
In reply to this post by abhijith
Did Old Gen continually increase *after* performing full GCs? That is, was
the Old Gen usage after full GC #N+1 always greater than the Old Gen usage
after full GC #N? If not, you were concerned about something that doesn't
matter.

It's OK for Old Gen to grow; that's why we have a garbage collector. It
would be better if the memory got GCed out of Young Gen, but it's OK if it
makes it to Old Gen and is collected from there. Where it's a problem is
when Old Gen doesn't shrink back down after each full GC, because that
means there is more and more memory actually used, which will eventually
run you out of memory if left unchecked. But anything else is not a reason
not to use a particular product such as Artemis or a particular GC
strategy.

One other note: G1 is capable of GCing Old Gen incrementally without a full
GC. In testing on ActiveMQ 5.10.0 on an early-ish Hotspot 8 JVM (it might
have been 8u5 or 8u21, though I no longer remember exactly which one), I
observed that the incremental Old Gen collections failed to collect objects
that were then successfully collected by the next full GC. So it's crucial
that you only consider the state of Old Gen following a full GC, not just an
incremental Old Gen collect, which is why I kept explicitly calling out
full GCs above.
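The full-vs-incremental distinction can also be enforced in code: HotSpot's GarbageCollectorMXBeans emit JMX notifications whose gcAction field distinguishes "end of major GC" from "end of minor GC". A sketch under the assumption of a HotSpot JVM (the com.sun.management classes are HotSpot-specific, so this is not portable):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;
import com.sun.management.GarbageCollectionNotificationInfo;

public class FullGcWatcher {

    /** Registers a listener, requests a GC, and waits for a *major* collection. */
    static boolean awaitFullGc(long timeoutMillis) throws InterruptedException {
        CountDownLatch sawMajorGc = new CountDownLatch(1);
        for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            ((NotificationEmitter) gcBean).addNotificationListener((notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                // Ignore minor/incremental collects: only "end of major GC"
                // gives a trustworthy picture of live old-gen usage.
                if ("end of major GC".equals(info.getGcAction())) {
                    System.out.println("full GC by " + info.getGcName() + ", usage after: "
                            + info.getGcInfo().getMemoryUsageAfterGc());
                    sawMajorGc.countDown();
                }
            }, null, null);
        }
        System.gc(); // with default flags this requests a full (major) collection
        return sawMajorGc.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("saw full GC: " + awaitFullGc(5000));
    }
}
```

Note the timeout: notification delivery is asynchronous, and flags like -XX:+ExplicitGCInvokesConcurrent or -XX:+DisableExplicitGC would change what System.gc() does here.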

Tim
