high GC activity yields a high percentage of dead letter queue messages.

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

high GC activity yields a high percentage of dead letter queue messages.

Kevin Burton
I think I figured out a realistic theory as to why my broker died last week
and ended up with my entire queue moving to the dead letter queue.

The problem happens when ActiveMQ goes into a full-GC loop where it
continually spends 100% CPU doing full -GCs.

messages are still served, but only say 1% of them (because there’s no CPU
left to do much work).

The problem comes from sessions or client acks where you have to commit()
or ack a message manually.

This never happens.  Because the broker is spending all time doing GCs.

So the 1% of messages that leave, only have a 1% chance of being ack’d…
which means that 99% of messages that are served by the broker aren’t ack’d.

This keeps happening slowly and your entire queue will be drained with
messages going into the DLQ.

It would probably end up with around say a small percentage of your
messages being processed.  But the majority would end up in the DLQ.



--

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
Reply | Threaded
Open this post in threaded view
|

Re: high GC activity yields a high percentage of dead letter queue messages.

Tim Bain
That sounds like a plausible explanation of why you DLQ filled in that
situation.  The more interesting question is what caused the GCing behavior.

I assume you don't have any GC logs/memory dumps to do forensics against,
so you're stuck trying to reproduce the problem so you can have something
to analyze?  If so, remember that you don't have to actually get to the
point of doing continual GCs (which might take weeks); it's enough to see
that the amount of Old Gen usage observed immediately after you force a
full GC (e.g. from the Memory tab of JConsole) increases between successive
full GCs over a non-trivial time period.  Then just analyze what's causing
the increase and that should be your answer.
On Dec 30, 2014 2:06 PM, "Kevin Burton" <[hidden email]> wrote:

> I think I figured out a realistic theory as to why my broker died last week
> and ended up with my entire queue moving to the dead letter queue.
>
> The problem happens when ActiveMQ goes into a full-GC loop where it
> continually spends 100% CPU doing full -GCs.
>
> messages are still served, but only say 1% of them (because there’s no CPU
> left to do much work).
>
> The problem comes from sessions or client acks where you have to commit()
> or ack a message manually.
>
> This never happens.  Because the broker is spending all time doing GCs.
>
> So the 1% of messages that leave, only have a 1% chance of being ack’d…
> which means that 99% of messages that are served by the broker aren’t
> ack’d.
>
> This keeps happening slowly and your entire queue will be drained with
> messages going into the DLQ.
>
> It would probably end up with around say a small percentage of your
> messages being processed.  But the majority would end up in the DLQ.
>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
> <http://spinn3r.com>
>
Reply | Threaded
Open this post in threaded view
|

Re: high GC activity yields a high percentage of dead letter queue messages.

Kevin Burton
Agreed.  We’re still bringing up some instrumentation using
Grafana/KairosDB so my plan is to get some GC logs and analyze this over
time and look at memory as it grows.

If I can’t get that working in the next few months I might just use data
dog as I know they have a decent JVM memory interface.

Kevin

On Tue, Dec 30, 2014 at 1:42 PM, Tim Bain <[hidden email]> wrote:

> That sounds like a plausible explanation of why you DLQ filled in that
> situation.  The more interesting question is what caused the GCing
> behavior.
>
> I assume you don't have any GC logs/memory dumps to do forensics against,
> so you're stuck trying to reproduce the problem so you can have something
> to analyze?  If so, remember that you don't have to actually get to the
> point of doing continual GCs (which might take weeks); it's enough to see
> that the amount of Old Gen usage observed immediately after you force a
> full GC (e.g. from the Memory tab of JConsole) increases between successive
> full GCs over a non-trivial time period.  Then just analyze what's causing
> the increase and that should be your answer.
> On Dec 30, 2014 2:06 PM, "Kevin Burton" <[hidden email]> wrote:
>
> > I think I figured out a realistic theory as to why my broker died last
> week
> > and ended up with my entire queue moving to the dead letter queue.
> >
> > The problem happens when ActiveMQ goes into a full-GC loop where it
> > continually spends 100% CPU doing full -GCs.
> >
> > messages are still served, but only say 1% of them (because there’s no
> CPU
> > left to do much work).
> >
> > The problem comes from sessions or client acks where you have to commit()
> > or ack a message manually.
> >
> > This never happens.  Because the broker is spending all time doing GCs.
> >
> > So the 1% of messages that leave, only have a 1% chance of being ack’d…
> > which means that 99% of messages that are served by the broker aren’t
> > ack’d.
> >
> > This keeps happening slowly and your entire queue will be drained with
> > messages going into the DLQ.
> >
> > It would probably end up with around say a small percentage of your
> > messages being processed.  But the majority would end up in the DLQ.
> >
> >
> >
> > --
> >
> > Founder/CEO Spinn3r.com
> > Location: *San Francisco, CA*
> > blog: http://burtonator.wordpress.com
> > … or check out my Google+ profile
> > <https://plus.google.com/102718274791889610666/posts>
> > <http://spinn3r.com>
> >
>



--

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
Reply | Threaded
Open this post in threaded view
|

Re: high GC activity yields a high percentage of dead letter queue messages.

artnaseef
If the broker is getting into GC purgatory, as you describe, I wouldn't bother trying to figure out why its operation is sub-optimal - that's normal for any java application in GC purgatory.

Look for a memory leak or other errant operation that would explain the constant GC activity.

The results described - messages ending up in a DLQ - sounds like message TTLs are expiring because the broker is far too slow delivering the messages.
Reply | Threaded
Open this post in threaded view
|

Re: high GC activity yields a high percentage of dead letter queue messages.

Kevin Burton
Yes. I agree generally but I was worried about SOME other bug with the high
DLQ message count.  But if these two are coordinated then I only have to
deal with the GC purgatory issue.

On Sun, Jan 4, 2015 at 6:59 PM, artnaseef <[hidden email]> wrote:

> If the broker is getting into GC purgatory, as you describe, I wouldn't
> bother trying to figure out why its operation is sub-optimal - that's
> normal
> for any java application in GC purgatory.
>
> Look for a memory leak or other errant operation that would explain the
> constant GC activity.
>
> The results described - messages ending up in a DLQ - sounds like message
> TTLs are expiring because the broker is far too slow delivering the
> messages.
>
>
>
> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/high-GC-activity-yields-a-high-percentage-of-dead-letter-queue-messages-tp4689344p4689415.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>



--

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
Reply | Threaded
Open this post in threaded view
|

Re: high GC activity yields a high percentage of dead letter queue messages.

artnaseef
There could be another issue, but as long as the GC issue persists, it will make tracking down any other problem very challenging.  Attacking that first seems like the best approach.