Queue locks up , purging it allows it to work again.


Kevin Burton
I’m totally stumped on this bug.

Essentially, I have a queue that locks up and consumers in my main daemon
no longer consume messages from it.

It’s basically dead.  Even if I restart my daemon, no messages are consumed.

I can browse the queue, consume them from my desktop, but I can’t consume
them from my main daemon.

I’ve done all the normal debugging. JMX shows there are plenty of messages
there.
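
For anyone wanting to run the same check from code rather than jconsole, here is a minimal sketch that reads a queue's counters over JMX using only the JDK. The JMX URL, the broker name `localhost`, and the queue name `my.queue` are placeholders. The ObjectName layout and the attribute names are what I would expect on a 5.8+ broker, so treat them as assumptions and confirm them in jconsole first.

```java
import java.util.Hashtable;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class QueueJmxProbe {

    // Builds the MBean name for a queue. The key layout is assumed to be the
    // one used by ActiveMQ 5.8 and later; check it against your broker.
    static ObjectName queueName(String brokerName, String queue) {
        Hashtable<String, String> keys = new Hashtable<>();
        keys.put("type", "Broker");
        keys.put("brokerName", brokerName);
        keys.put("destinationType", "Queue");
        keys.put("destinationName", queue);
        try {
            return new ObjectName("org.apache.activemq", keys);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL: RMI registry on the broker host, port 1099.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            ObjectName name = queueName("localhost", "my.queue");
            // Attribute names are assumptions based on the queue MBean.
            String[] attrs = {"QueueSize", "ConsumerCount", "InFlightCount", "DequeueCount"};
            for (String attr : attrs) {
                System.out.println(attr + " = " + mbeans.getAttribute(name, attr));
            }
        }
    }
}
```

Sampling these a few seconds apart should show whether the queue is deep while the dequeue counter stays flat, which is the symptom described here.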

Messages just aren’t being sent from the broker to my consumers.

Here’s my major epiphany for the day: I can resolve this deadlock by just
purging the queue.

No daemons need to be restarted, etc. Just purging the queue is enough to
kick off message processing again.

Any idea what this could be? I’m willing to debug this, submit a patch,
etc. I’m just at a loss as to what could be happening.

I think it has something to do with the queue growing too large.  Shallow
queues work fine.  Deep queues lock up and stop processing work.

--

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>

Re: Queue locks up , purging it allows it to work again.

Justin Reock-2
Have you tried forcing prefetch to 1 as a destination policy?

-Justin


Re: Queue locks up , purging it allows it to work again.

paulgale
In reply to this post by Kevin Burton
What version of ActiveMQ are you using? Please send the contents of your
activemq.xml file plus details of your producer and consumer and how they're
implemented.
Have you set the broker's logging to TRACE level prior to running your
experiments? If so, please attach the log or put it on pastebin.

Thanks,
Paul


Re: Queue locks up , purging it allows it to work again.

Kevin Burton
In reply to this post by Justin Reock-2
Interesting.  It’s already 1 in the connection configuration.  I assume you
mean queuePrefetch as it’s named differently in the destination policy.
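
To make the naming difference concrete, here is a small sketch of the two client-side spellings as I understand them: the connection-wide `jms.prefetchPolicy.queuePrefetch` URL option and the per-consumer `consumer.prefetchSize` destination option. The host, port, and queue name are placeholders, and the helper names are made up for illustration. The broker-side destination policy is set separately, via the `queuePrefetch` attribute mentioned above.

```java
final class PrefetchSettings {

    // Connection-wide setting: a query parameter on the broker URL that is
    // handed to the connection factory. Host and port are placeholders.
    static String brokerUrl(String host, int port, int queuePrefetch) {
        return "tcp://" + host + ":" + port
                + "?jms.prefetchPolicy.queuePrefetch=" + queuePrefetch;
    }

    // Per-consumer setting: a destination option appended to the queue name
    // that is passed to session.createQueue(...).
    static String queueWithPrefetch(String queue, int prefetch) {
        return queue + "?consumer.prefetchSize=" + prefetch;
    }

    public static void main(String[] args) {
        System.out.println(brokerUrl("localhost", 61616, 1));
        System.out.println(queueWithPrefetch("my.queue", 1));
    }
}
```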


Re: Queue locks up , purging it allows it to work again.

Geoffrey Arnold-2
Are you reaching your diskUsage threshold (see systemUsage)?


Re: Queue locks up , purging it allows it to work again.

Kevin Burton
In reply to this post by paulgale
Here’s the public gist of our XML config.  (It needs some comment cleanup
but that’s what we’re running with.)

https://gist.github.com/burtonator/b5f4228b0f0acbf05b4e

We’re running 5.10.2.  I’ve reviewed the bugs fixed since then and nothing
seems to apply to our situation. I would upgrade but we run with the memory
persistence adapter and there’s a bug that was introduced in the 5.11.x
series which impacts memory persistence and advisories.

And I haven’t had time to fix that.

We also have a fix applied that I developed to reduce lock contention
during queue GC.  It’s within the realm of possibility that this could have
introduced a bug but I feel VERY confident that this is not the issue.  We
contributed it back and there’s an outstanding pull request for this on the
5.10 and 5.11 branches but since I didn’t apply it against master it was
never merged :-(

I didn’t enable TRACE logging for ActiveMQ but I can do this now.  Should I
do this on the broker too?  The problem is that this is going to result in a
VERY large log file because it takes about 30 minutes to reproduce this.

In the meantime, I’m going to write a test to see if I can reproduce this
by creating a similar situation of high load and a backlogged queue.


Re: Queue locks up , purging it allows it to work again.

Kevin Burton
Also, I’ve run with and without producer flow control and that also doesn’t
impact the situation.


Re: Queue locks up , purging it allows it to work again.

Kevin Burton
In reply to this post by Geoffrey Arnold-2
Nowhere near it. We’re running in memory and the values are much higher
than this. I might try disabling it though, just as an experiment.


Re: Queue locks up , purging it allows it to work again.

Kevin Burton
In reply to this post by Kevin Burton
Also, I was thinking that this MIGHT be a bug with unfair scheduling.

synchronized blocks and, by default, read/write locks aren’t fair.

So it’s entirely possible that the client is scheduling work on the faster
queues because they reply quicker and thus they win the lock race.

This would explain why I can read from the queue server from a dedicated
process, but my main worker daemon, which is rather loaded, can’t process
any more work.

My boxes aren’t loaded THAT badly.  Maybe about 50% CPU 24/7, but it’s a lot
of ActiveMQ work. So it’s entirely possible that the client is continually
losing a race working with the larger queues.
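
As a plain-JDK illustration of the fairness point (this is not ActiveMQ's own locking code, just a sketch of the concept), `ReentrantLock` takes a fairness flag. With `true`, waiting threads get the lock in arrival order. The default `false` allows a newly arriving thread to barge ahead of those already waiting. The class and method names below are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class FairLockDemo {

    // Queues `workers` threads on the lock one at a time while the caller
    // holds it, then releases it and returns the order in which the workers
    // obtained the lock.
    static List<Integer> acquisitionOrder(ReentrantLock lock, int workers) {
        List<Integer> order = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        lock.lock();
        try {
            for (int i = 0; i < workers; i++) {
                final int id = i;
                Thread t = new Thread(() -> {
                    lock.lock();
                    try {
                        order.add(id);
                    } finally {
                        lock.unlock();
                    }
                });
                threads.add(t);
                t.start();
                // Wait until this worker sits in the lock's wait queue so the
                // arrival order is well defined before starting the next one.
                while (!lock.hasQueuedThread(t)) {
                    Thread.yield();
                }
            }
        } finally {
            lock.unlock();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // true = fair: waiters are served in arrival order.
        System.out.println("fair:   " + acquisitionOrder(new ReentrantLock(true), 4));
        // false (the default) = barging allowed, so no ordering is promised.
        System.out.println("unfair: " + acquisitionOrder(new ReentrantLock(false), 4));
    }
}
```

In a demo this small the unfair lock will often hand out the lock in the same order, because barging only happens when a fresh thread arrives just as the lock is released. The difference is that only the fair lock promises that order, which is the kind of starvation risk being suggested for a busy client.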
