Consumers hanging on a queue although there are messages in it

Haris
Hi everyone!

Unfortunately, my first message here is about a problem. I have the following scenario: in an application, ActiveMQ (v5.13.1) is used as a communication channel between a large number of nodes (around 50). One queue is central: all messages pass through it, get processed by 2 nodes, and are then forwarded to other queues. The results of processing those messages are then sent back to the central queue. All messages are persistent.

All nodes use Spring JMS to receive and send messages on all queues. The DefaultMessageListenerContainer instances in them are configured with a concurrency of about 50. All the queues have roughly the same number of consumers.

The central queue has the highest throughput (a few hundred thousand messages per day). Only 2 nodes (roughly 100 threads, i.e. 100 sessions) consume from it and also send messages to it; in addition, all the other nodes send messages to it (at a lower rate, though).

For some time now, we have seen the consumption rate from the central queue halt almost completely. There are messages in the queue, but our nodes are not getting any of them. When we looked at the JVM thread stacks, most were these:
java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at org.apache.activemq.FifoMessageDispatchChannel.dequeue(FifoMessageDispatchChannel.java:72)
	- locked <0x000000072c0a6090> (a java.lang.Object)
	at org.apache.activemq.ActiveMQMessageConsumer.dequeue(ActiveMQMessageConsumer.java:482)
	at org.apache.activemq.ActiveMQMessageConsumer.receive(ActiveMQMessageConsumer.java:633)
	at org.apache.activemq.jms.pool.PooledMessageConsumer.receive(PooledMessageConsumer.java:67)
	at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveMessage(AbstractPollingMessageListenerContainer.java:413)
	at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:293)
	at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:246)
	at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1142)
	at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1134)
	at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1031)
	at java.lang.Thread.run(Thread.java:745)
and these:
java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000731356870> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
	at org.apache.activemq.transport.FutureResponse.getResult(FutureResponse.java:40)
	at org.apache.activemq.transport.ResponseCorrelator.request(ResponseCorrelator.java:87)
	at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1409)
	at org.apache.activemq.TransactionContext.syncSendPacketWithInterruptionHandling(TransactionContext.java:761)
	at org.apache.activemq.TransactionContext.commit(TransactionContext.java:327)
	at org.apache.activemq.ActiveMQSession.commit(ActiveMQSession.java:574)
	at org.apache.activemq.jms.pool.PooledSession.commit(PooledSession.java:167)
	at org.springframework.jms.support.JmsUtils.commitIfNecessary(JmsUtils.java:217)
	at org.springframework.jms.listener.AbstractMessageListenerContainer.commitIfNecessary(AbstractMessageListenerContainer.java:700)
	at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:605)
	at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:308)
	at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:246)
	at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1142)
	at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1134)
	at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1031)
	at java.lang.Thread.run(Thread.java:745)
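
For anyone comparing dumps like the two above, here is a minimal sketch of how waiting threads could be tallied by the first ActiveMQ frame on their stack. It uses only the standard ThreadMXBean API and has to run inside the client JVM. The class name WaitingThreadSummary and the package prefix passed in are illustrative choices, not part of ActiveMQ or Spring.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

public class WaitingThreadSummary {

    /** Returns "Class.method" of the first frame whose class name starts with prefix, or null. */
    static String firstFrameIn(StackTraceElement[] stack, String prefix) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().startsWith(prefix)) {
                return frame.getClassName() + "." + frame.getMethodName();
            }
        }
        return null;
    }

    /** Counts WAITING and TIMED_WAITING threads per first frame matching the prefix. */
    static Map<String, Integer> summarize(ThreadInfo[] infos, String prefix) {
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip missing entries
            }
            Thread.State state = info.getThreadState();
            if (state != Thread.State.WAITING && state != Thread.State.TIMED_WAITING) {
                continue;
            }
            String frame = firstFrameIn(info.getStackTrace(), prefix);
            if (frame != null) {
                counts.merge(frame, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // No lock details requested; only states and stack traces are needed here.
        ThreadInfo[] infos = threads.dumpAllThreads(false, false);
        summarize(infos, "org.apache.activemq.")
                .forEach((frame, n) -> System.out.println(n + "  " + frame));
    }
}
```

With stacks like the ones quoted here, the two groups would show up under FifoMessageDispatchChannel.dequeue (idle consumers waiting for a dispatch) and ResponseCorrelator.request (threads waiting for the broker to answer a commit). The ratio between the two hints at whether the broker is not dispatching or not responding.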
Also, ActiveMQ has restarted itself on occasion. According to some answers on the internet, a possible reason is that AMQ's JVM was hanging, so the "starter" process had to restart it.

I would have assumed that the client code needs to be configured differently. However, there are 2 consumer nodes on this central queue, and BOTH are waiting for messages, which rules out any kind of mutual locking between them and possibly indicates an issue with how ActiveMQ and/or KahaDB handle this situation.

Any advice on how to solve this problem is appreciated!

Thanks,
Haris

Re: Consumers hanging on a queue although there are messages in it

Haris
Here is something more, really weird: Almost always, the following dirty hack "unblocked" the consumers: refreshing the queue browser page, i.e. http://localhost:8161/admin/queues.jsp

Re: Consumers hanging on a queue although there are messages in it

Stefanic
Haris wrote
> Here is something more, really weird: Almost always, the following dirty
> hack "unblocked" the consumers: refreshing the queue browser page, i.e.
> http://localhost:8161/admin/queues.jsp

Hi Haris,

Did you ever find out what caused this behavior?

We recently upgraded both the ActiveMQ server and the ActiveMQ (and Camel)
client library and this problem has suddenly popped up for the first time.

We went from ActiveMQ server 5.11.1 to 5.15.2 and client library 5.12.2 to
5.15.2.

It does look like a problem in the server because restarting the client
results in the exact same behavior, and when opening the queue browser it is
resolved immediately.



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Consumers hanging on a queue although there are messages in it

Tim Bain
Are any selectors in use on the consumers of the queue that's getting
"blocked"? If so, are there any messages published that don't match any of
the selectors?

If you take a thread dump on the broker, where do you see threads waiting?

Can you reproduce this on demand in a test environment with a minimum setup
(one queue, one or two consumers, a known, repeatable set of messages), or
is this the type of problem that only occurs in ops after weeks or months
of heavy usage? Hopefully it's the former, because the latter problem is
usually very hard to diagnose.

Tim


Re: Consumers hanging on a queue although there are messages in it

Haris
In reply to this post by Stefanic
Hi Stefanic,

Unfortunately not. It appeared again, and then went away. It /looks like/
it's related to a "lot of activity" on a queue: lots of messages and/or lots
of consumers (but more probably the former: lots of messages). However, we
haven't been able to find the root cause, especially since we have run into
another problem: restarts every now and then without a clear reason.




Re: Consumers hanging on a queue although there are messages in it

Stefanic
In reply to this post by Tim Bain
Hi Tim,

There are no selectors, so there are no messages left behind.
We have seen this three times now in production, and every time the problem
goes away once the queue is empty (after about 30 minutes of refreshing).

The thread behavior is rather strange. While the blocking issue is ongoing we
see 1 thread constantly running (100%) in VisualVM, and all other threads
(31 in our case) almost seem synchronized on the same object, because all of
them start running at the same time and go into the timed_waiting state at the
same time.
As a result, the other threads are running only about 33% of the time, and
that is not enough to keep our queue empty.

Reproducing this behavior is nearly impossible for us; it occurs randomly
within about a week.

Here are the things we tried so far:
- Downgraded ActiveMQ broker from 5.15.2 to our previous version 5.11.1 but
the problem occurred again on that version so we assume it is not the
broker/server
- Yesterday we downgraded both ActiveMQ and Camel client libraries: ActiveMQ
down to 5.14.5 and Camel from 2.20.1 down to 2.19.4 (I would have rather
only downgraded 1 at a time but it's our production environment so cannot
play around too much)

I will reply here if we experience the blocked state again.




Re: Consumers hanging on a queue although there are messages in it

Stefanic
In reply to this post by Haris
Hi Haris,

We also have this issue on a queue with high activity; in our case we handle
more than 3 million messages a day (spread out over the day, so there is never
a moment with 0 activity).
But this was never an issue before we updated.

See my previous post from today. If we still encounter this issue with the
minor downgrade, we will return to the original versions where everything
worked fine (yes, don't fix/update anything if it ain't broken).




Re: Consumers hanging on a queue although there are messages in it

Tim Bain
In reply to this post by Stefanic
Would you be able to use JVisualVM to perform CPU sampling while this is
occurring, to find out what those threads are doing? **WARNING** Do NOT do
CPU *profiling*, which would slow your broker significantly; sampling is
lightweight and generally held to be safe to do against an operational
process, but profiling is very heavyweight. Make sure you're on the right
tab in JVisualVM.
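
To illustrate what sampling means here (as opposed to instrumenting every method call, which is what profiling does), below is a rough sketch of a stack sampler built on the standard Thread.getAllStackTraces() call. It is not what JVisualVM runs internally and it has to run inside the JVM being observed, so for a real broker the Sampler tab remains the practical route. The class name StackSamplerSketch and the sample count and interval are arbitrary assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

public class StackSamplerSketch {

    /**
     * Takes up to `samples` snapshots of all thread stacks, `intervalMillis` apart,
     * and counts how often each top-of-stack frame was seen on a RUNNABLE thread.
     */
    static Map<String, Integer> sample(int samples, long intervalMillis) {
        Map<String, Integer> hits = new TreeMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] stack = e.getValue();
                if (e.getKey().getState() != Thread.State.RUNNABLE || stack.length == 0) {
                    continue;
                }
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop early
                break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // 20 samples, 100 ms apart; frames seen most often are where time is likely spent.
        sample(20, 100).forEach((frame, n) -> System.out.println(n + "  " + frame));
    }
}
```

The cost is one stack snapshot per interval, which is why sampling stays cheap. Note that threads blocked in native I/O (for example a socket read) also report RUNNABLE, so a frequently seen frame is a hint, not proof of CPU use.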

Tim


Re: Consumers hanging on a queue although there are messages in it

Stefanic
I will try to remember to do so. Are you interested in the CPU sampling
snapshot? And if so, over what amount of time?

(I'm not very experienced with profiling/sampling; I have only really done
heap dump and thread dump analysis.)



Re: Consumers hanging on a queue although there are messages in it

Tim Bain
If you're able to analyze the snapshot, that's great. If you want me to, I
can. If you're using any custom broker plugins, it's possible that your
package, class, and method names will appear in the stack trace information
captured by the snapshot, so if you're using custom plugins, make sure you
consider whether any of that is sensitive before sharing it publicly on
this list.

Ideally you'd only capture the snapshot while the problem is occurring; the
more time you capture from when everything is normal, the harder it will be
to zero in on the behavior from when things are bad. So I recommend
capturing for a while, then if the problem hasn't started occurring after
an hour or two, stop the sampling session and start a new one.

Tim


Re: Consumers hanging on a queue although there are messages in it

Stefanic
Hi Tim,

It happened again yesterday so I made 2 CPU sampling snapshots.
They don't tell me much so if you'd like I can supply them both, contact me
at snicodem AT movingintelligence D com

Because it is our production environment we have reverted back to the
library versions of activemq and camel we did not see this issue with. The
versions are 5.12.2 and 2.15.6 respectively.

I did see some behavior in our code that needs further analysis/testing.
I will try and post again about any findings or when the downgrade does in
fact stop this issue from occurring.




Re: Consumers hanging on a queue although there are messages in it

Stefanic
It has been a while so here's an update.
The same problem has been occurring on and off for the past two months now
and there is one suspect always coming back: message grouping.

We have found and tried several things; here are some of the findings:

*Message groups cache*
ActiveMQ defaults to an LRU cache with size 1024 for storing hashes of the
JMS message group header (JMSXGroupID).
We were grouping on a higher number of groups and could not find where to
change this setting, so we reduced our code to 1024 message groups.
That did not help the 'hanging' problem at all.
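
To make the LRU point concrete, here is a small sketch of how a size-limited group-to-consumer map forgets the least recently used group once more groups than its capacity are in play. It is built on java.util.LinkedHashMap and is only a model of the idea, not ActiveMQ's actual message group map; the names GroupLruSketch, lruMap and ownerOf are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupLruSketch {

    /** LRU map: once capacity is exceeded, the least recently used entry is dropped. */
    static <K, V> Map<K, V> lruMap(final int capacity) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) { // true = access order
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the owner of a group, assigning the candidate if the group is unknown. */
    static String ownerOf(Map<String, String> owners, String groupId, String candidate) {
        String owner = owners.get(groupId); // a hit also marks the entry as recently used
        if (owner == null) {
            owners.put(groupId, candidate);
            owner = candidate;
        }
        return owner;
    }

    public static void main(String[] args) {
        Map<String, String> owners = lruMap(1024);
        ownerOf(owners, "group-0", "consumer-A");
        // 1024 further groups push group-0 out of the cache.
        for (int i = 1; i <= 1024; i++) {
            ownerOf(owners, "group-" + i, "consumer-B");
        }
        // group-0 is no longer remembered, so it can end up with a different consumer.
        System.out.println(ownerOf(owners, "group-0", "consumer-C")); // prints consumer-C
    }
}
```

In this model, using more than 1024 active groups means a group can silently move to another consumer, which is why staying at or below the cache size looked like a reasonable thing to try.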

*Broker page size*
Because the ActiveMQ broker sends all messages of a group to a single
consumer, it needs to load messages into memory. When all messages in memory
are for a single consumer, the other messages in the queue are not processed.
The max page size (maxPageSize) is the parameter that lets the broker load
more messages into memory, in the hope that it finds messages for another
consumer so that the flow is not heavily impacted.
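
The effect described above can be shown with a deliberately simplified model: treat the queue as a list in which each message is tagged with the consumer that owns its group, and assume the broker only looks at the first maxPageSize entries. This is an illustration of the reasoning, not broker code; PageWindowSketch and reachableConsumers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PageWindowSketch {

    /** Consumers that own at least one message within the first maxPageSize queue entries. */
    static Set<String> reachableConsumers(List<String> ownerPerMessage, int maxPageSize) {
        Set<String> reachable = new LinkedHashSet<>();
        int window = Math.min(maxPageSize, ownerPerMessage.size());
        for (int i = 0; i < window; i++) {
            reachable.add(ownerPerMessage.get(i));
        }
        return reachable;
    }

    public static void main(String[] args) {
        // A burst of 5000 messages for one consumer sits in front of 5000 for another.
        List<String> queue = new ArrayList<>();
        for (int i = 0; i < 5000; i++) queue.add("consumer-A");
        for (int i = 0; i < 5000; i++) queue.add("consumer-B");

        System.out.println(reachableConsumers(queue, 1000));  // [consumer-A]
        System.out.println(reachableConsumers(queue, 10000)); // [consumer-A, consumer-B]
    }
}
```

With a window of 1000 only consumer-A has anything to do, so consumer-B idles although the queue holds messages for it. A larger window makes it more likely that work for other consumers is visible, at the cost of more messages held in broker memory.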

That problem with message groups, combined with some kind of bug in the client
and/or broker, seems to trigger the hanging state.
When we simulate a lot of messages for a single broker, even within the max
page size, we encounter the hanging state (although lately in another
variant, more on that below). Strangely, after a restart of the client and a
broker failover the hanging state disappears, so it must be something that
builds up while running for a while rather than a full queue right after
startup.

After changing the maxPageSize (increasing it from 1000 to 10000) we did see
a major decline in incidents, so that definitely has an effect (and it
supports the theory above about what causes the hanging state).

The hanging state we have encountered recently involves a failover transport
handler in the client that seems to think the broker is down/unresponsive and
blocks all consumers for a specific timeout (3 seconds by default, I think).
After that timeout everything continues for a few seconds, and then the
timeout is triggered again, in an endless loop.
The only way we know to stop this is restarting the client and performing a
broker failover.

*Next steps*
We are now researching how the number of consumers, maxPageSize and the client
prefetch settings interact with each other, to hopefully find a good setting
for all those parameters, mostly because the number of consumers directly
affects the number of message groups per consumer.

We also upgraded to the latest ActiveMQ and Camel client libraries and the
latest ActiveMQ broker.
The broker has been running for quite some time now and the issues have
continued; the client library update will be released to production soon.


