Messages are stuck in ActiveMQ 5.11 and delivered after more than 24 hours

sravan
Our batch processing applications process about 10 billion messages a day. For the past two months we have been experiencing an issue where ActiveMQ delivers messages very late, sometimes as much as 3 days later. On a typical day about 1% of messages (5% in the worst case) are delivered a day late. We do not have a message expiration policy that controls what should happen when a message is not delivered within a certain period of time.

We monitored our Splunk-based logs and do not see any exceptions or errors that indicate issues with consumer connections. We could not turn on additional logging on ActiveMQ because that would cause a huge performance hit, so we have mostly relied on monitoring the ActiveMQ consoles and Dynatrace. There are no server resource utilization issues on the AMQ nodes. We have 4 AMQ nodes active in the cluster.

When we monitored the ActiveMQ consoles we sometimes saw messages stuck in the network bridge. By "stuck" I mean that message draining was extremely slow: over a period of 2 hours I noticed only a handful of messages getting drained. Whenever we restart an AMQ node, any stuck messages on that node are released (I think this is a known fact to all AMQ users). What is most frustrating is that the number of stuck messages we have noticed does not correlate with the number of messages delivered late (i.e. delivered the next day). So we are under the impression that there could be invisible stuck messages.

This issue started happening after we applied a Linux patch (redhat-release-5Server-5.11.0.9 / autofs-5.0.1-0.rc2.186.el5_11) on the ActiveMQ nodes. We did some research in the forums to check whether there is any incompatibility between AMQ 5.11 and this Linux patch but could not find anything. Does anyone here have any ideas or suggestions on how we can troubleshoot this issue further? Any input would be greatly appreciated.

Re: Messages are stuck in ActiveMQ 5.11 and delivered after more than 24 hours

Tim Bain
The most effective way I know to determine what a Java process is doing
when I can't step through with a debugger is to use a CPU sampler
(JVisualVM ships with the Oracle JDK and can attach either locally or from
a remote machine via JMX and RMI) to capture where the time is being spent.
Let it capture data for a few minutes, then take a snapshot and dig into
what the various threads are spending their time on.

Don't use the profiler on an operational system! The sampler is what you
want, since it can give you good insight without measurably degrading
performance, whereas the profiler will grind everything to a halt on a
system as heavily loaded as yours sounds.
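
To make the idea of sampling concrete, here is a minimal sketch of what a sampler does under the hood: take periodic thread dumps and count which methods show up at the top of the stacks. The class name StackSampler and its method are made up for illustration; this is not how JVisualVM is implemented, and as written it only samples the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of ActiveMQ or JVisualVM.
class StackSampler {

    // Takes `samples` thread dumps, `intervalMillis` apart, and counts how
    // often each method appears as the top stack frame of any live thread.
    static Map<String, Integer> sample(int samples, long intervalMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                if (info == null) continue;
                StackTraceElement[] stack = info.getStackTrace();
                if (stack.length == 0) continue;
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop sampling early.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Print the ten most frequently seen top frames.
        sample(20, 100).entrySet().stream()
            .sorted((a, b) -> b.getValue() - a.getValue())
            .limit(10)
            .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

Idle threads parked in wait or park calls will dominate the counts, so the interesting entries are usually the ones that are not simply waiting. For a real broker a ready-made sampler such as JVisualVM is the better tool; the sketch is only meant to show why sampling is cheap compared with instrumenting every method call.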

Tim

On Jun 21, 2017 1:08 PM, "sravan" <[hidden email]> wrote:

> Our batch processing applications process about 10 billion messages a day.
> [...]

Re: Messages are stuck in ActiveMQ 5.11 and delivered after more than 24 hours

sravan
Unfortunately we are in a predicament: we have trouble reproducing the problem in our performance test environments, and trouble debugging it. The ActiveMQ INFO logs do not contain enough information to understand what's going on. When we turn on DEBUG logging (in a lower environment), ActiveMQ hangs within a few minutes and never comes back. So far the only clue we have is that, while monitoring the ActiveMQ consoles in our prod environment, we observed messages stuck on the network bridge for multiple hours. From our research we also know that the consumers are doing fine and that the main issue is with ActiveMQ, which is delivering messages very late; we simply don't know the root cause.

When we worked with an ActiveMQ consultant a year ago, we were advised to scale ActiveMQ vertically rather than horizontally, i.e. to have 2 larger AMQ nodes rather than 4 smaller nodes in a cluster. We were told that this avoids many potential anomalies with the network bridge and with shipping messages across the nodes.

Does anyone here have any suggestions on how else we could debug or fix our issue? Again, just to refresh your minds, the main issue we've been struggling with is that our AMQs are delivering messages extremely late, sometimes a day or even two days later. We do not see any JMS exceptions, and no exceptions or issues at the consumers' end. Any input is greatly appreciated.

Re: Messages are stuck in ActiveMQ 5.11 and delivered after more than 24 hours

Tim Bain
I repeat my earlier suggestion that you should use a sampler on all brokers
to characterize where the time is being spent.

The fact that turning logging to debug causes the broker to experience a
slowdown almost immediately makes it seem like maybe there's an issue with
disk I/O or space, but that's just a guess and using a sampler will give
you something more concrete than a guess.
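
If you want a quick sanity check of the disk guess before attaching a sampler, something like the following rough sketch could be run against the directory the broker writes to. The class DiskProbe and its methods are hypothetical names for illustration, the 4 KiB write size is arbitrary, and which directory to probe (for example the broker's data directory) is an assumption you would have to fill in for your setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of ActiveMQ.
class DiskProbe {

    // Average milliseconds taken to write 4 KiB and force it to disk,
    // measured over `writes` iterations on a temp file created in `dir`.
    static double avgSyncWriteMillis(Path dir, int writes) {
        try {
            Path file = Files.createTempFile(dir, "probe", ".tmp");
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.allocate(4096);
                long start = System.nanoTime();
                for (int i = 0; i < writes; i++) {
                    buf.clear();
                    while (buf.hasRemaining()) ch.write(buf);
                    ch.force(true); // flush data and metadata to the device
                }
                return (System.nanoTime() - start) / 1e6 / writes;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes still available to this JVM on the file store holding `dir`.
    static long usableBytes(Path dir) {
        try {
            return Files.getFileStore(dir).getUsableSpace();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Directory to probe; defaults to the JVM temp dir if none is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.println("usable bytes: " + usableBytes(dir));
        System.out.println("avg ms per synced 4 KiB write: " + avgSyncWriteMillis(dir, 100));
    }
}
```

Comparing the numbers between a broker that is behaving and one that is backing up would at least tell you whether the disk is worth a closer look. It says nothing about what the broker itself is doing, which is why the sampler is still the main suggestion.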

Also, you've characterized the problem as messages getting "stuck" in the
network connectors. When this happens, are individual messages truly stuck
(i.e. no messages are being passed), or is it simply that the rate at which
they're flowing out is lower than the rate at which they're flowing in (so
there's a net backup but individual messages are still being passed)? And
when messages are passed, do they arrive in order, or do they show up in an
order vastly different from the one in which they were sent?
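
The stuck-versus-slow distinction can be made from two readings of the enqueue and dequeue counters taken some time apart. Below is a minimal sketch of that comparison; the class BridgeBacklog, the Snapshot type and the verdict names are all hypothetical, and where the counter values come from (for example the statistics the broker exposes on its console or over JMX) is left as an assumption.

```java
// Hypothetical helper for illustration; not part of ActiveMQ.
class BridgeBacklog {

    // One reading of two monotonically increasing counters.
    static final class Snapshot {
        final long enqueued;   // total messages that have entered so far
        final long dequeued;   // total messages that have left so far
        final long timeMillis; // when the reading was taken

        Snapshot(long enqueued, long dequeued, long timeMillis) {
            this.enqueued = enqueued;
            this.dequeued = dequeued;
            this.timeMillis = timeMillis;
        }
    }

    enum Verdict { STUCK, BACKING_UP, KEEPING_UP }

    // Compares inflow and outflow between two readings.
    static Verdict classify(Snapshot earlier, Snapshot later) {
        long in = later.enqueued - earlier.enqueued;
        long out = later.dequeued - earlier.dequeued;
        boolean hadPending = earlier.enqueued > earlier.dequeued;
        // Nothing left although messages were waiting or arriving: truly stuck.
        if (out == 0 && (in > 0 || hadPending)) return Verdict.STUCK;
        // Messages are moving, but more slowly than they arrive: net backup.
        if (out < in) return Verdict.BACKING_UP;
        return Verdict.KEEPING_UP;
    }

    // Converts a counter delta into a per-second rate over the interval.
    static double perSecond(long delta, Snapshot earlier, Snapshot later) {
        return delta * 1000.0 / (later.timeMillis - earlier.timeMillis);
    }

    public static void main(String[] args) {
        Snapshot t0 = new Snapshot(1000, 900, 0);
        Snapshot t1 = new Snapshot(1600, 1100, 60_000);
        System.out.println(classify(t0, t1));
        System.out.println("in/s:  " + perSecond(t1.enqueued - t0.enqueued, t0, t1));
        System.out.println("out/s: " + perSecond(t1.dequeued - t0.dequeued, t0, t1));
    }
}
```

A STUCK result over a long interval points at something blocking the bridge outright, while BACKING_UP points at a throughput problem, and those two lead to different places to look.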

Tim

On Jul 11, 2017 4:29 PM, "sravan" <[hidden email]> wrote:

> Unfortunately we are in a predicament where we have issues in reproducing
> the problem in performance test environments as well as debugging.
> [...]