Unfair work balancing across hosts in our cluster.


Kevin Burton
We have a problem where all work is given to ONE host in our cluster. That
box then goes to 100% CPU while the other boxes sit idle, starved for work.

We have an ActiveMQ setup where we create 16 connections to ActiveMQ (one
per core), then one session per thread, with a prefetch size > 0.
Right now it's set to 1, but it has been higher in the past.

We run about 200 threads per box, so that's 200 sessions spread across all
16 connections.
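
For anyone reproducing this setup: the per-consumer prefetch we're talking
about can be pinned on the client's broker URL. A minimal sketch (broker
host and port are placeholders, not our actual config):

```
tcp://broker-host:61616?jms.prefetchPolicy.queuePrefetch=1
```

The same value can also be set programmatically on the connection factory's
prefetch policy; the URL form just makes it visible in one place.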

So I'm pretty sure what's happening is that all the messages are getting
read into the prefetch buffer on ONE host, leaving no work available for
anyone else. That host then just SITS on the work, choking out consumers on
the other hosts.

Is there a way to flatten this out by slowing down specific hosts, or by
rate-limiting the total messages dispatched to each host?

This would help spread work throughout the cluster.

Kevin

--

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>

Re: Unfair work balancing across hosts in our cluster.

Tim Bain
Kevin, welcome back.

What's your network of brokers topology, and your client connection URL?
Could it be that all your clients are on that one broker (which would
explain heavy load), or are you sure your clients are evenly distributed?

Also, have you set decreaseNetworkConsumerPriority to true?
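
For context, that flag lives on the network connector in each broker's
activemq.xml. A minimal sketch, with the peer broker address as a
placeholder:

```xml
<networkConnectors>
  <!-- Lower the priority of consumers reached over the network bridge so
       each broker prefers dispatching to its own local consumers first. -->
  <networkConnector uri="static:(tcp://peer-broker:61616)"
                    decreaseNetworkConsumerPriority="true"/>
</networkConnectors>
```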

When you look at the backed-up consumers in question via JMX, how many
messages are currently dispatched to each consumer, and how large is its
prefetch buffer?

Tim

On Thu, Apr 28, 2016 at 11:36 AM, Kevin Burton <[hidden email]> wrote:
