High CPU load with network connector, failover transport

High CPU load with network connector, failover transport

Tim Robbins
Hi,

We’ve noticed a regression in ActiveMQ 5.10.1 vs. 5.10.0 with a configuration similar to the following:

Broker 1:
networkConnector with static:(failover:(tcp://broker2)?randomize=false&maxReconnectAttempts=0)

Broker 2:
networkConnector with static:(failover:(tcp://broker1)?randomize=false&maxReconnectAttempts=0)

When one of the brokers is restarted, the other broker uses ~400% CPU. The cause is that the FailoverTransport reconnectTask spins and nothing ever stops it.

Reverting the fix made for AMQ-5315 reintroduces the NullPointerException, but failover is then handled properly without spinning:
https://git1-us-west.apache.org/repos/asf/activemq/repo?p=activemq.git;a=commitdiff;h=c391321d1b5b59542d847717654b0d4dba54cf2f

Reverting that change works because the NullPointerException is caught and passed to serviceLocalException(), which calls ServiceSupport.dispose(getControllingService()). With the fix made in AMQ-5315, that dispose() call is never made.

I think that, rather than reverting the AMQ-5315 commit, it would be fine to just call dispose() before fireBridgeFailed() in the case where we can’t retrieve the broker info.
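
The proposed ordering can be illustrated with a small toy model. This is only a sketch and not ActiveMQ code: ToyTransport, ToyBridge, onBrokerInfoUnavailable and reconnectIterations are hypothetical names invented for the illustration, and the loop merely imitates a reconnect task that keeps running until something disposes the transport. Only dispose() and fireBridgeFailed() are names taken from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

public class BridgeFailureSketch {

    /** Hypothetical stand-in for a transport whose reconnect task loops until disposed. */
    public static final class ToyTransport {
        private boolean disposed;

        public void dispose() {
            disposed = true;
        }

        /** Counts reconnect iterations; the cap exists only so the toy loop terminates. */
        public int reconnectIterations(int cap) {
            int iterations = 0;
            while (!disposed && iterations < cap) {
                iterations++;
            }
            return iterations;
        }
    }

    /** Hypothetical stand-in for the bridge; records the order of calls. */
    public static final class ToyBridge {
        public final ToyTransport transport = new ToyTransport();
        public final List<String> events = new ArrayList<>();
        private final boolean disposeFirst;

        public ToyBridge(boolean disposeFirst) {
            this.disposeFirst = disposeFirst;
        }

        /** Models the branch taken when the remote broker info cannot be retrieved. */
        public void onBrokerInfoUnavailable() {
            if (disposeFirst) {
                // Proposed ordering: dispose the transport before reporting the failure.
                transport.dispose();
                events.add("dispose");
            }
            fireBridgeFailed();
        }

        private void fireBridgeFailed() {
            events.add("fireBridgeFailed");
        }
    }

    public static void main(String[] args) {
        for (boolean disposeFirst : new boolean[] {false, true}) {
            ToyBridge bridge = new ToyBridge(disposeFirst);
            bridge.onBrokerInfoUnavailable();
            System.out.println("disposeFirst=" + disposeFirst
                    + " events=" + bridge.events
                    + " iterations=" + bridge.transport.reconnectIterations(1000));
        }
    }
}
```

In this model the loop runs to its cap when dispose() is skipped and never starts when dispose() is called first. Whether the real bridge behaves the same way would need to be confirmed against the ActiveMQ source.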

This does seem like a fairly serious problem. As far as I’m aware this is a common use case: for example, anyone using the masterslave transport, or the failover transport with the required maxReconnectAttempts=0 for bridges, would be exposed to it.

Regards,

Tim


Re: High CPU load with network connector, failover transport

Tim Robbins
By the way, I've noticed Lars has run into the same issue and posted via Nabble but it hasn't turned up on the mailing list yet:

http://activemq.2283324.n4.nabble.com/Using-a-NetworkConnector-results-in-high-CPU-load-td4691627.html



Re: High CPU load with network connector, failover transport

Tim Bain
Please submit a bug in JIRA for it, ideally with a unit test that shows the
problem (though I'm not quite sure how you'd write a unit test to confirm
that a particular thread isn't spinning a core, so that might be wishful
thinking).
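
One possible way to approach such a check, offered as a rough sketch rather than a recommended test design, is to sample a thread's CPU time through the JDK's ThreadMXBean and compare it with elapsed wall-clock time. The class name ThreadCpuProbe and the helper cpuFraction are made up for this example, and the spinner thread in main only stands in for a misbehaving task; finding the actual reconnect thread inside a broker is not shown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCpuProbe {

    /**
     * Returns roughly what fraction of one core the thread used during the window,
     * or -1.0 if the JVM cannot report per-thread CPU time or the thread is not alive.
     */
    public static double cpuFraction(Thread thread, long windowMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return -1.0;
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long cpuBefore = mx.getThreadCpuTime(thread.getId());
        long wallBefore = System.nanoTime();
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            // Preserve the interrupt and fall through with a shorter window.
            Thread.currentThread().interrupt();
        }
        long cpuAfter = mx.getThreadCpuTime(thread.getId());
        long wallElapsed = System.nanoTime() - wallBefore;
        if (cpuBefore < 0 || cpuAfter < 0 || wallElapsed <= 0) {
            return -1.0;
        }
        return (double) (cpuAfter - cpuBefore) / wallElapsed;
    }

    public static void main(String[] args) {
        // Stand-in for a task that spins instead of waiting.
        Thread spinner = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // busy loop
            }
        }, "spinner");
        spinner.setDaemon(true);
        spinner.start();

        System.out.println("spinner CPU fraction: " + cpuFraction(spinner, 500));
        spinner.interrupt();
    }
}
```

A test built on this idea would assert that the measured fraction stays below some threshold after a broker restart. The threshold and the sampling window are judgment calls, and results can be noisy on a loaded build machine.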

Re: High CPU load with network connector, failover transport

goggles123