I'm using an AbortSlowConsumerStrategy with AMQ 5.8.0, with a single
producer (P) and a single intentionally-slow consumer (C), and two brokers
(B1 and B2) between them. C connects to B2 using the failover transport
(but only one broker, B2, in Cluster 2 at the moment), so it will attempt
to reconnect when it gets aborted.
But sometimes this works and sometimes it fails, and I'm not having any
luck figuring out why, so I'm hoping that Tim or Garry (or anyone else with
a deep knowledge of slow consumer strategies) will have some insight about
what to look at.
When I have abortConnection="false" on the AbortSlowConsumerStrategy,
everything works as I would expect: the consumer is aborted, B2
unsubscribes from B1, and a few seconds later C re-subscribes to B2, which
re-subscribes to B1. I can see the subscriptions in the JMX views in
When I have abortConnection="true" and no debugger attached, the consumer
is aborted, B2 unsubscribes from B1, and almost immediately C re-subscribes
to B2, which attempts to re-subscribe to B1. But the B2->B1
re-subscription appears to not work: JConsole does not show a
broker-to-broker subscription, and no messages flow to B2 and C.
When I attach a debugger and set breakpoints in places like
Topic.addSubscription() and AdvisoryBroker.addConsumer() and step through
the code, I can't see any issues, and most of the time the re-subscription
actually succeeds. But it still sometimes fails, even with breakpoints set,
which seems to indicate a race condition of some sort, possibly one with a
time element (where the other thread completes before the thread I'm
breaking in when there's a breakpoint to waste time, but where they finish
in the other order if there's no breakpoint).
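
To make the timing theory concrete, here is a minimal sketch in plain JDK Java. It is not ActiveMQ code: the class name, the worker names "A" and "B", and the delays are all hypothetical, and the sleep only stands in for the time a breakpoint wastes on one thread. It shows how delaying one of two threads flips the order in which they finish.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names come from ActiveMQ.
final class FinishOrderSketch {

    // Starts two workers that each wait for their delay and then record their
    // name. The returned list gives the order in which they finished.
    static List<String> finishOrder(long delayAMillis, long delayBMillis) {
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        Thread a = new Thread(() -> { pause(delayAMillis); order.add("A"); });
        Thread b = new Thread(() -> { pause(delayBMillis); order.add("B"); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(order);
    }

    // Stand-in for the time a breakpoint holds a thread up.
    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("B delayed: " + finishOrder(0, 300));
        System.out.println("A delayed: " + finishOrder(300, 0));
    }
}
```

If the real problem has this shape, then which thread I happen to pause in the debugger would decide whether the re-subscription wins or loses the race, which would fit with it succeeding more often when breakpoints are set.
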
I've got two questions:
1. Most immediately, since the behavior seems to be what I'd want with
abortConnection="false", what do I lose if I use that setting? Tim Bish's
blog post at
http://timbish.blogspot.com/2013/07/coming-in-activemq-59-new-way-to-abort.html
is the best description I've seen of what the option does, but it doesn't
really say what you lose if you don't use it. What's the risk if we don't
use it, or what use case was it included to handle?
2. Even if setting abortConnection="false" gets around the problem,
there seems to be a bug somewhere here, and I'd like to help find it to
make slow consumer handling more robust for other users. I've not found
anything that might explain the race condition in the places I've looked so
far; Topic.consumers, AdvisoryBroker.consumers, and
SessionState.consumers all seem to behave as expected both when
re-subscription succeeds and when it fails, and so does
TransportConnection.consumerExchanges (though it looks like there's a minor,
possibly-irrelevant race condition there, since we're not holding the lock
when we check whether an exchange already exists for that ID). Is there
somewhere else I should look for the cause of the bad behavior I'm seeing?
Can anyone suggest any methods on other threads that I should be setting a
breakpoint on, besides the ones on the method-processing thread I've
already stepped through?
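
For reference, this is the shape of the minor consumerExchanges race I mentioned above, as a minimal sketch in plain JDK Java. It is not the ActiveMQ source: the class, the method names, and the String key are hypothetical, and I'm only assuming the real code follows the same check-then-create pattern. The first method looks the entry up without holding the lock, so two threads can both see null and both create an entry. The second does the check and the insert under the same lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a per-consumer registry; not ActiveMQ code.
final class ExchangeRegistrySketch {
    private final Map<String, Object> exchanges = new HashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // Racy shape: the lookup happens before the lock is taken, so two threads
    // can both observe null and each create and register their own object.
    Object getOrCreateRacy(String id) {
        Object result = exchanges.get(id);
        if (result == null) {
            synchronized (exchanges) {
                result = new Object();
                created.incrementAndGet();
                exchanges.put(id, result);
            }
        }
        return result;
    }

    // Safer shape: the check and the insert happen under the same lock.
    Object getOrCreateLocked(String id) {
        synchronized (exchanges) {
            Object result = exchanges.get(id);
            if (result == null) {
                result = new Object();
                created.incrementAndGet();
                exchanges.put(id, result);
            }
            return result;
        }
    }

    int createdCount() {
        return created.get();
    }

    // Releases several threads at once against the locked variant and returns
    // how many objects were created for the single shared id.
    static int hammerLocked(int threads) {
        ExchangeRegistrySketch registry = new ExchangeRegistrySketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                    registry.getOrCreateLocked("consumer-1");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return registry.createdCount();
    }

    public static void main(String[] args) {
        System.out.println("objects created under lock: " + hammerLocked(8));
    }
}
```

I don't know whether this pattern has anything to do with the failed re-subscription; I mention it only because it's the one place I've found so far where the locking looked incomplete.
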