ActiveMQ stops delivering messages to consumers when saturating a high-latency network link

Tim Bain
I'm doing performance characterization of ActiveMQ when a network of
brokers runs across a high-latency (100ms range) WAN.  When my producer on
one side of the WAN sends faster than our meager allocation of the WAN's
bandwidth, I quickly see all messages fail to be delivered to the end
consumer.

These are the three critical elements of the problem, which all have to be
present for it to happen:
1.  Messages have a TTL set (the same for all messages), so they'll
eventually expire.  We're using Camel to do this for us, but it would be
the same if it were set directly without Camel's help.
2.  Producers are sending messages faster (in aggregate) than our bandwidth
allocation on the WAN.  This means we're guaranteed to not deliver some of
the messages to the end consumer, but in practice we're not delivering any
of them.
3.  There is a non-trivial amount of latency across the WAN.

As messages are sent, they begin queuing on the producer-side broker.  As
time goes on, the messages that are still in the producer-side broker's
message store get closer and closer to expiring, until eventually the
message at the head of the message store is within the WAN's latency value
(e.g. 100ms) of the message's expiration time.  The amount of time it takes
for this to happen depends on how long it takes messages to time out and on
the difference between the producer's send rate and the WAN's bandwidth,
but it will eventually happen.  This message will be sent by the
producer-side broker (because although it's really close to expiring, it
hasn't expired yet), but when it's received by the consumer-side broker, an
amount of time equal to the WAN latency has passed, so it's expired and
gets discarded by the consumer-side broker instead of getting delivered to
the consumer.
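To put rough numbers on "it will eventually happen", here is a sketch based on a simple fluid queueing model. The model is an assumption for illustration, not something measured on ActiveMQ. It assumes a constant producer rate r and link throughput b (both in messages per second, with r > b), a queue that starts empty, the same TTL on every message, and a one-way latency L. A message enqueued at time t waits (r - b)t/b in the queue. So the first message that will arrive already expired goes onto the wire at roughly (TTL - L) * r / (r - b) seconds. The class name and the numbers in `main` are illustrative only.

```java
/**
 * Fluid-model estimate of when a FIFO-forwarding broker starts sending
 * messages that will expire in transit. Assumptions (not ActiveMQ code):
 * constant producer rate and link rate (messages/second, sendRate > linkRate),
 * empty queue at t = 0, identical TTL for every message, fixed one-way latency,
 * and no messages dropped before the first failure.
 */
public final class BacklogOnset {

    /**
     * A message enqueued at time t waits t * (r - b) / b, so it arrives
     * downstream already expired once t * (r - b) / b + L >= TTL, i.e. from
     * t* = (TTL - L) * b / (r - b). That message is put on the wire at
     * t* * r / b = (TTL - L) * r / (r - b), which is what this returns (seconds).
     */
    static double secondsUntilFirstWastedSend(double ttl, double latency,
                                              double sendRate, double linkRate) {
        if (sendRate <= linkRate) {
            return Double.POSITIVE_INFINITY; // the backlog never grows
        }
        return (ttl - latency) * sendRate / (sendRate - linkRate);
    }

    public static void main(String[] args) {
        // Illustrative numbers: 30 s TTL, 100 ms latency,
        // 120 msg/s offered load, 100 msg/s of WAN capacity.
        double t = secondsUntilFirstWastedSend(30.0, 0.1, 120.0, 100.0);
        System.out.printf("first wasted send after ~%.1f s%n", t);
    }
}
```

With those numbers, the first wasted send comes after roughly 179 s. The onset time depends only on TTL - L and on the ratio r / (r - b). A producer that exceeds the link by only a small margin therefore takes much longer to reach the failure point, but it still gets there.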

From this point onwards, no messages will get successfully delivered to the
consumer.  As the messages in the producer-side broker's message store get
closer to and eventually reach their expiration times, each message will
either be within the WAN latency of its timeout or after its timeout.  If
the former, it will get sent across the WAN but discarded by the
consumer-side broker; if the latter, it will get discarded by the
producer-side broker and that broker will find the next message in the
message store that isn't yet expired (but will be by the time it arrives)
and send it instead.  As a result, all messages from that point onward
either expire on the producer-side broker or the consumer-side broker.
Even though there are lots of messages in the producer-side broker's
message store that could be delivered successfully, ActiveMQ instead sends
the first message in the message store even though an outside observer
knows it will just get thrown away.
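The collapse described above can be reproduced with a toy discrete-time model. This is an illustration under stated assumptions, not ActiveMQ's actual dispatch code. The producer-side broker forwards strictly in FIFO order, and the link can start one message every `linkIntervalMs`. Latency is fixed, every message has the same TTL, and expired messages at the head of the queue are discarded locally without using link capacity. All class and method names are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy model of FIFO forwarding over a saturated, high-latency link
 * (hypothetical names; not ActiveMQ code). Times are in milliseconds.
 */
public final class FifoForwardingSim {

    /** Returns messages that reach the consumer-side broker alive, bucketed by send time. */
    static int[] deliveredPerWindow(int sendIntervalMs, int linkIntervalMs, int latencyMs,
                                    int ttlMs, int durationMs, int windowMs) {
        Deque<Long> expirations = new ArrayDeque<>(); // absolute expiration of each queued message
        int[] delivered = new int[(durationMs + windowMs - 1) / windowMs];
        for (long now = 0; now < durationMs; now++) {
            if (now % sendIntervalMs == 0) {
                expirations.addLast(now + ttlMs);        // producer sends one message
            }
            if (now % linkIntervalMs != 0) {
                continue;                                // link still busy with the previous message
            }
            // Discard messages that have already expired on the producer side.
            while (!expirations.isEmpty() && expirations.peekFirst() <= now) {
                expirations.removeFirst();
            }
            if (expirations.isEmpty()) {
                continue;
            }
            long expiration = expirations.removeFirst(); // forward the head, as FIFO does
            long arrival = now + latencyMs;
            if (arrival < expiration) {
                delivered[(int) (now / windowMs)]++;     // still alive on arrival
            }                                            // otherwise it expires on the consumer side
        }
        return delivered;
    }

    public static void main(String[] args) {
        // Illustrative: 125 msg/s offered, 100 msg/s link, 100 ms latency, 5 s TTL.
        int[] perWindow = deliveredPerWindow(8, 10, 100, 5_000, 60_000, 10_000);
        for (int i = 0; i < perWindow.length; i++) {
            System.out.printf("%2d-%2d s: %d delivered%n", i * 10, (i + 1) * 10, perWindow[i]);
        }
    }
}
```

In this run, the link is busy for the entire 60 s. Deliveries proceed at the full link rate for roughly the first 24.5 s and then drop to zero for the rest of the run, because every message that reaches the head has less TTL left than the link latency.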

Ideally, ActiveMQ should prioritize messages that are expected to reach an
end consumer over ones that are expected to time out before they get there,
to minimize wasteful use of scarce resources such as network links.  Doing
that automatically and without the user having to provide lots of
up-front configuration of network topology sounds hard, particularly when
considering that network link performance can vary over time and that
different consumers may have different network paths from the producer to
the consumer.  But I think it would be very useful to have a setting that
allows a user to specify that messages within X milliseconds of their
expiration time will be discarded by the broker rather than forwarded to
the next broker.  The default should be 0 (so all messages that haven't
actually expired would be forwarded), but if I know that my network path
has a certain latency, I should be able to configure the broker to not even
try delivering messages that I know aren't likely to make it to an end
consumer, so that messages that will make it can be sent instead.
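The proposed setting amounts to a small change in how the next message to forward is chosen. The sketch below uses a hypothetical threshold named `expiryThresholdMs`. The name is invented for illustration and is not an existing ActiveMQ option. Messages with that much TTL or less remaining are discarded instead of forwarded. With a threshold of 0, the rule reduces to skipping only messages that have already expired.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of the proposed rule: discard, rather than forward, any message that
 * is within expiryThresholdMs of expiring. expiryThresholdMs is a hypothetical
 * setting for illustration, not an existing ActiveMQ option.
 */
public final class ThresholdForwarding {

    /** A queued message: its position in send order and its absolute expiration (ms). */
    static final class Msg {
        final long seq;
        final long expiration;

        Msg(long seq, long expiration) {
            this.seq = seq;
            this.expiration = expiration;
        }
    }

    /**
     * Removes and returns the first message with more than expiryThresholdMs
     * of TTL left, discarding the messages in front of it. The rest of the
     * queue is left untouched, so send order is preserved. Returns null if no
     * queued message qualifies.
     */
    static Msg nextToForward(Deque<Msg> queue, long now, long expiryThresholdMs) {
        while (!queue.isEmpty()) {
            Msg head = queue.removeFirst();
            if (head.expiration - now > expiryThresholdMs) {
                return head;
            }
            // Too close to expiring: drop it here instead of spending link capacity on it.
        }
        return null;
    }

    public static void main(String[] args) {
        long now = 10_000;
        long latencyMs = 100;
        long[] remainingMs = {30, 90, 150, 400}; // TTL left on four queued messages
        for (long threshold : new long[] {0, latencyMs}) {
            Deque<Msg> queue = new ArrayDeque<>();
            for (int i = 0; i < remainingMs.length; i++) {
                queue.addLast(new Msg(i, now + remainingMs[i]));
            }
            Msg next = nextToForward(queue, now, threshold);
            boolean arrivesAlive = next.expiration > now + latencyMs;
            System.out.printf("threshold=%d ms -> forward seq %d (%s)%n", threshold, next.seq,
                    arrivesAlive ? "arrives before expiring" : "expires in transit");
        }
    }
}
```

With a 100 ms latency, a threshold of 0 picks the message with 30 ms left, which then expires in transit. A threshold equal to the latency skips ahead to the message with 150 ms left. Messages are only ever removed from the front of the queue and never reordered, so whatever is forwarded still goes out in the original send order.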

Does this seem like a reasonable feature to add?  If so, I'll submit a JIRA
for it.

Tim

Re: ActiveMQ stops delivering messages to consumers when saturating a high-latency network link

Tim Bain
I've submitted https://issues.apache.org/jira/browse/AMQ-5370 for this
feature request.

In reply to Tim Bain's message of Mon, Sep 15, 2014 at 11:16 AM.

Re: ActiveMQ stops delivering messages to consumers when saturating a high-latency network link

artnaseef
Honestly, that doesn't sound reasonable to me.

ActiveMQ has no solid way to know which messages *should* or *should not* reach their destination before the TTL expires.  That would require some feedback from across the broker-to-broker connection so the sending broker could take that into account.  In addition, JMS has ordering guarantees (although it's never wise to fully depend on them).

The problem set sounds interesting - almost like ActiveMQ is being used as a throttle for network traffic.

Anyway, how about altering the message TTLs plus or minus the maximum expected network latency times two?  Either using randomness or simply alternating between + and -.  In that way, some messages will still survive the trip while others don't when the backlog of messages reaches that critical point.  And, some (most?) should expire before hitting the wire, saving that wasted trip.
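Both variants of the TTL jitter suggested above can be sketched as follows. The offset is taken to be twice the maximum expected one-way latency. This is one reading of the suggestion. The class and method names are hypothetical, and the sketch leaves out how the resulting TTL is applied (for example through Camel or `javax.jms.MessageProducer.setTimeToLive(long)`).

```java
import java.util.Random;

/**
 * Sketch of per-message TTL jitter of plus or minus twice the maximum
 * expected one-way latency (hypothetical helper, not an ActiveMQ or Camel API).
 */
public final class TtlJitter {

    /** Alternating variant: even sequence numbers get +2*latency, odd ones get -2*latency. */
    static long alternatingTtl(long baseTtlMs, long maxLatencyMs, long sequence) {
        long offset = 2 * maxLatencyMs;
        return (sequence % 2 == 0) ? baseTtlMs + offset : baseTtlMs - offset;
    }

    /** Random variant: each message independently gets +2*latency or -2*latency. */
    static long randomSignTtl(long baseTtlMs, long maxLatencyMs, Random random) {
        long offset = 2 * maxLatencyMs;
        return random.nextBoolean() ? baseTtlMs + offset : baseTtlMs - offset;
    }

    public static void main(String[] args) {
        // Illustrative: 5 s base TTL, 100 ms worst-case latency.
        for (long seq = 0; seq < 4; seq++) {
            System.out.println("seq " + seq + " -> TTL " + alternatingTtl(5_000, 100, seq) + " ms");
        }
    }
}
```

Here is the intuition under a simple FIFO model. When the backlog reaches the critical point, the shortened-TTL messages at the head expire first. Longer-TTL messages enqueued at about the same time still have roughly four times the latency left (about 400 ms here), so they can still survive the trip.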

Re: ActiveMQ stops delivering messages to consumers when saturating a high-latency network link

Tim Bain
Are you saying that the manual solution (a configurable threshold on a
broker's network connection to another broker) doesn't sound reasonable, or
just that the automatic solution (where the broker would measure end-to-end
latency all the way to the consumer) doesn't?  I agree that the automatic
solution is a) hard, and b) not in line with ActiveMQ's current approach of
brokers being agnostic of the full route to a consumer (though an exception
is made for decreaseNetworkConsumerPriority) so I don't expect it to be
implemented anytime soon.

But I think that the manual configuration option would provide a reasonable
but crude tool for dealing with high-latency networks, and it wouldn't
violate the JMS ordering guarantees since messages would be discarded or
not but never delivered out of order.

The problem set is fairly straightforward: we're given a pretty limited
bandwidth allocation across a WAN, and although we've tried to estimate our
bandwidth usage to make sure we're going to stay within it, fluctuations in
usage patterns might push us beyond our allocation.  If that happens, we'd
like to continue processing in a degraded manner - some messages would be
lost and all messages would be processed just before their message
expiration time - rather than stop processing entirely, which is what
happens now.  So yes, ActiveMQ would be used here to throttle traffic
instead of allowing the network link to do so, to provide better end-to-end
functionality across the network of brokers as a whole.  And although
there's an obvious need for all ActiveMQ users to allocate adequate
bandwidth for their needs, having this option in ActiveMQ would provide a
safeguard for what happens if either the actual ActiveMQ usage or the
available bandwidth changes unexpectedly.

Your suggestion of adding jitter to the message expiration time is an
interesting one that I hadn't considered.  Since we're using Camel to send
our messages and to set that expiration time, it would be non-trivial to
implement, but it might give us a way to handle this if ActiveMQ doesn't.
But I think having a single setting in the config of each ActiveMQ
broker would be preferable to having to implement a way to use a jittery
expiration date and then configure that jitter on every producer we
configure, since it would reduce the odds of someone misconfiguring a
producer and not getting protected.

Tim

In reply to artnaseef's message of Sun, Sep 28, 2014 at 12:25 AM.

> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/ActiveMQ-stops-delivering-messages-to-consumers-when-saturating-a-high-latency-network-link-tp4685557p4685951.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>