Amq 5.15.2 slow queues with JMSXGroupID

Amq 5.15.2 slow queues with JMSXGroupID

n.yovchev
Hello,

We have been using AMQ in production for quite some time now, and we are
noticing strange behavior on one of our queues.

The situation is as follows:

- we handle clickstream traffic, so once we have identified a user, all of
their events are "grouped" by the JMSXGroupID property (a UUID in our case;
we can have millions of these per hour) so that events for the same user are
consumed in order when they arrive in bursts
- we use KahaDB with roughly the following config:

<mKahaDB directory="${activemq.data}/mkahadb">
    <filteredPersistenceAdapters>
        <filteredKahaDB perDestination="true">
            <persistenceAdapter>
                <kahaDB checkForCorruptJournalFiles="true"
journalDiskSyncStrategy="PERIODIC" journalDiskSyncInterval="5000"
preallocationStrategy="zeros" concurrentStoreAndDispatchQueues="false" />
            </persistenceAdapter>
        </filteredKahaDB>
    </filteredPersistenceAdapters>
</mKahaDB>

- the broker is in a rather beefy EC2 instance, but it doesn't seem to hit
any limits, neither file limits, nor IOPS, nor CPU limits
- the destination policy for this destination is as follows, very similar to
that of many other destinations that use the same JMSXGroupID grouping:

<policyEntry queue="suchDestination" producerFlowControl="false"
memoryLimit="256mb" maxPageSize="5000" maxBrowsePageSize="2000">
    <messageGroupMapFactory>
        <simpleMessageGroupMapFactory/>
    </messageGroupMapFactory>
    <deadLetterStrategy>
        <individualDeadLetterStrategy queuePrefix="DLQ."
useQueueForQueueMessages="true" />
    </deadLetterStrategy>
</policyEntry>

- consumers of this queue process messages fairly slowly (about 50-100 ms per
message) compared to consumers of other destinations (about 10-30 ms per
message)

- however, we end up in a situation where the consumers are not consuming at
the speed we expect and seem to be waiting for something, even while a huge
backlog of messages for that destination sits on the remote broker. The
consumers appear to be neither CPU-, nor IO-, nor network-bound.

- a symptom: if we split that queue into two queues and attach the same
number of consumers on the same number of nodes, things somehow improve.
Also, when there is a huge backlog on that queue, if we simply rename it to
suchQueue2 on the producers and assign some consumers to it, those consumers
are much faster (for a while) than the consumers on the "old" suchQueue.

- the queue has no "non-grouped messages": all messages on it carry the
JMSXGroupID property and are of the same type.

- increasing or lowering the number of consumers for that queue seems to
have little effect

- rebooting the consumer apps seems to have little effect once the queue
becomes "slow to consume"
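The grouping behavior relied on here is essentially a sticky map from
JMSXGroupID to consumer: the first message of a group picks a consumer, and
every later message of that group follows it. A toy model of that dispatch
rule (plain Java for illustration, not ActiveMQ code; the round-robin choice
of first owner is a simplification):

```java
import java.util.HashMap;
import java.util.Map;

public class GroupDispatchModel {
    private final Map<String, Integer> groupOwner = new HashMap<>();
    private final int consumerCount;

    GroupDispatchModel(int consumerCount) {
        this.consumerCount = consumerCount;
    }

    // First message of a group picks a consumer (round-robin here for
    // simplicity); later messages of the same group stick to that consumer.
    int dispatch(String groupId) {
        return groupOwner.computeIfAbsent(
                groupId, g -> groupOwner.size() % consumerCount);
    }

    public static void main(String[] args) {
        GroupDispatchModel broker = new GroupDispatchModel(2);
        System.out.println("user-1 -> consumer " + broker.dispatch("user-1"));
        System.out.println("user-2 -> consumer " + broker.dispatch("user-2"));
        // Sticky: user-1 lands on the same consumer as before.
        System.out.println("user-1 -> consumer " + broker.dispatch("user-1"));
    }
}
```

Because assignment is sticky, a few busy groups can pin load to one consumer
even while other consumers sit idle.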


Has anybody experienced this?

In short: the broker waits a considerable time for consumers that appear to
be free and not busy most of the time.



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Amq 5.15.2 slow queues with JMSXGroupID

paulgale
A few questions:

- what prefetch size are your consumers using? Ideally it should be set to
1 to prevent out of order messages.
- what acknowledgement mode are you using?
- as each queue supports a max of 1024 groups by default, are you sure
you're not blowing past this number? That might cause the broker to spend
most of its time repeatedly re-assigning groups to consumers.

You did say something about millions of UUID values being seen per hour.
- are you hashing those UUID values down so that multiple UUIDs map to the
same group ID?
- what happens when you force the broker to close all message groups so
that it performs a re-assignment of all known groups to consumers? This
operation is available through JMX.
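The hashing Paul suggests would map each per-user UUID into a fixed number of
buckets and use the bucket as the group ID, so the broker only ever tracks a
bounded set of groups while per-user ordering is preserved (users sharing a
bucket merely share a consumer). A sketch, where `MAX_GROUPS` and the `grp-`
prefix are made-up names:

```java
import java.util.UUID;

public class GroupBuckets {
    // Cap the number of distinct groups the broker has to track. 1024 matches
    // the default group limit mentioned above; any bound would work.
    static final int MAX_GROUPS = 1024;

    // The same UUID always maps to the same bucket, so per-user ordering
    // survives; different UUIDs may share a bucket, which only serializes
    // their consumption, never reorders a single user's events.
    static String groupIdFor(UUID userId) {
        return "grp-" + Math.floorMod(userId.hashCode(), MAX_GROUPS);
    }

    public static void main(String[] args) {
        UUID user = UUID.randomUUID();
        System.out.println(groupIdFor(user));
    }
}
```

The producer would then call
`msg.setStringProperty("JMSXGroupID", groupIdFor(userId))` instead of using
the raw UUID as the group.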




Thanks,
Paul

On Thu, Nov 9, 2017 at 5:28 AM, n.yovchev <[hidden email]> wrote:


Re: Amq 5.15.2 slow queues with JMSXGroupID

Tim Bain
In addition to Paul's good suggestions, I'd focus on determining whether
the problem is that the broker isn't dispatching messages fast enough or
that the consumers aren't consuming them fast enough. I'd guess from what
you described that it's the former, but I think it's worth being sure
(unless of course one of Paul's suggestions solves your problem outright).

The easiest way to check is to set your consumers' prefetch size to
something large (like 100; it doesn't need to be huge) and then use a JMX
viewer such as JConsole to see whether the DispatchedCount values are low
(0-10) or high (90-100). Low values mean the broker is the bottleneck;
high values mean the consumers are.
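For the JMX check, the consumer subscription MBeans on an ActiveMQ 5.x queue
can be located with a name pattern like the following (a sketch; the broker
name "localhost" and queue name "suchQueue" are placeholders for your setup,
and the exact attribute to watch should be confirmed in JConsole's MBean
tree):

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class ConsumerMBeans {
    // Pattern matching every consumer subscription MBean on a queue; pass it
    // to MBeanServerConnection.queryNames(pattern, null), or browse the same
    // path in JConsole and inspect the dispatch-related attributes there.
    static ObjectName consumerPattern(String brokerName, String queueName)
            throws MalformedObjectNameException {
        return new ObjectName(String.format(
                "org.apache.activemq:type=Broker,brokerName=%s,"
                + "destinationType=Queue,destinationName=%s,endpoint=Consumer,*",
                brokerName, queueName));
    }

    public static void main(String[] args) throws MalformedObjectNameException {
        System.out.println(consumerPattern("localhost", "suchQueue"));
    }
}
```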

Tim

On Nov 9, 2017 3:00 PM, "Paul Gale" <[hidden email]> wrote:


Re: Amq 5.15.2 slow queues with JMSXGroupID

artnaseef
In reply to this post by n.yovchev
One possible (and common) cause of slowness on queues with message groups is
the max page size limitation.  If there are maxPageSize messages at the head
of the queue that cannot be dispatched for any reason (no available
consumers, all consumers' prefetch buffers full, ...), then no subsequent
messages will be dispatched from the queue until some of those messages at
the head are consumed.

That would explain why splitting the queue helps.

When this happens, it would help to know the queue stats - including the
number of inflight messages, the number of messages on the queue, and the
setting for max page size.

IIRC, there's debug logging that can be turned on to show what's happening
in the dispatch loop.
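If it is the dispatch logging being recalled here, on ActiveMQ 5.x it can
likely be enabled by raising the level of the queue region logger in
conf/log4j.properties (the category name below is an assumption based on the
5.x package layout; verify it against your broker version):

```properties
# Assumed logger category for queue dispatch internals in ActiveMQ 5.x;
# expect verbose output, so enable it only while diagnosing.
log4j.logger.org.apache.activemq.broker.region.Queue=DEBUG
```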

Note that, if this is the problem, it may be tempting to simply increase the
max page size, but that has some adverse effects as well.

Hope this helps.

Art




Re: Amq 5.15.2 slow queues with JMSXGroupID

n.yovchev
In reply to this post by n.yovchev
Hi guys, thanks for the suggestions.


@paulgale

My understanding is that if we are using a simpleMessageGroupMapFactory, we
don't have a limit on the number of message groups:

<messageGroupMapFactory>
     <simpleMessageGroupMapFactory/>
</messageGroupMapFactory>

So could that still be the problem? We also don't see huge memory usage in
the broker, so it's definitely not leaking there.


