Artemis low message rates leading to timeouts whilst paging


kter
We are currently running some performance tests on Artemis. Our use case is
deploying a single Artemis node and publishing a moderate number of messages
(on the order of 50K messages per second) from multiple producers. We need
to allow for a durable consumer to be disconnected for up to 24hrs. We can't
allow messages to be dropped or producers to be blocked.

We observed that Artemis goes into paging mode and eventually the ingestion
rate becomes unacceptably low and producers time out when publishing. We
think we can't avoid paging in such scenarios, but we couldn't tune Artemis
to achieve a tolerable impact from paging.

Currently paging starts to become unacceptable when we reach approx. 2.56 GB
of unack'd messages using global-max-size of 6GB and page-size-bytes of -1.
Rates drop to below 100 msg/s and eventually result in timeouts at
the producer.

Can anyone advise on how we might stretch Artemis to our requirements and
get more sustainable rates whilst paging is happening? Keep in mind that
due to our requirements we would see around 800GB of messages written in a
period of 24hrs.



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Artemis low message rates leading to timeouts whilst paging

jbertram
> ...we couldn't tune Artemis to achieve a tolerable impact from paging.

What did you try tuning?

Did you gather any metrics from the broker during the slow-down (e.g. GC
logging, thread dumps, etc.)? If so, what did you find?

To be clear, paging is meant to be a palliative measure used sparingly
(i.e. when absolutely necessary). At 50K msgs/sec over 24 hours you'll have
over 4 billion messages in the queue and (as you note) almost 1TB of paged
data. What measures are you taking to ensure the consumers will be able to
catch up and clear that backlog? If the backlog is never cleared then the
address will likely be paging permanently which is not really the way the
broker was designed to be used.
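
The figures quoted here can be sanity-checked with a few lines of arithmetic. The sketch below is in Python and uses only numbers from the thread (50K msgs/sec, 24 hours, roughly 800GB). It assumes a constant publish rate and decimal gigabytes; the average message size and sustained write rate it derives are implied estimates, not measured values.

```python
# Back-of-the-envelope sizing from the figures quoted in this thread.
# Assumptions: constant publish rate, decimal units (1 GB = 10**9 bytes).

RATE_MSGS_PER_SEC = 50_000          # producer rate from the original post
OUTAGE_SECONDS = 24 * 60 * 60       # consumer may be offline for 24 hours
TOTAL_BYTES = 800 * 10**9           # ~800GB written in 24 hours, per the post


def backlog_estimate(rate, seconds, total_bytes):
    """Return (message count, implied average bytes/msg, sustained bytes/sec)."""
    messages = rate * seconds
    avg_msg_bytes = total_bytes / messages
    write_bytes_per_sec = total_bytes / seconds
    return messages, avg_msg_bytes, write_bytes_per_sec


messages, avg_msg_bytes, write_rate = backlog_estimate(
    RATE_MSGS_PER_SEC, OUTAGE_SECONDS, TOTAL_BYTES
)
print(f"messages in backlog : {messages:,}")
print(f"implied avg msg size: {avg_msg_bytes:.0f} bytes")
print(f"sustained write rate: {write_rate / 10**6:.2f} MB/s")
```

Under those assumptions the backlog is 4.32 billion messages, which matches the "over 4 billion" estimate above, with an implied average of roughly 185 bytes per message and about 9.3 MB/s of sustained writes.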


Justin

On Fri, Nov 1, 2019 at 8:11 AM kter <[hidden email]> wrote:

> [...]

Re: Artemis low message rates leading to timeouts whilst paging

kter
> What did you try tuning?

We have only tuned the page sizes and number of pages (page-size-bytes,
page-max-cache-size); we have also tried adjusting the JVM -Xmx size and
global-max-size.
We saw no difference when adjusting the paging values, and the amount of
memory we have on our test bench is about 100 times short of what we would
require.

> Did you gather any metrics from the broker during the slow-down (e.g. GC
> logging, thread dumps, etc.)? If so, what did you find?

Our benchmark is set to record the producer rate, and as the broker goes into
paging mode (indicated in the Artemis log) we see a big impact on throughput.
If you could help us understand which metrics might be of interest and how
they can help us, we'd really appreciate it.

> What measures are you taking to ensure the consumers will be able to catch
> up and clear that backlog?

When re-introducing a consumer to pick up the messages we have seen it
consume messages at a rate much higher than messages are being produced,
which suggests batching being performed. So we are confident a consumer
would be able to catch up with the backlog.
However once in paging "mode" the producer seems stuck and never recovers
the original throughput.
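
Whether a returning consumer can clear the backlog depends on the margin between its consume rate and the ongoing produce rate. The Python sketch below estimates the drain time for a 24-hour backlog at 50K msgs/sec; the consume rate of 75,000 msgs/sec is a hypothetical value chosen for illustration, since the thread only says the consumer is "much higher" than the producer.

```python
def catch_up_seconds(backlog_msgs, consume_rate, produce_rate):
    """Seconds until the backlog is drained, or None if it never drains.

    Assumes both rates stay constant while the consumer is catching up.
    """
    drain_rate = consume_rate - produce_rate
    if drain_rate <= 0:
        return None  # consumer is not faster than the producers
    return backlog_msgs / drain_rate


# Backlog after 24 hours at 50K msgs/sec (figures from the original post).
backlog = 50_000 * 24 * 60 * 60

# 75_000 msgs/sec is an assumed consume rate, not a measured one.
seconds = catch_up_seconds(backlog, consume_rate=75_000, produce_rate=50_000)
print(f"catch-up time: {seconds / 3600:.1f} hours")
```

With these assumed rates the consumer needs 48 hours to catch up, so the address would stay in paging for that whole period even though the consumer is faster than the producers.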








Re: Artemis low message rates leading to timeouts whilst paging

nigro_franz
I see that https://issues.apache.org/jira/browse/ARTEMIS-2466 is taking care
of making the flushing of pending paging writes configurable: right now it's
300 msg/sec and cannot be changed unless you use the NIO journal.
1) Which type of journal are you using?
2) Have you tried NIO or MAPPED to see if the same happens with them?

thanks,
Franz




Re: Artemis low message rates leading to timeouts whilst paging

kter
> 1) Which type of journal are you using?

We have configured the NIO journal, but we used MAPPED in our previous
tests.

> 2) Have you tried NIO or MAPPED to see if the same happens with them?

We've experienced the same kind of performance on both.




Re: Artemis low message rates leading to timeouts whilst paging

nigro_franz
> We have configured to use NIO journal, but we used MAPPED on our previous
> tests

Just curious, why did you change from MAPPED?

If possible, try ASYNCIO for performance reasons if you need disk failure
guarantees (journal-datasync = true).
Choose NIO if the OS won't allow you to use ASYNCIO, falling back to MAPPED
only if journal-datasync is false and you just need process failure
guarantees (due to replication or other environment guarantees you have).
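
The advice above can be read as a small decision rule. The Python sketch below is one possible reading of it, not an official Artemis recommendation: the function name and its two boolean parameters are hypothetical, and the returned strings are the journal-type values named in this thread.

```python
def suggest_journal_type(need_disk_failure_guarantees, asyncio_available):
    """Pick a journal type following one reading of the advice above.

    need_disk_failure_guarantees: True if journal-datasync must be true.
    asyncio_available: True if the OS allows the ASYNCIO journal.
    """
    if need_disk_failure_guarantees:
        # Prefer ASYNCIO for performance; use NIO when the OS lacks support.
        return "ASYNCIO" if asyncio_available else "NIO"
    # MAPPED is suggested only when journal-datasync is false and process
    # failure guarantees are enough (e.g. replication covers disk loss).
    return "MAPPED"


for datasync in (True, False):
    for asyncio_ok in (True, False):
        choice = suggest_journal_type(datasync, asyncio_ok)
        print(f"datasync={datasync!s:5} asyncio={asyncio_ok!s:5} -> {choice}")
```

The rule deliberately treats MAPPED as the last option, since the advice only endorses it when journal-datasync is false.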

Have you configured replication, or is the disk some type of VSAN-like disk?




Re: Artemis low message rates leading to timeouts whilst paging

wei yang
In reply to this post by kter
> Currently paging starts to become unacceptable when we reach approx. 2.56
> GB of unack'd messages using global-max-size of 6GB and page-size-bytes of
> -1.

Are you using transactional message sends? Page transaction info is kept on
the broker side until it is cleared by consumers. I guess the additional
transaction info overhead, on top of the messages themselves, caused heap
usage to exceed the limit and slowed down the broker. If so, you can try
non-transactional sends.

On Fri, Nov 1, 2019 at 9:11 PM kter <[hidden email]> wrote:

> [...]