Artemis max performance capped at ~200MB/s


Artemis max performance capped at ~200MB/s

schalmers
Is there any reason why, when running some Artemis performance tests, I cannot
get more than ~200MB/s (1.6 Gbit/s) through an Artemis v2.6.3 broker running on
an Amazon Linux VM with 16 vCPU and 32GB RAM, in a master-slave setup?

FYI, the bandwidth available to this host is 5 Gbit/s on the external network and 3.5 Gbit/s to the storage network/layer (EBS).

If I stop the slave OR run a single master, I get ~430MB/s (~3.4 Gbit/s), which is close to the 3.5 Gbit/s maximum to the storage network/layer provided by EBS, as I'd expect.

Unfortunately, with a master-slave setup, stopping the slave means the master broker eventually stops as well, as per this JIRA: https://issues.apache.org/jira/browse/ARTEMIS-2180

So it appears there may be a performance issue with the master-slave replication.

Running an iperf3 test gets me around the ~5 Gbit/s mark.

Config:

artemis.profile:

JAVA_ARGS=" -XX:+PrintClassHistogram -XX:+UseG1GC -XX:+AggressiveOpts -Xms512M -Xmx24G -Dhawtio.realm=activemq  -Dhawtio.offline="true" -Dhawtio.role=amq -Dhawtio.rolePrincipalClasses=org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal -Djolokia.policyLocation=${ARTEMIS_INSTANCE_ETC_URI}jolokia-access.xml"


broker.xml:

      <name>node1-a</name>

      <persistence-enabled>true</persistence-enabled>

      <journal-type>MAPPED</journal-type>
      <journal-datasync>false</journal-datasync>
      <journal-sync-non-transactional>false</journal-sync-non-transactional>
      <journal-sync-transactional>false</journal-sync-transactional>
      <journal-min-files>10</journal-min-files>
      <journal-pool-files>50</journal-pool-files>
      <journal-compact-min-files>100</journal-compact-min-files>
      <journal-file-size>100M</journal-file-size>
      <journal-buffer-timeout>0</journal-buffer-timeout>
      <journal-max-io>1</journal-max-io>
      <paging-directory>data/paging</paging-directory>
      <bindings-directory>data/bindings</bindings-directory>
      <journal-directory>data/journal</journal-directory>
      <large-messages-directory>data/large-messages</large-messages-directory>
      <network-check-period>10000</network-check-period>
      <network-check-timeout>1000</network-check-timeout>
      <network-check-list>172.31.0.1</network-check-list>
      <disk-scan-period>5000</disk-scan-period>
      <max-disk-usage>90</max-disk-usage>
      <critical-analyzer>true</critical-analyzer>
      <critical-analyzer-timeout>120000</critical-analyzer-timeout>
      <critical-analyzer-check-period>60000</critical-analyzer-check-period>
      <critical-analyzer-policy>HALT</critical-analyzer-policy>

      <acceptors>
        <acceptor name="amqp-acceptor">tcp://172.31.10.32:5672?tcpSendBufferSize=2097152;tcpReceiveBufferSize=2097152;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=500;direct-deliver=false</acceptor>
        <acceptor name="netty-acceptor">tcp://172.31.10.32:61616</acceptor>
      </acceptors>

      <ha-policy>
        <replication>
          <master>
            <check-for-live-server>true</check-for-live-server>
          </master>
        </replication>
      </ha-policy>

      <cluster-user>user</cluster-user>
      <cluster-password>password</cluster-password>

      <connectors>
         <connector name="cluster-connector1">tcp://172.31.10.32:61616</connector>
         <connector name="cluster-connector2">tcp://172.31.30.42:61616</connector>
      </connectors>

      <cluster-connections>
         <cluster-connection name="cluster1">
            <connector-ref>cluster-connector1</connector-ref>
            <retry-interval>500</retry-interval>
            <!--<use-duplicate-detection>true</use-duplicate-detection>-->
            <message-load-balancing>OFF</message-load-balancing>
            <max-hops>1</max-hops>
            <static-connectors>
                <connector-ref>cluster-connector2</connector-ref>
            </static-connectors>
         </cluster-connection>
      </cluster-connections>
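
The backup broker runs the matching slave side of the ha-policy. As a rough
sketch of that section (the allow-failback value here is illustrative rather
than a copy of the exact value in use):

      <ha-policy>
        <replication>
          <slave>
            <allow-failback>true</allow-failback>
          </slave>
        </replication>
      </ha-policy>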

Re: Artemis max performance capped at ~200MB/s

jbertram
None of the XML you pasted is visible in the email (although it is visible
if I use the Nabble link).

In general there will be a performance hit from replication. That's one of
the trade-offs between replication and shared storage: with shared storage
the data only has to be written once, but with replication the data has to
be written on the live, sent across the network, and then written on the
backup as well. Whether or not you're seeing the expected performance hit is
the real question, and I'm not sure how to answer that. Do you have any
metrics on the network utilization between the live and the backup? What
happens if you run both the live and the backup on the same machine? That's
obviously not something you'd do in production, but it's worth testing as a
troubleshooting exercise, IMO.
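
For comparison, the shared-storage flavour of that ha-policy on the live would
look something like the sketch below (failover-on-shutdown is just one common
choice here, and both brokers would point their journal, bindings, paging and
large-messages directories at the same shared volume):

      <ha-policy>
        <shared-store>
          <master>
            <failover-on-shutdown>true</failover-on-shutdown>
          </master>
        </shared-store>
      </ha-policy>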


Justin

On Mon, Nov 26, 2018 at 1:33 AM schalmers <[hidden email]> wrote:

> Is there any reason why running some Artemis performance tests I cannot get
> more than ~200MB/s (1.6 Gbit/s) through an Artemis v2.6.3 broker running on
> an Amazon Linux VM with 16 vCPU, 32GB RAM?
>
> artemis.profile:
>
> broker.xml:
>

Re: Artemis max performance capped at ~200MB/s

schalmers
jbertram wrote
>  Do you have any metrics on the network utilization between
> the live and the backup?

In terms of the network utilization between the live and the backup, using
iperf3 I get these results:

[ec2-user@ip-172-31-30-42 ~]$ iperf3 -c 172.31.10.32 -p 5672
Connecting to host 172.31.10.32, port 5672
[  4] local 172.31.30.42 port 51508 connected to 172.31.10.32 port 5672
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   589 MBytes  4.94 Gbits/sec    0   1.16 MBytes
[  4]   1.00-2.00   sec   592 MBytes  4.97 Gbits/sec    0   1.23 MBytes
[  4]   2.00-3.00   sec   592 MBytes  4.97 Gbits/sec    0   1.23 MBytes
[  4]   3.00-4.00   sec   594 MBytes  4.98 Gbits/sec    0   1.23 MBytes
[  4]   4.00-5.00   sec   592 MBytes  4.97 Gbits/sec    0   1.23 MBytes
[  4]   5.00-6.00   sec   592 MBytes  4.97 Gbits/sec    0   1.41 MBytes
[  4]   6.00-7.00   sec   594 MBytes  4.98 Gbits/sec    0   1.41 MBytes
[  4]   7.00-8.00   sec   592 MBytes  4.97 Gbits/sec    0   1.41 MBytes
[  4]   8.00-9.00   sec   592 MBytes  4.97 Gbits/sec    0   1.41 MBytes
[  4]   9.00-10.00  sec   594 MBytes  4.98 Gbits/sec    0   1.41 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  5.79 GBytes  4.97 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  5.78 GBytes  4.97 Gbits/sec                  receiver



jbertram wrote
>  What happens if you run with both the live and
> the backup on the same machine (obviously not something you'd do in
> production, but something worth testing as a troubleshooting exercise
> IMO).

Doing so, I get ~400MB/s (~3.2 Gbit/s), which is expected, as it is hitting
the 3.5 Gbit/s storage network limit provided by AWS.





