
Apache Artemis - Stress-test time


Apache Artemis - Stress-test time

francesco81
Hi all,

we're currently running some stress-test sessions against our Apache Artemis instance.
To do this, we have developed a Java program that uses the Paho library. To start, we are simulating 1000 subscriber clients, each of which connects with QoS 2 and subscribes to 50 topics dedicated to it, and another 1000 publisher clients, each of which connects with QoS 2 and publishes a retained message every second on one specific topic within the 50 topics of its dedicated subscriber.


Hence, for every publisher there's a subscriber: each subscriber subscribes to 50 topics and each publisher publishes on a single specific topic. The total number of topics (addresses) involved is 1000 x 50 = 50,000.
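
For context, a minimal sketch of what one publisher/subscriber pair in our test program looks like with the Paho Java client (the broker URL, credentials, client IDs and topic names below are placeholders, not our real values):

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class PairSketch {
    public static void main(String[] args) throws Exception {
        String broker = "ssl://broker.example.com:8883";      // placeholder URL
        MqttConnectOptions opts = new MqttConnectOptions();
        opts.setUserName("user-0001");                        // placeholder credentials
        opts.setPassword("secret".toCharArray());

        // subscriber: connects with QoS 2 and subscribes to its 50 dedicated topics
        MqttClient sub = new MqttClient(broker, "subscriber-0001");
        sub.connect(opts);
        for (int t = 0; t < 50; t++) {
            sub.subscribe("cro/plantid/0001/gwid/" + t + "/device_state", 2);
        }

        // publisher: one retained QoS 2 message per second on one of those topics
        MqttClient pub = new MqttClient(broker, "publisher-0001");
        pub.connect(opts);
        while (true) {
            MqttMessage msg = new MqttMessage("state".getBytes());
            msg.setQos(2);
            msg.setRetained(true);
            pub.publish("cro/plantid/0001/gwid/0/device_state", msg);
            Thread.sleep(1000);
        }
    }
}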


Our Artemis instance runs on a CentOS 7 server (64-bit) with 8 GB of RAM (4 GB Xms and Xmx heap) and 4 vCPUs. The max number of processes for the Artemis process is 4096; max open files is 64000.


We are able to start 1000 publishers and 1000 subscribers. CPU is fine, but memory grows over time. At a certain point (after about 5 minutes) the heap fills up and Artemis starts to page... and this is its death! When the broker starts to page it becomes unusable: almost all connected clients are disconnected, the RAM is never freed again, and the file system partition used for paging (no SSD, unfortunately) keeps filling until it reaches 100%.


As I said, Artemis works well until it starts to page. Exactly at that moment I see the following lines in the log:

...

AMQ222038: Starting paging on address '.cro.plantid.1075528.gwid.50437.device_state'; size is currently: 1,521 bytes; max-size-bytes: -1

AMQ222038: Starting paging on address '.cro.plantid.1075520.gwid.50451.device_state'; size is currently: 1,521 bytes; max-size-bytes: -1

...

AMQ222038: Starting paging on address '$sys.mqtt.queue.qos2.50687s'; size is currently: 107,379 bytes; max-size-bytes: -1


It seems that the addresses related to the topics (like "cro.plantid.1075520.gwid.50451.device_state") always stay the same size (I think because there is always only one publisher and one subscriber, and only retained messages are sent on the topics, so the last one replaces the previous one).

By contrast, the "system" addresses (like "$sys.mqtt.queue.qos2.50687s") grow constantly.


Please, could someone explain to me how to manage the system addresses and how to keep them from growing so much? Maybe I've got something wrong in the configuration? What is the best configuration for my specific case (the one we are testing above)?


Thanks so much in advance, guys.


Francesco



Re: Apache Artemis - Stress-test time

clebertsuconic
Well, I always say you can't beat physics. When you start to page that many
destinations you are doing a lot more I/O, and you can't make Artemis go
faster than the physical limits of your disks.

You should probably use more memory, and a bigger global size.


Also look at the tip of master (the upcoming 2.0.1) as I committed a few
improvements.



--
Clebert Suconic

Re: Apache Artemis - Stress-test time

clebertsuconic
In reply to this post by francesco81
It's MQTT, so I would recommend using BLOCK for these addresses,
unless you really need paging.

Or a bigger max-size and more memory.
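
For illustration only (a rough sketch, not taken from any real broker.xml; the match pattern and sizes are arbitrary examples), the address-settings entry for blocking instead of paging, plus a larger global size, could look roughly like this:

<address-settings>
   <address-setting match="#">
      <!-- per-address cap before the full policy kicks in -->
      <max-size-bytes>10485760</max-size-bytes>
      <!-- block producers instead of paging to disk -->
      <address-full-policy>BLOCK</address-full-policy>
   </address-setting>
</address-settings>

<!-- directly under <core>: overall memory the broker may use for messages -->
<global-max-size>2147483648</global-max-size>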




--
Clebert Suconic

Re: Apache Artemis - Stress-test time

Martyn Taylor
Francesco,

I think you've stumbled upon a bug.  In order to support the Exactly Once
four-way ack for MQTT we schedule PUBREL messages on a management queue (in
case we need to resend).  It appears these were not getting removed from
the queue after the broker got the PUBCOMP back from the client, which is
why you're seeing the management queue size grow.

I've opened a JIRA [1] and sent a fix [2].

1. https://issues.apache.org/jira/browse/ARTEMIS-1061
2. https://github.com/apache/activemq-artemis/pull/1122

Thanks
Martyn


Re: Apache Artemis - Stress-test time

francesco81
Hi Martyn,
you are a kind of superhero to us!

I'm going to build your fix and try it with our stress tests.
I'll give you feedback.

Thanks a lot as usual!

Francesco

Re: Apache Artemis - Stress-test time

Martyn Taylor
Glad to be of help.  Please let me know how you guys get on with the latest
fix.


Re: Apache Artemis - Stress-test time

francesco81
Hi Martyn,
after several stress-test sessions with the new 2.1.0-SNAPSHOT of Artemis, I'm fairly confident in saying that RAM consumption now looks OK.
Now, however, we're facing some problems with CPU usage while the clients connect.
Our stress-test program initializes 1500 subscriber instances at startup, then tries to connect them all at a rate of one every 500 ms.
Clients connect using TLS (server side only, not mutual auth) with username, password and client ID.
Artemis authenticates every client by matching the username and password against a Redis backend and loading the user's role into the session.
After the connection succeeds, each client subscribes to 50 topics. Here Artemis authorizes the subscription by checking the user's role and the related permissions on the topics: this information is loaded from Redis at startup and refreshed every time there's a change on Redis (e.g. if a new user or role is added).
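
For context, this is roughly the shape of that module: a bare skeleton of the standard JAAS LoginModule interface our Redis module implements (the class name is illustrative and the Redis lookup itself is elided):

import java.util.Map;
import javax.security.auth.Subject;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.login.LoginException;
import javax.security.auth.spi.LoginModule;

public class RedisBackedLoginModule implements LoginModule {

    private CallbackHandler callbackHandler;
    private String user;

    @Override
    public void initialize(Subject subject, CallbackHandler callbackHandler,
                           Map<String, ?> sharedState, Map<String, ?> options) {
        this.callbackHandler = callbackHandler;
    }

    @Override
    public boolean login() throws LoginException {
        NameCallback name = new NameCallback("user");
        PasswordCallback pass = new PasswordCallback("password", false);
        try {
            // ask Artemis for the credentials supplied by the MQTT client
            callbackHandler.handle(new Callback[]{name, pass});
        } catch (Exception e) {
            throw new LoginException(e.getMessage());
        }
        user = name.getName();
        // ... validate user/password against the Redis backend here ...
        return true;
    }

    @Override
    public boolean commit() throws LoginException {
        // ... attach the user and role principals to the Subject here ...
        return true;
    }

    @Override
    public boolean abort() throws LoginException {
        return true;
    }

    @Override
    public boolean logout() throws LoginException {
        return true;
    }
}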

The server we're using (where we installed Artemis) is an aws ec2 instance (m4.xlarge) with the following properties:
vCPU = 4 (Intel Xeon 2.40GHz)
RAM = 16GB
Disk = SSD
Bandwidth = 750 Mbps
OS = CentOS Linux release 7.3.1611
Kernel = 3.10.0-514.2.2.el7.x86_64

What we notice is that, while the clients are connecting, the CPU of our machine goes to 100% (400% considering 4 vCPUs, almost 380% of which is used by Artemis):

top - 10:49:32 up 6 days,  1:03,  1 user,  load average: 12.16, 11.93, 11.12
Tasks: 105 total,   4 running, 101 sleeping,   0 stopped,   0 zombie
%Cpu0  : 99.0 us,  0.0 sy,  0.0 ni,  1.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 99.0 us,  1.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 99.0 us,  0.0 sy,  0.0 ni,  1.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 97.0 us,  1.0 sy,  0.0 ni,  1.0 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
KiB Mem : 16004820 total,  8214776 free,  7183864 used,   606180 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  8439988 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                        
16015 mqtt      20   0 12.637g 6.641g  15784 S 378.2 43.5 221:00.54 java                                                                                          
19229 mqtt      20   0  138964   3832   1212 R  15.8  0.0  14:49.52 redis-server

This results in a lot of "Connection timeout" errors on the client side. It usually takes about 40 minutes to reach steady state, i.e. for all 1500 subscribers to get connected. And even after reaching steady state, CPU usage remains at 100% for a while before decreasing.
Is this normal? There are no errors in artemis.log.
I suspect I've misconfigured something on the Artemis side. My broker.xml is attached. Maybe you can help me understand where I'm going wrong.

Thanks in advance for your help as usual,

Francesco

broker.xml

Re: Apache Artemis - Stress-test time

jbertram
Have you tried your test without SSL?  SSL can be pretty CPU intensive.


Justin


Re: Apache Artemis - Stress-test time

francesco81
Hi Justin,
unfortunately TLS is mandatory for our use case :-(
I can run a stress-test session without it, but it wouldn't be very significant for the real case.
Anyway, as you suggest, I'll try it just to see the difference in terms of performance.

Thanks

Re: Apache Artemis - Stress-test time

jbertram
I understand that you need TLS for your use-case.  I simply want to determine what performance impact it is having.  If SSL is your performance bottleneck then there's not a lot (if anything) that can be changed in the broker to increase performance.

In any case, going much deeper with performance analysis is probably going to require profiling to identify hot-spots in the code.  Are you set up for this kind of work?


Justin


Re: Apache Artemis - Stress-test time

francesco81
Hi Justin,
we've tried turning off TLS. It doesn't change anything: CPU usage remains very high.
We've also tried disabling persistence. That changes things a bit: at the beginning of the connection phase the CPU goes to 50%, then it increases over time until it reaches 100% once about 1000 clients are connected.
Now we're going to add some logging to our security plugin, to see the CPU consumption of every involved thread.
Meanwhile I'm going to install and use Oracle JDK 8 instead of the current OpenJDK 8, simply to have the jmap, jstack and jcmd utilities for some additional analysis (I can't find them inside the standard OpenJDK package for CentOS 7).

I'll let you know about further results.
Of course, any suggestion for doing a better performance analysis is appreciated.

Thanks a lot.

Francesco

Re: Apache Artemis - Stress-test time

jbertram
My advice is to simplify your environment, application, and broker configuration as much as possible while still reproducing the high CPU utilization.  If you can do it with only local resources (e.g. everything running on a laptop) then that's even better.  The simpler and easier it is to reproduce, the faster (and more accurately) the problem can be analyzed and resolved.

After simplifying, the first thing would be to determine if the high CPU is being caused by garbage collection or something else.  You can monitor heap utilization with tools like JConsole and VisualVM.  Timely thread dumps are a great way to see what's consuming CPU within the broker itself.
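
For example (a rough sketch; replace <artemis-pid> with the broker's actual process id), something like this can be run while the CPU is pegged:

# heap/GC utilization, one sample per second for 30 seconds
jstat -gcutil <artemis-pid> 1000 30

# a few thread dumps ~5 seconds apart, to compare which threads stay busy
for i in 1 2 3; do jstack -l <artemis-pid> > threads-$i.txt; sleep 5; done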

Good luck.


Justin


Re: Apache Artemis - Stress-test time

francesco81
Hi Justin,
just one piece of info: by analyzing the log file, it seems that Artemis calls the login methods (initialize, login and commit, defined in the LoginModule interface used by Artemis) not only at login, but also during the subscription and unsubscription phases.
We saw this with a clean Artemis installation (as you suggested), and it seems to have a certain impact on CPU usage.
Is this normal behavior (if so, something escapes me), or could it be a bug?

Francesco

Re: Apache Artemis - Stress-test time

clebertsuconic
When you say logging, do you mean the journal?

Logging shouldn't really be an issue unless you're using tracing.


Can you show a thread dump from the time this is happening? A few thread
dumps will usually tell you where most of the CPU is spent if they all
show the same threads under usage.



--
Clebert Suconic

Re: Apache Artemis - Stress-test time

jbertram
In reply to this post by francesco81
I would expect Artemis to call the LoginModule for those things and anywhere else that authentication or authorization is necessary. However, there is a cache in place for these checks, controlled by <security-invalidation-interval>, which will optimize checks for the same user/role/permission.  Of course, when the server first starts the cache is empty so it will need to be initialized.

It's also worth noting that the PropertiesLoginModule will read the users and roles from disk once and hold those values in memory unless/until they are changed on disk and reloading has been configured in login.config (which it is by default), at which point the values will be read from disk again.

In any event, my recommendation would be to turn off security (i.e. using <security-enabled>false</security-enabled>) and then test again to see if that's a real bottleneck.
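
For reference, a sketch of the relevant broker.xml elements (the interval value here is only an example):

<core xmlns="urn:activemq:core">
   <!-- turn authentication/authorization off entirely for the test run -->
   <security-enabled>false</security-enabled>

   <!-- or, with security left on, how long authn/authz results are cached (ms) -->
   <security-invalidation-interval>60000</security-invalidation-interval>
</core>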


Justin


Re: Apache Artemis - Stress-test time

francesco81
In reply to this post by clebertsuconic
Hi Clebert,
my logging is at the DEBUG level. In the artemis.log file, I see calls to the LoginModule methods when a client connects, subscribes and unsubscribes.
Here is the thread trace when it happens during a subscription:

"Thread-1 (activemq-netty-threads-1239132915)" #47 daemon prio=5 os_prio=0 tid=0x00007f629c016000 nid=0x2066 at breakpoint[0x00007f62806e0000]   java.lang.Thread.State: RUNNABLE at com.bticino.artemis.RedisLoginModule.commit(RedisLoginModule.java:228) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Redefined) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at javax.security.auth.login.LoginContext.login(LoginContext.java:588) at org.apache.activemq.artemis.spi.core.security.ActiveMQJAASSecurityManager.getAuthenticatedSubject(ActiveMQJAASSecurityManager.java:185) at org.apache.activemq.artemis.spi.core.security.ActiveMQJAASSecurityManager.validateUserAndRole(ActiveMQJAASSecurityManager.java:132) at org.apache.activemq.artemis.core.security.impl.SecurityStoreImpl.check(SecurityStoreImpl.java:180) at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.securityCheck(ServerSessionImpl.java:404) at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.createConsumer(ServerSessionImpl.java:440) at org.apache.activemq.artemis.core.protocol.mqtt.MQTTSubscriptionManager.createConsumerForSubscriptionQueue(MQTTSubscriptionManager.java:154) at org.apache.activemq.artemis.core.protocol.mqtt.MQTTSubscriptionManager.addSubscription(MQTTSubscriptionManager.java:174) at org.apache.activemq.artemis.core.protocol.mqtt.MQTTSubscriptionManager.addSubscriptions(MQTTSubscriptionManager.java:221) at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler.handleSubscribe(MQTTProtocolHandler.java:250) at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler.channelRead(MQTTProtocolHandler.java:132) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:651) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:574) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:488) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:450) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873) at java.lang.Thread.run(Thread.java:745)

Francesco

Re: Apache Artemis - Stress-test time

jbertram
In my previous message I referred to the PropertiesLoginModule, but I can see now that doesn't apply here since you're using a custom login module (i.e. com.bticino.artemis.RedisLoginModule).  You may want to investigate more closely how the login module is working and if Redis itself might be a bottleneck here.

As I mentioned before, it would be simple just to disable security to see what impact your login module is having.


Justin


Re: Apache Artemis - Stress-test time

francesco81
Hi Justin,
you're right. We will investigate our login module more closely (Redis itself doesn't seem to be a bottleneck... but of course the constant querying of it could be).
And we'll also try to run a stress-test session with security disabled.

Just for completeness, here is a stack trace taken without our Redis plugin:

"Thread-1 (activemq-netty-threads-987595204)" #42 daemon prio=5 os_prio=0 tid=0x00007fb2d800b000 nid=0x2fb1 at breakpoint[0x00007fb2caced000]   java.lang.Thread.State: RUNNABLE at org.apache.activemq.artemis.spi.core.security.jaas.PropertiesLoginModule.login(PropertiesLoginModule.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at javax.security.auth.login.LoginContext.login(LoginContext.java:587) at org.apache.activemq.artemis.spi.core.security.ActiveMQJAASSecurityManager.getAuthenticatedSubject(ActiveMQJAASSecurityManager.java:185) at org.apache.activemq.artemis.spi.core.security.ActiveMQJAASSecurityManager.validateUserAndRole(ActiveMQJAASSecurityManager.java:132) at org.apache.activemq.artemis.core.security.impl.SecurityStoreImpl.check(SecurityStoreImpl.java:180) at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.securityCheck(ServerSessionImpl.java:404) at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.createConsumer(ServerSessionImpl.java:440) at org.apache.activemq.artemis.core.protocol.mqtt.MQTTSubscriptionManager.createConsumerForSubscriptionQueue(MQTTSubscriptionManager.java:154) at org.apache.activemq.artemis.core.protocol.mqtt.MQTTSubscriptionManager.addSubscription(MQTTSubscriptionManager.java:174) at org.apache.activemq.artemis.core.protocol.mqtt.MQTTSubscriptionManager.addSubscriptions(MQTTSubscriptionManager.java:221) at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler.handleSubscribe(MQTTProtocolHandler.java:250) at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler.channelRead(MQTTProtocolHandler.java:132) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:651) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:574) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:488) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:450) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873) at java.lang.Thread.run(Thread.java:745)   Locked ownable synchr

Francesco

Re: Apache Artemis - Stress-test time

alessandro.zannini@bticino.it
Hello,
I'm a colleague of Francesco.
We just ran another test, connecting only, without subscribing, in order to check whether the problem is the authorization procedure... and we discovered that this step doesn't use much CPU (it never reached 5% of usage).
So we can say that the "heavy" step is the subscription procedure performed after the connection. For each client we subscribe to 50 topics (different for each client).
In our test we subscribed 2000 clients (so 50 x 2000 = 100K topics).
During this period (which took about 1 hour), the broker's CPU was always near 100% usage while the subscriptions were in progress!
Can you help us understand why, Justin?
Thanks.
 

Re: Apache Artemis - Stress-test time

jbertram
> Can you help us understand why, Justin?

Not until you get me something more concrete like a reproducible test-case, output from a profiler indicating where the hot-spot is, or even a series of thread dumps from when CPU utilization is high.


Justin
