AcativeMQ becomes slow suddenly in 60 minutes after starting a performance test

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

AcativeMQ becomes slow suddenly in 60 minutes after starting a performance test

Hidekazu
Hi,

We did a performance test for ActiveMQ and came across a problem.
I hope someone help us understand what was wrong with my ActiveMQ.
Let me explain it here.

<problem>
Transaction time became long suddenly from 50 millsec to 60 sec in about 60 minutes
after starting a performance test.
ActiveMQ kept cpu and memory usage in a normal condition during the performance test.
However, iowait rose 10% in 60 minutes.


<environment>
Our ActiveMQ version 5.14.3

We emulated 1,000 IoT clients and sent data to ActiveMQ at one minute interval.
Payload size sent by IoT client was about 2 Kbytes.
KahaDB size is enough big. (100GB)


<KahaDB log>
There was an error and out of memory in kahaDB log.

--------------------------
2017-06-16 04:39:19,615 [eckpoint Worker] WARN  MessageDatabase                - Failed to load next journal location: null
2017-06-16 04:39:19,615 [eckpoint Worker] WARN  MessageDatabase                - Failed to load next journal location: null
2017-06-16 04:45:08,527 [eckpoint Worker] WARN  MessageDatabase                - Failed to load next journal location: null
2017-06-16 04:45:08,527 [eckpoint Worker] WARN  MessageDatabase                - Failed to load next journal location: null
2017-06-16 04:57:16,863 [eckpoint Worker] WARN  MessageDatabase                - Failed to load next journal location: null
2017-06-16 04:57:16,863 [eckpoint Worker] WARN  MessageDatabase                - Failed to load next journal location: null
2017-06-16 05:10:08,083 [eckpoint Worker] WARN  MessageDatabase                - Failed to load next journal location: null
2017-06-16 05:10:08,083 [eckpoint Worker] WARN  MessageDatabase                - Failed to load next journal location: null
2017-06-16 05:22:49,960 [eckpoint Worker] WARN  MessageDatabase                - Failed to load next journal location: null
2017-06-16 05:22:49,960 [eckpoint Worker] WARN  MessageDatabase                - Failed to load next journal location: null
2017-06-16 08:19:45,059 [eckpoint Worker] ERROR MessageDatabase                - Checkpoint failed
java.lang.OutOfMemoryError: GC overhead limit exceeded
2017-06-16 08:19:45,059 [eckpoint Worker] ERROR MessageDatabase                - Checkpoint failed
java.lang.OutOfMemoryError: GC overhead limit exceeded
--------------------------

Thanks.
Hidekazu
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AcativeMQ becomes slow suddenly in 60 minutes after starting a performance test

christopher.l.shannon
How much memory have you allocated for the broker?  You might need to
simply allocate more memory or tune your memory limits.  See
http://activemq.apache.org/javalangoutofmemory.html

On Fri, Jun 30, 2017 at 7:10 AM, Hidekazu <[hidden email]> wrote:

> Hi,
>
> We did a performance test for ActiveMQ and came across a problem.
> I hope someone help us understand what was wrong with my ActiveMQ.
> Let me explain it here.
>
> <problem>
> Transaction time became long suddenly from 50 millsec to 60 sec in about 60
> minutes
> after starting a performance test.
> ActiveMQ kept cpu and memory usage in a normal condition during the
> performance test.
> However, iowait rose 10% in 60 minutes.
>
>
> <environment>
> Our ActiveMQ version 5.14.3
>
> We emulated 1,000 IoT clients and sent data to ActiveMQ at one minute
> interval.
> Payload size sent by IoT client was about 2 Kbytes.
> KahaDB size is enough big. (100GB)
>
>
> <KahaDB log>
> There was an error and out of memory in kahaDB log.
>
> --------------------------
> 2017-06-16 04:39:19,615 [eckpoint Worker] WARN  MessageDatabase
> - Failed to load next journal location: null
> 2017-06-16 04:39:19,615 [eckpoint Worker] WARN  MessageDatabase
> - Failed to load next journal location: null
> 2017-06-16 04:45:08,527 [eckpoint Worker] WARN  MessageDatabase
> - Failed to load next journal location: null
> 2017-06-16 04:45:08,527 [eckpoint Worker] WARN  MessageDatabase
> - Failed to load next journal location: null
> 2017-06-16 04:57:16,863 [eckpoint Worker] WARN  MessageDatabase
> - Failed to load next journal location: null
> 2017-06-16 04:57:16,863 [eckpoint Worker] WARN  MessageDatabase
> - Failed to load next journal location: null
> 2017-06-16 05:10:08,083 [eckpoint Worker] WARN  MessageDatabase
> - Failed to load next journal location: null
> 2017-06-16 05:10:08,083 [eckpoint Worker] WARN  MessageDatabase
> - Failed to load next journal location: null
> 2017-06-16 05:22:49,960 [eckpoint Worker] WARN  MessageDatabase
> - Failed to load next journal location: null
> 2017-06-16 05:22:49,960 [eckpoint Worker] WARN  MessageDatabase
> - Failed to load next journal location: null
> 2017-06-16 08:19:45,059 [eckpoint Worker] ERROR MessageDatabase
> - Checkpoint failed
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2017-06-16 08:19:45,059 [eckpoint Worker] ERROR MessageDatabase
> - Checkpoint failed
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> --------------------------
>
> Thanks.
> Hidekazu
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.
> nabble.com/AcativeMQ-becomes-slow-suddenly-in-60-minutes-
> after-starting-a-performance-test-tp4728103.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AcativeMQ becomes slow suddenly in 60 minutes after starting a performance test

Hidekazu
This post was updated on .
Dear christopher.l.shannon

Thanks for your quick reply.

I alocated 3GB memory for ActiveMQ.
This is our startup setting for heap allocation.

/usr/sbin/java -Xms3G -Xmx3G -XX:PermSize=256M -XX:MaxPermSize=256M

And this is the systemUsage setting in activemq.xml


<systemUsage>
  <systemUsage>
    <memoryUsage>
      <memoryUsage memoryUsage percentOfJvmHeap="70" />
    </memoryUsage>
    <storeUsage>
      <storeUsage limit="100 gb" />
    </storeUsage>
    <tempUsage>
      <tempUsage limit="50 gb" />
    </tempUsage>
  </systemUsage>
</systemUsage>


Could you tell me why you thought it was a memory problem?

Thanks!
Hidekazu
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AcativeMQ becomes slow suddenly in 60 minutes after starting a performance test

Tim Bain
Can you describe more about your test itself? Were your IoT processes
publishing to queues or topics? To a single one or to one per client? Was
anything consuming the messages? Were the messages persistent or
non-persistent? (If persistent, can you prove it by showing a reasonable
rate of growth on the KahaDB data files?)

Tim

On Jul 2, 2017 9:40 PM, "Hidekazu" <[hidden email]> wrote:

> Dear christopher.l.shannon
>
> Thanks for your quick reply.
>
> I alocated 3GB memory for ActiveMQ.
> This is our startup setting for heap allocation.
>
> /usr/sbin/java -Xms3G -Xmx3G -XX:PermSize=256M -XX:MaxPermSize=256M
>
> And this is the systemUsage setting in activemq.xml
>
>
> <systemUsage>
>   <systemUsage>
>     <memoryUsage>
>       <memoryUsage memoryUsage percentOfJvmHeap="70" />
>     </memoryUsage>
>     <storeUsage>
>       <storeUsage limit="100 gb" />
>     </storeUsage>
>     <tempUsage>
>       <tempUsage limit="50 gb" />
>     </tempUsage>
>   </systemUsage>
> </systemUsage>
>
>
> Thanks!
> Hidekazu
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.
> nabble.com/AcativeMQ-becomes-slow-suddenly-in-60-minutes-
> after-starting-a-performance-test-tp4728103p4728185.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AcativeMQ becomes slow suddenly in 60 minutes after starting a performance test

Hidekazu
Dear Tim

Thanks for helping me a lot.
Let me answer your questions.

> Were your IoT processes publishing to queues or topics?

Topics

> To a single one or to one per client?

To a single ActiveMQ broker.

> Was anything consuming the messages?

Yes, our subscriber consume the topics.

> Were the messages persistent or non-persistent?

Yes, persistent.

> (If persistent, can you prove it by showing a reasonable
> rate of growth on the KahaDB data files?)

The performance test was conducted on June 23th and I experienced the slow ActiveMQ issue on that day.

This is the current kahadb data file directory list. (not June 23th)
I didn't log the growth rate of Kahadb data during the performance test.
I'm sorry..

[emsuser@ematapl01 kahadb]$ ls -lh
合計 1.5G
-rw-r--r-- 1 root root  32M  5月 30 14:47 2017 db-1184.log
-rw-r--r-- 1 root root  32M  5月 30 14:57 2017 db-1185.log
-rw-r--r-- 1 root root  32M  5月 30 15:29 2017 db-1194.log
-rw-r--r-- 1 root root  32M  5月 30 16:14 2017 db-1196.log
-rw-r--r-- 1 root root  32M  6月 14 21:23 2017 db-3071.log
-rw-r--r-- 1 root root  32M  6月 15 17:25 2017 db-3204.log
-rw-r--r-- 1 root root  32M  6月 15 19:13 2017 db-3223.log
-rw-r--r-- 1 root root  32M  6月 16 00:02 2017 db-3227.log
-rw-r--r-- 1 root root  32M  6月 17 00:01 2017 db-3340.log
-rw-r--r-- 1 root root  32M  6月 17 05:57 2017 db-3397.log
-rw-r--r-- 1 root root  32M  6月 23 14:18 2017 db-4138.log
-rw-r--r-- 1 root root  32M  7月  4 08:59 2017 db-4704.log
-rw-r--r-- 1 root root  31M  7月  3 12:15 2017 db-4705.log
-rw-r--r-- 1 root root  24M  7月  3 12:20 2017 db-4706.log
-rw-r--r-- 1 root root  16M  7月  3 12:26 2017 db-4707.log
-rw-r--r-- 1 root root  23M  7月  3 12:31 2017 db-4708.log
-rw-r--r-- 1 root root  16M  7月  3 12:37 2017 db-4709.log
-rw-r--r-- 1 root root 8.4M  7月  3 12:42 2017 db-4710.log
-rw-r--r-- 1 root root  23M  7月  3 12:48 2017 db-4711.log
-rw-r--r-- 1 root root  21M  7月  3 12:53 2017 db-4712.log
-rw-r--r-- 1 root root  13M  7月  3 12:59 2017 db-4713.log
-rw-r--r-- 1 root root  15M  7月  3 13:04 2017 db-4714.log
-rw-r--r-- 1 root root  20M  7月  3 13:10 2017 db-4715.log
-rw-r--r-- 1 root root  13M  7月  3 13:16 2017 db-4716.log
-rw-r--r-- 1 root root  12M  7月  3 13:21 2017 db-4717.log
-rw-r--r-- 1 root root  12M  7月  3 13:27 2017 db-4718.log
-rw-r--r-- 1 root root  12M  7月  3 13:32 2017 db-4719.log
-rw-r--r-- 1 root root  18M  7月  3 13:38 2017 db-4720.log
-rw-r--r-- 1 root root  13M  7月  3 13:43 2017 db-4721.log
-rw-r--r-- 1 root root  12M  7月  3 13:49 2017 db-4722.log
-rw-r--r-- 1 root root  15M  7月  3 13:54 2017 db-4723.log
-rw-r--r-- 1 root root  16M  7月  3 14:00 2017 db-4724.log
-rw-r--r-- 1 root root  14M  7月  3 14:05 2017 db-4725.log
-rw-r--r-- 1 root root  15M  7月  3 14:11 2017 db-4726.log
-rw-r--r-- 1 root root  23M  7月  3 14:16 2017 db-4727.log
-rw-r--r-- 1 root root  21M  7月  3 14:22 2017 db-4728.log
-rw-r--r-- 1 root root 923K  7月  4 09:04 2017 db-4730.log
-rw-r--r-- 1 root root  32M  7月  7 19:07 2017 db-4732.log
-rw-r--r-- 1 root root 553M  7月  7 19:07 2017 db.data
-rw-r--r-- 1 root root  92M  3月 16 19:15 2017 db.data.bk
-rw-r--r-- 1 root root  23M  7月  7 19:07 2017 db.redo
-rw-r--r-- 1 root root    8  7月  3 12:09 2017 lock
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AcativeMQ becomes slow suddenly in 60 minutes after starting a performance test

Hidekazu
Hi

I faced the same problem with ActiveMQ 5.15.0.
Configurations and environments are almost the same with 5.14.3.

Hidekazu
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AcativeMQ becomes slow suddenly in 60 minutes after starting a performance test

Tim Bain
Your directory still was able to confirm that data is indeed being written
to the KahaDB store, so thank you for that. One thing I notice is that
you've got a large number of sequential files that haven't been deleted,
which implies that there are messages in those files that aren't being
consumed. Are you certain that during your performance test, all messages
published were being consumed? I'd recommend you try to confirm what you
believe to be true by viewing the JMX MBeans that represent your topics, to
make sure that the count of dequeued messages really is increasing at the
expected rate.

It would be useful to know whether the growth in memory happens gradually
over the full length of your performance run or happens in one or more
"spikes" of additional memory usage. I've found I get the most useful
memory/GC information out of JConsole; you can attach it to the broker when
you start the performance test, and then just leave it running and watch
what the graph on the memory tab does over time.

Also, did the "Failed to load next journal location" warnings disappear
with 5.15.0? https://issues.apache.org/jira/browse/AMQ-6432 appears to be
what you were hitting, and it should be fixed under 5.15.0. If it's gone in
5.15.0 but the OutOfMemoryError is still happening, then that implies that
the two are not related.

Tim

On Tue, Jul 11, 2017 at 1:27 AM, Hidekazu <[hidden email]> wrote:

> Hi
>
> I faced the same problem with ActiveMQ 5.15.0.
> Configurations and environments are almost the same with 5.14.3.
>
> Hidekazu
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.
> nabble.com/AcativeMQ-becomes-slow-suddenly-in-60-minutes-
> after-starting-a-performance-test-tp4728103p4728434.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
Loading...