AMQ 5.13.2 : Kaha DB logs cleanup issue

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

AMQ 5.13.2 : Kaha DB logs cleanup issue

Shobhana
We use AMQ to exchange MQTT messages. I observed that the log files in Kaha DB don't get deleted even after many days.

I used the steps mentioned in http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.html to locate which destination is containing unacked messages or slow consumer. Following is the trace log output :

2016-05-17 18:05:32,077 | TRACE | Last update: 7:1109948, full gc candidates set: [1, 2, 3, 4, 5, 6, 7] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,078 | TRACE | gc candidates after producerSequenceIdTrackerLocation:7, [1, 2, 3, 4, 5, 6] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,078 | TRACE | gc candidates after ackMessageFileMapLocation:7, [1, 2, 3, 4, 5, 6] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,079 | TRACE | gc candidates after tx range:[null, null], [1, 2, 3, 4, 5, 6] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,079 | TRACE | gc candidates after dest:1:riderride.chat.200, [1, 3, 4, 5, 6] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,081 | TRACE | gc candidates after dest:1:riderride.chat.203, [1, 3, 4, 5, 6] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,081 | TRACE | gc candidates after dest:1:riderride.chat.206, [1, 3, 5, 6] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,083 | TRACE | gc candidates after dest:1:riderride.chat.209, [1, 3, 5, 6] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,083 | TRACE | gc candidates after dest:1:passengerride.user.1234567890, [1, 3, 5] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,083 | TRACE | gc candidates after dest:1:passengerride.user.1234567891, [1, 3, 5] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,085 | TRACE | gc candidates after dest:1:passengerride.status.160, [1, 3, 5] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,089 | TRACE | gc candidates after dest:1:riderride.chat.218, [1, 3, 5] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,090 | TRACE | gc candidates after dest:1:invitationStatus.1234567891, [1, 3, 5] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,090 | TRACE | gc candidates after dest:1:chat.9206705762, [3, 5] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,090 | TRACE | gc candidates after dest:1:invitationStatus.1234567890, [3] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,091 | TRACE | gc candidates after dest:1:riderride.chat.220, [3] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker

As per description given in the link mentioned above, there must be some unacked messages or slow consumers on topic riderride.chat.209 which is preventing log file # 6 to be GCed. However, when I check in the admin console to see how many messages were exchanged on this topic, I see that the enqueued and dequeued messages are 0. So there is no chance for any unacked messages. There is 1 inactive subscriber though. Will this inactive subscriber prevent the GCing of this log file?

Same issue is observed for all other log files too. Any idea what's going wrong here?
Reply | Threaded
Open this post in threaded view
|

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Tim Bain
As far as I know, the stats cover only the time since the broker was last
restarted, so seeing zeros is no guarantee that there are no pending
messages in the persistence store.  I would use a JMX viewer such as
JConsole to look at what the broker thinks is the current status of the
consumer(s) on the topics in question.

Are the subscribers to your topics using durable or non-durable
subscriptions?  I assume durable...

I don't understand the logic that led you to conclude that
riderride.chat.209 is preventing log file #6 from being GC'ed.  As I read
your logs, what I see is that:

   - riderride.chat.200 prevents #2 from being GC'ed
   - riderride.chat.206 prevents #4 from being GC'ed
   - passengerride.user.1234567890 prevents #6 from being GC'ed
   - chat.9206705762 prevents #1 from being GC'ed
   - invitationStatus.1234567890 prevents #5 from being GC'ed

Log #6 is blocked by passengerride.user.1234567890, and riderride.chat.209
doesn't block anything.  What am I missing?

Tim

On May 17, 2016 7:11 AM, "Shobhana" <[hidden email]> wrote:

> We use AMQ to exchange MQTT messages. I observed that the log files in Kaha
> DB don't get deleted even after many days.
>
> I used the steps mentioned in
>
> http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.html
> <
> http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.html
> >
> to locate which destination is containing unacked messages or slow
> consumer.
> Following is the trace log output :
>
> 2016-05-17 18:05:32,077 | TRACE | Last update: 7:1109948, full gc
> candidates
> set: [1, 2, 3, 4, 5, 6, 7] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,078 | TRACE | gc candidates after
> producerSequenceIdTrackerLocation:7, [1, 2, 3, 4, 5, 6] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,078 | TRACE | gc candidates after
> ackMessageFileMapLocation:7, [1, 2, 3, 4, 5, 6] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,079 | TRACE | gc candidates after tx range:[null,
> null],
> [1, 2, 3, 4, 5, 6] | org.apache.activemq.store.kahadb.MessageDatabase |
> ActiveMQ Journal Checkpoint Worker
> 2016-05-17 18:05:32,079 | TRACE | gc candidates after
> dest:1:riderride.chat.200, [1, 3, 4, 5, 6] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,081 | TRACE | gc candidates after
> dest:1:riderride.chat.203, [1, 3, 4, 5, 6] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,081 | TRACE | gc candidates after
> dest:1:riderride.chat.206, [1, 3, 5, 6] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,083 | TRACE | gc candidates after
> dest:1:riderride.chat.209, [1, 3, 5, 6] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,083 | TRACE | gc candidates after
> dest:1:passengerride.user.1234567890, [1, 3, 5] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,083 | TRACE | gc candidates after
> dest:1:passengerride.user.1234567891, [1, 3, 5] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,085 | TRACE | gc candidates after
> dest:1:passengerride.status.160, [1, 3, 5] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,089 | TRACE | gc candidates after
> dest:1:riderride.chat.218, [1, 3, 5] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,090 | TRACE | gc candidates after
> dest:1:invitationStatus.1234567891, [1, 3, 5] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,090 | TRACE | gc candidates after
> dest:1:chat.9206705762, [3, 5] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,090 | TRACE | gc candidates after
> dest:1:invitationStatus.1234567890, [3] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> 2016-05-17 18:05:32,091 | TRACE | gc candidates after
> dest:1:riderride.chat.220, [3] |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
>
> As per description given in the link mentioned above, there must be some
> unacked messages or slow consumers on topic riderride.chat.209 which is
> preventing log file # 6 to be GCed. However, when I check in the admin
> console to see how many messages were exchanged on this topic, I see that
> the enqueued and dequeued messages are 0. So there is no chance for any
> unacked messages. There is 1 inactive subscriber though. Will this inactive
> subscriber prevent the GCing of this log file?
>
> Same issue is observed for all other log files too. Any idea what's going
> wrong here?
>
>
>
> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/AMQ-5-13-2-Kaha-DB-logs-cleanup-issue-tp4712046.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
Reply | Threaded
Open this post in threaded view
|

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Shobhana
This post was updated on .
Thanks Tim, I did not know that the stats are reset when the broker is restarted. I tried to use JConsole to view the current status of consumers. I had a hard time getting JConsole to connect to my broker since my broker runs on an EC2 instance. Can I use the Jolokia Rest APIs instead? Do these APIs give cumulative stats or only from the time the broker is restarted?

Yes, all topics have durable subscribers.

I used the logic mentioned in the example given at http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.html to identify the topic that prevents log file from being GCed. Even I was confused with the example. I expected "dest:0:J" to be holding messages in #86; but since the example said "dest:0:I", I thought the trace log was meant for "before" and not "after". Maybe this example can be corrected in the AMQ link.
However, since I was unsure whether the trace log was wrong or the example was wrong, I cross-checked for any pending messages in both riderride.chat.209 and passengerride.user.1234567890 and found that both were 0. Hence I raised this question.
Reply | Threaded
Open this post in threaded view
|

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Tim Bain
You're right, your interpretation is consistent with the documented
example, whereas mine is consistent with the text of the log line.  In that
situation, I always assume the documentation is wrong, but that's just an
assumption.  If you or someone else has time to look at the algorithm and
confirm how it actually works, we can update the wiki.  If not, I'll get to
it when I have time.  In the meantime, you should investigate the
conclusions from both algorithms, since you can't be sure which is correct.

I'd be interested to hear what you see via JMX.

Also every durable subscriber on a topic has a subscription message that
can't be GC'ed, so you're never going to get to zero files.  I think I saw
a recent JIRA enhancement to allow those messages to be replayed (into the
newest file) when there are no messages on the topic.  You might want to
search for that enhancement and confirm whether it's in the version you're
using, and upgrade if not.

Tim
On May 17, 2016 9:38 PM, "Shobhana" <[hidden email]> wrote:

> Thanks Tim, I did not know that the stats are reset when the broker is
> restarted. I'll check how to use JConsole to view the current status of
> consumers.
>
> Yes, all topics have durable subscribers.
>
> I used the logic mentioned in the example given at
>
> http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.html
> <
> http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.html
> >
> to identify the topic that prevents log file from being GCed. Even I was
> confused with the example. I expected "dest:0:J" to be holding messages in
> #86; but since the example said "dest:0:I", I thought the trace log was
> meant for "before" and not "after". Maybe this example can be corrected in
> the AMQ link.
> However, since I was unsure whether the trace log was wrong or the example
> was wrong, I cross-checked for any pending messages in both
> riderride.chat.209 and passengerride.user.1234567890 and found that both
> were 0. Hence I raised this question.
>
>
>
> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/AMQ-5-13-2-Kaha-DB-logs-cleanup-issue-tp4712046p4712083.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
Reply | Threaded
Open this post in threaded view
|

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Shobhana
Sure, I'll check when I have some time.

Btw, I tried to use JConsole to view the current status of consumers. I tried hard getting JConsole to connect to my broker that runs on an EC2 instance, but couldn't succeed. Can I use the Jolokia Rest APIs instead? Do these APIs give same result as JMX viewer (cumulative stats) or only from the time the broker is restarted?

Are you talking about AMQ-6203? Currently we use ActiveMQ 5.13.2 and still see this problem. If the enhancement that you are talking about is AMQ-6203, we will upgrade to 5.13.3 version in which this is implemented.
Reply | Threaded
Open this post in threaded view
|

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Shobhana
In reply to this post by Tim Bain
When checked using the Jolokia Rest APIs, I get same result ... enqueued message count on these topics show 0; so I guess the Jolokia Rest APIs also give only stats since the broker started last.
Reply | Threaded
Open this post in threaded view
|

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Shobhana
In reply to this post by Tim Bain
Tim, as you mentioned, I could see one message in many KahaDB log files. I couldn't get JConsole to work with the broker installed on EC2 and Jolokia Rest APIs didn't provide me these stats either. So I just tested by deleting some older KahaDB log files (after taking backup) and restarted the broker by configuring ignoreMissingJournalfiles="true". When the broker started, I could see an INFO message indicating "X" no of messages are dropped from the index where "X" was equal to the no of older files I deleted. So I think each file contained just one message.

I'll now test by upgrading to 5.13.3 version. If I upgrade the broker, should I upgrade the jars on my client side as well? Or is it optional? I think I should upgrade.
Reply | Threaded
Open this post in threaded view
|

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Tim Bain
We try not to break backwards compatibility between versions, but since no
one formally tests cross-version compatibility, it's not guaranteed.   But
between 5.13.2 and 5.13.3, we should not have knowingly made a breaking
change.

Why are you trying 5.13.3?  Is there a bug in the release notes that you
think changes this behavior?  If not, why bother?
On May 22, 2016 11:26 AM, "Shobhana" <[hidden email]> wrote:

Tim, as you mentioned, I could see one message in many KahaDB log files. I
couldn't get JConsole to work with the broker installed on EC2 and Jolokia
Rest APIs didn't provide me these stats either. So I just tested by deleting
some older KahaDB log files (after taking backup) and restarted the broker
by configuring ignoreMissingJournalfiles="true". When the broker started, I
could see an INFO message indicating "X" no of messages are dropped from the
index where "X" was equal to the no of older files I deleted. So I think
each file contained just one message.

I'll now test by upgrading to 5.13.3 version. If I upgrade the broker,
should I upgrade the jars on my client side as well? Or is it optional? I
think I should upgrade.



--
View this message in context:
http://activemq.2283324.n4.nabble.com/AMQ-5-13-2-Kaha-DB-logs-cleanup-issue-tp4712046p4712250.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.
Reply | Threaded
Open this post in threaded view
|

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Tim Bain
Sorry, my brain's not working.  Definitely try 5.13.3 to see if the fix for
the bug I referenced and you found (yes, that's the one) allows deletion of
the old files.
On May 23, 2016 6:54 AM, "Tim Bain" <[hidden email]> wrote:

> We try not to break backwards compatibility between versions, but since no
> one formally tests cross-version compatibility, it's not guaranteed.   But
> between 5.13.2 and 5.13.3, we should not have knowingly made a breaking
> change.
>
> Why are you trying 5.13.3?  Is there a bug in the release notes that you
> think changes this behavior?  If not, why bother?
> On May 22, 2016 11:26 AM, "Shobhana" <[hidden email]> wrote:
>
> Tim, as you mentioned, I could see one message in many KahaDB log files. I
> couldn't get JConsole to work with the broker installed on EC2 and Jolokia
> Rest APIs didn't provide me these stats either. So I just tested by
> deleting
> some older KahaDB log files (after taking backup) and restarted the broker
> by configuring ignoreMissingJournalfiles="true". When the broker started, I
> could see an INFO message indicating "X" no of messages are dropped from
> the
> index where "X" was equal to the no of older files I deleted. So I think
> each file contained just one message.
>
> I'll now test by upgrading to 5.13.3 version. If I upgrade the broker,
> should I upgrade the jars on my client side as well? Or is it optional? I
> think I should upgrade.
>
>
>
> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/AMQ-5-13-2-Kaha-DB-logs-cleanup-issue-tp4712046p4712250.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
>
Reply | Threaded
Open this post in threaded view
|

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

RuralHunter
In reply to this post by Shobhana
based on your log, I think it should be passengerride.user.1234567890 which prevents the deletion of #6
Reply | Threaded
Open this post in threaded view
|

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Shobhana
In reply to this post by Tim Bain
Hi Tim,

I tried 5.13.3 and all further versions till 5.14.0, but this problem is still not gone. I still see a lot of Kaha DB log files which consume huge amounts of disk space and eventually the broker stops functioning!

I made some more changes to check if I could get rid of this problem :
a) I enabled following plug-in to set expiry of 1 day for every message (we use only MQTT messages) :
<plugins>
  <discardingDLQBrokerPlugin dropAll="true" dropTemporaryTopics="true" dropTemporaryQueues="true" />
        <timeStampingBrokerPlugin ttlCeiling="86400000" zeroExpirationOverride="86400000"/>
</plugins>

b) I enabled DLQ to drop all expired messages for all topics and queues :
    <deadLetterStrategy>
          <sharedDeadLetterStrategy processExpired="false" />
    </deadLetterStrategy>

<discardingDLQBrokerPlugin dropAll="true" dropTemporaryTopics="true" dropTemporaryQueues="true" />

c) I enabled offline durable subscribers to timeout after 1 day :
offlineDurableSubscriberTimeout="86400000" offlineDurableSubscriberTaskSchedule="3600000"

After these changes, I could see a few log files got deleted, but majority of the log files remain in the Kahadb folder which consume lot of disk space.

I observed this in our production server where I cannot enable trace or JMX. Is there any other way to identify what is causing this issue?

TIA,
Shobhana