Clean-up of unreferenced KahaDB journal log files (data-<id>.log) will occur every 30 seconds by default. If a data file is in use, it will not be cleaned up. A data file remains in use when any of the following applies:
It contains a pending message for a destination or durable topic subscription
It contains an ACK for a message which is in an in-use data file - the ACK cannot be removed, as a recovery would then mark the message for redelivery
The journal references a pending transaction
It is a journal file, and there may be a pending write to it
Trace-level logging of the org.apache.activemq.store.kahadb.MessageDatabase class provides insight into the cleanup process and will allow you to determine why a given data file is considered in use and, as a result, not a candidate for cleanup.
To debug, add the following (or similar) to your log4j.properties file:
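As a sketch of such a configuration (the appender name, file path, sizes, and layout pattern below are illustrative; adapt them to your environment), you can route the trace output to its own rolling file so it does not flood the main broker log:

```properties
# Dedicated rolling file appender for KahaDB cleanup tracing (example values)
log4j.appender.kahadb=org.apache.log4j.RollingFileAppender
log4j.appender.kahadb.file=${activemq.base}/data/kahadb.log
log4j.appender.kahadb.maxFileSize=1024KB
log4j.appender.kahadb.maxBackupIndex=5
log4j.appender.kahadb.append=true
log4j.appender.kahadb.layout=org.apache.log4j.PatternLayout
log4j.appender.kahadb.layout.ConversionPattern=%d [%-15.15t] %-5p %-30.30c{1} - %m%n

# Enable trace logging for the class that drives the cleanup cycle
log4j.logger.org.apache.activemq.store.kahadb.MessageDatabase=TRACE, kahadb
```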
Either restart ActiveMQ and let the cleanup process run (give it a minute or two, for example) or, alternatively, apply this logging configuration to a running broker via JMX. The Broker MBean exposes an operation called reloadLog4jProperties in JMX that can be used to tell the broker to reload its log4j.properties. Often it is enough to apply this logging configuration for 2-5 minutes and then analyze the broker's log file.
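As a minimal sketch, the reloadLog4jProperties operation can also be invoked from a plain JMX client instead of jconsole. The service URL, port 1099, and brokerName=localhost below are assumptions for a default local broker using the ActiveMQ 5.8+ MBean naming scheme; adjust them for your setup:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ReloadBrokerLogging {
    /** Builds the Broker MBean name; "localhost" is the default brokerName. */
    static ObjectName brokerMBean(String brokerName) throws Exception {
        return new ObjectName("org.apache.activemq:type=Broker,brokerName=" + brokerName);
    }

    public static void main(String[] args) throws Exception {
        // Assumed default JMX connector URL for a local broker.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Tell the broker to re-read log4j.properties without a restart.
            conn.invoke(brokerMBean("localhost"), "reloadLog4jProperties", null, null);
        }
    }
}
```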
Examine the log file and look for cleanup of the data files. The process starts with the complete set of known data files and queries the index on a per-destination basis to prune this list. Anything that remains is a candidate for cleanup. As it iterates through the index, the trace logging gives the destination and the log file numbers that remain candidates for removal.
At some point you'll hit a destination and the number of data file ids will suddenly drop because that destination references them. It could be a DLQ or an offline durable subscriber. In any event, the logging will help you pinpoint the destinations that are hogging disk space.
We get one candidate, data-87.log, from the existing set of journal data files [86, 87, 163, 164]. There is a current transaction using 164, destination '0:E' (a queue named E) has some messages in 163, destination '0:I' has messages in 86, and 87 is unreferenced. In this case, there must be some long-standing unacknowledged messages or a very slow consumer on destination '0:I'.
The '0:' prefix is shorthand for a queue, '1:' for a topic; for example, '1:B' refers to a topic named B.
The same applies to non-persistent messages, which are not stored in your configured KahaDB persistence adapter but get swapped to temp storage once they exceed the broker's configured memoryUsage limit. A similar logging configuration can show details of the cleanup of temp storage.
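Assuming the temp-store classes live under the org.apache.activemq.store.kahadb.plist package (true for recent 5.x releases; verify against your broker version), a package-level logger avoids depending on exact class names. With no appender listed, the output goes to the root logger's appenders:

```properties
# Trace temp-storage (PList) activity; inherits the root logger's appenders
log4j.logger.org.apache.activemq.store.kahadb.plist=TRACE
```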