Activemq 5.9 leveldb replication issue

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Activemq 5.9 leveldb replication issue

heimdull
I have been testing this setup with one of our clusters and I have found an issue where failover breaks one or two of our queues.

I have a setup where I'm running latest snapshot with Zookeeper and setup for replicated leveldb. When I run a activemq stop (using wrapper) my active master shutdown fine and my slave recognizes this and starts process of going active. When I can get back into my web console I see all queues like expected BUT there are usually one queue with 50 messages that are stuck, it has consumers but they are not consumed. When trying to view contents of queue with jmx or web console it just hangs and times-out. If I delete this queue its recreated and everything starts working again.

Only log info I see are these:

2013-05-26 07:09:38,850 | WARN  | Async error occurred: java.lang.IllegalArgumentException: The subscription does not exist: ID:tn11-27191-1368304518500-1:6:5:1 | org.apache.activemq.broker.TransportConnection.Service | ActiveMQ NIO Worker 38
java.lang.IllegalArgumentException: The subscription does not exist: ID


Here is my env:

INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:java.version=1.7.0_21
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:java.vendor=Oracle Corporation
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:java.home=/opt/jdk/jre
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:java.class.path=/opt/activemq/bin/wrapper.jar:/opt/activemq/bin/activemq.jar
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:java.library.path=/opt/activemq/bin/linux-x86-64/
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:java.io.tmpdir=/tmp
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:java.compiler=<NA>
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:os.name=Linux
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:os.arch=amd64
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:os.version=3.2.0-37-generic
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:user.name=activemq
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:user.home=/home/activemq
INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client environment:user.dir=/opt/activemq/bin/linux-x86-64

I tried this we the three latest Snapshots they all do the same.

(And in an unrelated error When starting latest 5.9-snapshot I get a message that amqp is not recognized.)
Reply | Threaded
Open this post in threaded view
|

Re: Activemq 5.9 leveldb replication issue

ceposta
Give me the exact steps to reproduce... How many queues, how many messages
sent to each before failover, etc etc... Exact steps.

On Sunday, May 26, 2013, heimdull wrote:

> I have been testing this setup with one of our clusters and I have found an
> issue where failover breaks one or two of our queues.
>
> I have a setup where I'm running latest snapshot with Zookeeper and setup
> for replicated leveldb. When I run a activemq stop (using wrapper) my
> active
> master shutdown fine and my slave recognizes this and starts process of
> going active. When I can get back into my web console I see all queues like
> expected BUT there are usually one queue with 50 messages that are stuck,
> it
> has consumers but they are not consumed. When trying to view contents of
> queue with jmx or web console it just hangs and times-out. If I delete this
> queue its recreated and everything starts working again.
>
> Only log info I see are these:
>
> 2013-05-26 07:09:38,850 | WARN  | Async error occurred:
> java.lang.IllegalArgumentException: The subscription does not exist:
> ID:tn11-27191-1368304518500-1:6:5:1 |
> org.apache.activemq.broker.TransportConnection.Service | ActiveMQ NIO
> Worker
> 38
> java.lang.IllegalArgumentException: The subscription does not exist: ID
>
>
> Here is my env:
>
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:java.version=1.7.0_21
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:java.vendor=Oracle Corporation
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:java.home=/opt/jdk/jre
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
>
> environment:java.class.path=/opt/activemq/bin/wrapper.jar:/opt/activemq/bin/activemq.jar
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:java.library.path=/opt/activemq/bin/linux-x86-64/
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:java.io.tmpdir=/tmp
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:java.compiler=<NA>
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:os.name=Linux
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:os.arch=amd64
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:os.version=3.2.0-37-generic
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:user.name=activemq
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:user.home=/home/activemq
> INFO   | jvm 1    | 2013/05/26 07:09:34 |  INFO | Client
> environment:user.dir=/opt/activemq/bin/linux-x86-64
>
> I tried this we the three latest Snapshots they all do the same.
>
> (And in an unrelated error When starting latest 5.9-snapshot I get a
> message
> that amqp is not recognized.)
>
>
>
>
> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/Activemq-5-9-leveldb-replication-issue-tp4667495.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>


--
*Christian Posta*
http://www.christianposta.com/blog
twitter: @christianposta
Reply | Threaded
Open this post in threaded view
|

Re: Activemq 5.9 leveldb replication issue

heimdull
This server has two queues and there was a few thousand messages sent when this happened. I can reproduce this everytime when doing a failover between these two servers. One thing I noticed today was when doing a failover (service activemq stop on master) was that on queue was stuck where "Messages Dequeued" is 0 but other queue is humming along so my fix is to delete that queue and when I do that I get a message that leveldb recovery is done in my logs and everything works fine again. (looks and sounds like there is a crash in leveldb with corruption)

I will see tomorrow if I can replicate this in a smaller scaled server so that I can pull better data.
Reply | Threaded
Open this post in threaded view
|

Re: Activemq 5.9 leveldb replication issue

heimdull
I started testing this on a setup where I manually generate messages and I'm not able replicate the issue. I think it only happens if there is moderate load when failing over.

Here is the exact message that shows in the setup where I can replicate the issue:

2013-05-28 10:42:56,977 | INFO  | DB recovered from failure. | org.apache.activemq.leveldb.LevelDBClient | ActiveMQ NIO Worker 93

This message shows up in my log after a failover when I delete the broken queue. After that message I have these:

2013-05-28 10:42:56,990 | WARN  | Async error occurred: java.lang.IllegalArgumentException: The subscription does not exist: ID:xx03-35870-1369757354540-1:3:5:1 | org.apache.activemq.broker.TransportConnection.Service | ActiveMQ NIO Worker 29
java.lang.IllegalArgumentException: The subscription does not exist: ID:xx03-35870-1369757354540-1:3:5:1
        at org.apache.activemq.broker.region.AbstractRegion.messagePull(AbstractRegion.java:432)
        at org.apache.activemq.broker.region.RegionBroker.messagePull(RegionBroker.java:468)
        at org.apache.activemq.broker.BrokerFilter.messagePull(BrokerFilter.java:87)
Reply | Threaded
Open this post in threaded view
|

Re: Activemq 5.9 leveldb replication issue

chirino
If you got a 'DB recovered from failure.' message, then it should have
been preceded with an 'DB operation failed. (entering recovery mode)'
message.  Could you post that message?

On Tue, May 28, 2013 at 1:47 PM, heimdull <[hidden email]> wrote:

> I started testing this on a setup where I manually generate messages and I'm
> not able replicate the issue. I think it only happens if there is moderate
> load when failing over.
>
> Here is the exact message that shows in the setup where I can replicate the
> issue:
>
> 2013-05-28 10:42:56,977 | INFO  | DB recovered from failure. |
> org.apache.activemq.leveldb.LevelDBClient | ActiveMQ NIO Worker 93
>
> This message shows up in my log after a failover when I delete the broken
> queue. After that message I have these:
>
> 2013-05-28 10:42:56,990 | WARN  | Async error occurred:
> java.lang.IllegalArgumentException: The subscription does not exist:
> ID:xx03-35870-1369757354540-1:3:5:1 |
> org.apache.activemq.broker.TransportConnection.Service | ActiveMQ NIO Worker
> 29
> java.lang.IllegalArgumentException: The subscription does not exist:
> ID:xx03-35870-1369757354540-1:3:5:1
>         at
> org.apache.activemq.broker.region.AbstractRegion.messagePull(AbstractRegion.java:432)
>         at
> org.apache.activemq.broker.region.RegionBroker.messagePull(RegionBroker.java:468)
>         at
> org.apache.activemq.broker.BrokerFilter.messagePull(BrokerFilter.java:87)
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Activemq-5-9-leveldb-replication-issue-tp4667495p4667560.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.



--
Hiram Chirino

Engineering | Red Hat, Inc.

[hidden email] | fusesource.com | redhat.com

skype: hiramchirino | twitter: @hiramchirino

blog: Hiram Chirino's Bit Mojo
Reply | Threaded
Open this post in threaded view
|

Re: Activemq 5.9 leveldb replication issue

heimdull
I tried this morning to replicate the issue on these two servers and was not able to do so. The only issue that I got was a few messages that got stuck in the queue with a consumer holding them hostage. either restarting that consumer or moving them to a different queue and back resolved that.

I did look at older errors that I had and found there recovery mode errors:

INFO   | jvm 1    | 2013/05/19 12:15:22 |  INFO | DB recovered from failure.
INFO   | jvm 1    | 2013/05/19 12:15:22 |  WARN | DB operation failed. (entering recovery mode)
INFO   | jvm 1    | 2013/05/19 12:15:22 | java.io.IOException: short record at position: 2467052 in file: /opt/activemq/data/0000000000000000.log, offset: 2467052
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.RecordLog$LogReader.read(RecordLog.scala:287)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.RecordLog$$anonfun$read$2.apply(RecordLog.scala:543)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.RecordLog$$anonfun$read$2.apply(RecordLog.scala:543)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.RecordLog$$anonfun$get_reader$1.apply(RecordLog.scala:532)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.RecordLog$$anonfun$get_reader$1.apply(RecordLog.scala:514)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at scala.Option.map(Option.scala:133)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.RecordLog.get_reader(RecordLog.scala:514)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.RecordLog.read(RecordLog.scala:543)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient.getMessage(LevelDBClient.scala:1148)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1098)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1095)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$11.apply(LevelDBClient.scala:1171)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$11.apply(LevelDBClient.scala:1170)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient$RichDB.check$4(LevelDBClient.scala:311)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient$RichDB.cursorRange(LevelDBClient.scala:313)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply$mcV$sp(LevelDBClient.scala:1170)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1170)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1170)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient.usingIndex(LevelDBClient.scala:896)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$retryUsingIndex$1.apply(LevelDBClient.scala:902)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.util.RetrySupport$.retry(RetrySupport.scala:38)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient.retry(LevelDBClient.scala:504)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient.retryUsingIndex(LevelDBClient.scala:902)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient.collectionCursor(LevelDBClient.scala:1169)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBClient.queueCursor(LevelDBClient.scala:1095)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.DBManager.cursorMessages(DBManager.scala:671)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.LevelDBStore$LevelDBMessageStore.recoverNextMessages(LevelDBStore.scala:699)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:106)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:268)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.hasNext(AbstractStoreCursor.java:147)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.region.cursors.StoreQueueCursor.hasNext(StoreQueueCursor.java:131)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1933)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2160)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1634)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.region.Queue.wakeup(Queue.java:1862)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.region.Queue.addSubscription(Queue.java:497)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.region.AbstractRegion.addConsumer(AbstractRegion.java:323)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.region.RegionBroker.addConsumer(RegionBroker.java:390)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.jmx.ManagedRegionBroker.addConsumer(ManagedRegionBroker.java:229)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:97)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.advisory.AdvisoryBroker.addConsumer(AdvisoryBroker.java:76)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:97)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:97)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.MutableBrokerFilter.addConsumer(MutableBrokerFilter.java:102)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.TransportConnection.processAddConsumer(TransportConnection.java:619)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.command.ConsumerInfo.visit(ConsumerInfo.java:332)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:329)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:184)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:288)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:214)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
INFO   | jvm 1    | 2013/05/19 12:15:22 |       at java.lang.Thread.run(Thread.java:722)





INFO   | jvm 1    | 2013/05/19 12:16:58 |  INFO | DB recovered from failure.
INFO   | jvm 1    | 2013/05/19 12:16:58 |  WARN | DB operation failed. (entering recovery mode)
INFO   | jvm 1    | 2013/05/19 12:16:58 | java.io.IOException: short record at position: 2467052 in file: /opt/activemq/data/0000000000000000.log, offset: 2467052
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.RecordLog$LogReader.read(RecordLog.scala:287)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.RecordLog$$anonfun$read$2.apply(RecordLog.scala:543)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.RecordLog$$anonfun$read$2.apply(RecordLog.scala:543)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.RecordLog$$anonfun$get_reader$1.apply(RecordLog.scala:532)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.RecordLog$$anonfun$get_reader$1.apply(RecordLog.scala:514)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at scala.Option.map(Option.scala:133)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.RecordLog.get_reader(RecordLog.scala:514)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.RecordLog.read(RecordLog.scala:543)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient.getMessage(LevelDBClient.scala:1148)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1098)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1095)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$11.apply(LevelDBClient.scala:1171)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$11.apply(LevelDBClient.scala:1170)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient$RichDB.check$4(LevelDBClient.scala:311)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient$RichDB.cursorRange(LevelDBClient.scala:313)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply$mcV$sp(LevelDBClient.scala:1170)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1170)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1170)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient.usingIndex(LevelDBClient.scala:896)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient$$anonfun$retryUsingIndex$1.apply(LevelDBClient.scala:902)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.util.RetrySupport$.retry(RetrySupport.scala:38)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient.retry(LevelDBClient.scala:504)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient.retryUsingIndex(LevelDBClient.scala:902)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient.collectionCursor(LevelDBClient.scala:1169)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBClient.queueCursor(LevelDBClient.scala:1095)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.DBManager.cursorMessages(DBManager.scala:671)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.leveldb.LevelDBStore$LevelDBMessageStore.recoverNextMessages(LevelDBStore.scala:699)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:106)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:268)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.hasNext(AbstractStoreCursor.java:147)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.region.cursors.StoreQueueCursor.hasNext(StoreQueueCursor.java:131)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1933)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2160)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1634)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.region.Queue.wakeup(Queue.java:1862)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.region.Queue.addSubscription(Queue.java:497)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.region.AbstractRegion.addConsumer(AbstractRegion.java:323)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.region.RegionBroker.addConsumer(RegionBroker.java:390)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.jmx.ManagedRegionBroker.addConsumer(ManagedRegionBroker.java:229)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:97)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.advisory.AdvisoryBroker.addConsumer(AdvisoryBroker.java:76)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:97)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:97)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.MutableBrokerFilter.addConsumer(MutableBrokerFilter.java:102)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.TransportConnection.processAddConsumer(TransportConnection.java:619)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.command.ConsumerInfo.visit(ConsumerInfo.java:332)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:329)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:184)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:288)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:214)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
INFO   | jvm 1    | 2013/05/19 12:16:58 |       at java.lang.Thread.run(Thread.java:722)












-freddy

On May 30, 2013, at 2:05 PM, chirino [via ActiveMQ] wrote:

If you got a 'DB recovered from failure.' message, then it should have
been preceded with an 'DB operation failed. (entering recovery mode)'
message.  Could you post that message?

On Tue, May 28, 2013 at 1:47 PM, heimdull <<a href="x-msg://232/user/SendEmail.jtp?type=node&amp;node=4667674&amp;i=0" target="_top" rel="nofollow" link="external">[hidden email]> wrote:

> I started testing this on a setup where I manually generate messages and I'm
> not able replicate the issue. I think it only happens if there is moderate
> load when failing over.
>
> Here is the exact message that shows in the setup where I can replicate the
> issue:
>
> 2013-05-28 10:42:56,977 | INFO  | DB recovered from failure. |
> org.apache.activemq.leveldb.LevelDBClient | ActiveMQ NIO Worker 93
>
> This message shows up in my log after a failover when I delete the broken
> queue. After that message I have these:
>
> 2013-05-28 10:42:56,990 | WARN  | Async error occurred:
> java.lang.IllegalArgumentException: The subscription does not exist:
> ID:xx03-35870-1369757354540-1:3:5:1 |
> org.apache.activemq.broker.TransportConnection.Service | ActiveMQ NIO Worker
> 29
> java.lang.IllegalArgumentException: The subscription does not exist:
> ID:xx03-35870-1369757354540-1:3:5:1
>         at
> org.apache.activemq.broker.region.AbstractRegion.messagePull(AbstractRegion.java:432)
>         at
> org.apache.activemq.broker.region.RegionBroker.messagePull(RegionBroker.java:468)
>         at
> org.apache.activemq.broker.BrokerFilter.messagePull(BrokerFilter.java:87)
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Activemq-5-9-leveldb-replication-issue-tp4667495p4667560.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.


--
Hiram Chirino

Engineering | Red Hat, Inc.

<a href="x-msg://232/user/SendEmail.jtp?type=node&amp;node=4667674&amp;i=1" target="_top" rel="nofollow" link="external">[hidden email] | fusesource.com | redhat.com

skype: hiramchirino | twitter: @hiramchirino

blog: Hiram Chirino's Bit Mojo



If you reply to this email, your message will be added to the discussion below:
http://activemq.2283324.n4.nabble.com/Activemq-5-9-leveldb-replication-issue-tp4667495p4667674.html
To unsubscribe from Activemq 5.9 leveldb replication issue, click here.
NAML

Reply | Threaded
Open this post in threaded view
|

Re: Activemq 5.9 leveldb replication issue

chirino
Hi Freddy,

That 'short record at position' is odd.  Basically it means it tired
to read a message from the log file and the read call read fewer bytes
than what we requested.  I guess that's valid for an OS to do and we
just need to issue an additional read to load the remaining bytes.
I've just committed a change that should help with that:
http://is.gd/0nMmLH

So try out a new snapshot and keep an eye for that exception and let
me know you see it happen on your system again.

Thanks!


On Fri, May 31, 2013 at 1:28 PM, heimdull <[hidden email]> wrote:
> I tried this morning to replicate the issue on these two servers and was not able to do so. The only issue that I got was a few messages that got stuck in the queue with a consumer holding them hostage. either restarting that consumer or moving them to a different queue and back resolved that.
>
> I did look at older errors that I had and found there recovery mode errors:
>
> INFO   | jvm 1    | 2013/05/19 12:15:22 |  INFO | DB recovered from failure.
> INFO   | jvm 1    | 2013/05/19 12:15:22 |  WARN | DB operation failed. (entering recovery mode)
> INFO   | jvm 1    | 2013/05/19 12:15:22 | java.io.IOException: short record at position: 2467052 in file: /opt/activemq/data/0000000000000000.log, offset: 2467052
> INFO   | jvm 1    | 2013/05/19 12:15:22 |       at org.apache.activemq.leveldb.RecordLog$LogReader.read(RecordLog.scala:287)


--
Hiram Chirino

Engineering | Red Hat, Inc.

[hidden email] | fusesource.com | redhat.com

skype: hiramchirino | twitter: @hiramchirino

blog: Hiram Chirino's Bit Mojo
Reply | Threaded
Open this post in threaded view
|

Re: Activemq 5.9 leveldb replication issue

heimdull
great! I have tested latest snapshot and have not seen this error when I failover. I do however see that I get a handful of messages stuck in a queue when I do and I need to move them out and back in to the queue before they get consumed.

Could this have something todo with that:

2013-06-03 23:50:53,523 | WARN  | Async error occurred: java.lang.IllegalArgumentException: The subscription does not exist: ID:tn11-6552-1369955706360-1:2:1:1 | org.apache.activemq.broker.TransportConnection.Service | ActiveMQ NIO Worker 14
java.lang.IllegalArgumentException: The subscription does not exist: ID:tn11-6552-1369955706360-1:2:1:1
        at org.apache.activemq.broker.region.AbstractRegion.messagePull(AbstractRegion.java:432)
        at org.apache.activemq.broker.region.RegionBroker.messagePull(RegionBroker.java:468)
        at org.apache.activemq.broker.BrokerFilter.messagePull(BrokerFilter.java:87)
        at org.apache.activemq.broker.BrokerFilter.messagePull(BrokerFilter.java:87)
        at org.apache.activemq.broker.BrokerFilter.messagePull(BrokerFilter.java:87)
        at org.apache.activemq.broker.MutableBrokerFilter.messagePull(MutableBrokerFilter.java:289)
        at org.apache.activemq.broker.TransportConnection.processMessagePull(TransportConnection.java:515)
        at org.apache.activemq.command.MessagePull.visit(MessagePull.java:43)
        at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:329)
        at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:184)
        at org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
        at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
        at org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:288)
        at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
        at org.apache.activemq.transport.nio.NIOTransport.serviceRead(NIOTransport.java:138)
        at org.apache.activemq.transport.nio.NIOTransport$1.onSelect(NIOTransport.java:69)
        at org.apache.activemq.transport.nio.SelectorSelection.onSelect(SelectorSelection.java:94)
        at org.apache.activemq.transport.nio.SelectorWorker$1.run(SelectorWorker.java:119)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

If I look at my web console (Active Consumers list) I see that 7 consumers have 1=Enqueues, 0=Dequeues, 1=Dispatched, 1=Dispatched Queue which happen to be the same number of stuck messages other consumers have even numbers.

We are also using activemq-5.7 jar for our consumers could that be a problem?
 
Reply | Threaded
Open this post in threaded view
|

Re: Activemq 5.9 leveldb replication issue

linuraj
Hi, We have the same issue with activemq version 5.13.1. After a failover, we see some messages stuck in the queue. We see a valid number as queue-size; however when we browse the queue via hawtio we will not see those messages. Also, it is the same number of messages that we miss after a failover. Do we know the cause? Do we know the fix? Really appreciate your help. Thanks Linu Raj.