LevelDB exception occurs). Are messages transferring successfully to amq5,
> I believe I may have found a bug here. However, this could also be a
> mis-configuration on my side.
>
> Before I go into a detailed description of my observation, here my setup:
>
> I have setup the following system in a virtual environment:
> I have 3 zookeeper nodes.
> I have 6 ActiveMQ broker using 5.10.0
> All nodes (ZK, AMQ) are running on CentOS 6.5 64bit with the latest
> OpenJDK.
> Three broker form an active/passive cluster using replicated LevelDB store.
> I have installed native LevelDB 1,7.0 accessing it via the JNDI driver.
> The two cluster are forming a network of broker.
> The "networkConnectors" are defined in the activemq.xml files in only one
> cluster as duplex connections.
>
> Here is my test case and the situation:
> Let's name the six broker amq1 - amq6. So, the first active/passive cluster
> is amq1, amq2, and amq3 with amq1 active and the other two passive. The
> second cluster consists of amq4, amq5, and amq6 with amq4 as the active.
> I have one producer connecting to amq1 publishing messages to a
> VirtualTopic
> "VirtualTopic.Test" and a consumer connecting to amq4 reading those
> messages
> from a corresponding queue "Consumer.A.VirtualTopic.Test".
> In my test, I am sending 100,000 text messages with a body consisting of
> 1024 random characters to the VirtualTopic at maximum speed.
> However, after about 25,000 to 27,000 messages, the consumer on amq4 times
> out after about 10 seconds not receiving any more messages although the
> producer has already send all 100,000 messages to amq1.
> When looking at the log file of amq4, I am seeing the following messages:
> ....
> 2014-10-16 12:53:12,556 | INFO | Started responder end of duplex bridge
> link2@ID:uromahn-amq1-9110-48269-1413463991773-0:1 |
> org.apache.activemq.broker.TransportConnection | ActiveMQ Transport:
> tcp:///xx.xx.xx.xx:58959@61616
> 2014-10-16 12:53:12,559 | INFO | Started responder end of duplex bridge
> link1@ID:uromahn-amq1-9110-48269-1413463991773-0:1 |
> org.apache.activemq.broker.TransportConnection | ActiveMQ Transport:
> tcp:///xx.xx.xx.xx:58958@61616
> 2014-10-16 12:53:12,591 | INFO | Network connection between
> vm://brokergrp2#2 and tcp:///xx.xx.xx.xx:58958@61616 (brokergrp1) has been
> established. | org.apache.activemq.network.DemandForwardingBridgeSupport |
> triggerStartAsyncNetworkBridgeCreation:
> remoteBroker=tcp:///xx.xx.xx.xx:58958@61616, localBroker=
> vm://brokergrp2#2
> 2014-10-16 12:53:12,591 | INFO | Network connection between
> vm://brokergrp2#0 and tcp:///10.64.253.198:58959@61616 (brokergrp1) has
> been
> established. | org.apache.activemq.network.DemandForwardingBridgeSupport |
> triggerStartAsyncNetworkBridgeCreation:
> remoteBroker=tcp:///xx.xx.xx.xx:58959@61616, localBroker=
> vm://brokergrp2#0
> 2014-10-16 13:00:10,470 | INFO | Client session timed out, have not heard
> from server in 4071ms for sessionid 0x14918fd40f00007, closing socket
> connection and attempting reconnect | org.apache.zookeeper.ClientCnxn |
> main-SendThread(uromahn-zk1-9208:2181)
> 2014-10-16 13:00:10,575 | INFO | Demoted to slave |
> org.apache.activemq.leveldb.replicated.MasterElector | ZooKeeper state
> change dispatcher thread
> 2014-10-16 13:00:10,582 | INFO | Apache ActiveMQ 5.10.0 (brokergrp2,
> ID:uromahn-amq4-9175-46383-1413463934439-0:1) is shutting down |
> org.apache.activemq.broker.BrokerService | ActiveMQ
> BrokerService[brokergrp2] Task-8
> 2014-10-16 13:00:10,594 | WARN | Transport Connection to:
> tcp://zz.zz.zz.zz:34737 failed: java.io.IOException: Unexpected error
> occured: org.apache.activemq.broker.BrokerStoppedException: Broker
> BrokerService[brokergrp2] is being stopped |
> org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ
> Transport: tcp:///zz.zz.zz.zz:34737@61616
> ....
>
> After that, the log file contains a large number of messages dumped for
> which the broker could not send the acknowledgement back to amq1.
>
> Looking at the log file of the newly promoted master amq5 I see the
> following warning:
> ....
> 2014-10-16 13:00:18,422 | INFO | Network connection between
> vm://brokergrp2#0 and tcp:///yy.yy.yy.yy:58256@61616 (brokergrp1) has been
> established. | org.apache.activemq.network.DemandForwardingBridgeSupport |
> triggerStartAsyncNetworkBridgeCreation:
> remoteBroker=tcp:///yy.yy.yy.yy:58256@61616, localBroker=
> vm://brokergrp2#0
> 2014-10-16 13:32:20,285 | WARN | Unexpected session error:
> java.io.IOException: Connection reset by peer |
> org.apache.activemq.leveldb.replicated.MasterLevelDBStore |
> hawtdispatch-DEFAULT-2
> 2014-10-16 13:32:20,286 | INFO | Slave has disconnected:
> e21da7b3-65a4-4756-a20b-bd77f8eb7607 |
> org.apache.activemq.leveldb.replicated.MasterLevelDBStore |
> hawtdispatch-DEFAULT-2
> ....
>
> After that, there are no messages being transmitted from amq1 to the second
> cluster in the network (neither amq4 nor amq5). This can be seen when
> looking at the corresponding tab of the web console of amq1. Also, when I
> let all broker running, I am getting a message on amq4 that the broker is
> waiting for the replicated store to synchronize one record. (or something
> to
> this extend - unfortunately I deleted the log already. :( ).
>
> So, here is what appears to happen:
> somehow Zookeeper things that amq4 is "dead" and then demotes it to slave
> in
> the middle of the message transfer. This could be because amq4 is too busy
> to respond to heatbeat requests. However, something goes wrong in the
> LevelDB replication and the new master amq5 can't pick up. This seem to
> lead
> to a "dead-lock" situation causing the client to time-out.
> NOTE: all clients are using the failover transport to connect to the
> cluster. The connector between the two cluster is also defined using the
> failover transport.
> Another indication of something "bad" is the fact that the client reports
> an
> exception that the connection to amq4 was reset and it is trying to
> reconnect. However, that connection does not seem to happen within 10
> seconds because my message listener has an internal timeout of 10 seconds.
> I have double and triple-checked the configuration and everything looks ok.
> Also, this doesn't happen in case both producer and consumer connect to the
> same cluster, e.g. amq1 or amq4. So, the individual cluster seem to work.
> The issue only seem to appear when transmitting messages between clusters
> in
> a network of broker.
>
> Any idea if this is a bug or just a SUE (stupid user error). :)
>
> -Uli
>
>
>
> --
> View this message in context:
>
http://activemq.2283324.n4.nabble.com/Potential-Bug-in-Master-Slave-with-Replicated-LevelDB-Store-tp4686450.html> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>