Potential Bug in Master-Slave with Replicated LevelDB Store

Potential Bug in Master-Slave with Replicated LevelDB Store

uromahn
CONTENTS DELETED
The author has deleted this message.

Re: Potential Bug in Master-Slave with Replicated LevelDB Store

Tim Bain
In your broker-to-broker networkConnectors, are you using maxReconnects=0
as an argument to the failover URI?  It wouldn't explain why amq4 got
demoted, but it could explain why messages aren't transferring to amq5
instead.
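
For reference, a networkConnector along the lines being discussed would look
roughly like this in activemq.xml - a sketch only, with placeholder broker
names, and with the reconnect option written out as maxReconnectAttempts
(the spelling the failover URI actually accepts):

  <networkConnectors>
    <!-- placeholder connector name and hosts; maxReconnectAttempts=0 is the option in question -->
    <networkConnector name="cluster1-to-cluster2" duplex="true"
        uri="static:(failover:(tcp://amq4:61616,tcp://amq5:61616,tcp://amq6:61616)?maxReconnectAttempts=0)"/>
  </networkConnectors>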

You say you've got duplex connections between the clusters; which cluster
is the one that establishes them via a networkConnector?  And do you see
the same behavior if you put producers on cluster2 and consumers on
cluster1?

Also, looking at your logs it's not clear what happens between 13:00:48
(when amq5 becomes the master) and 13:32:20 (30 minutes later, when the
LevelDB exception occurs).  Are messages transferring successfully to amq5,
or is it sitting idle?

On Thu, Oct 16, 2014 at 8:42 AM, uromahn <[hidden email]> wrote:

> I believe I may have found a bug here. However, this could also be a
> misconfiguration on my side.
>
> Before I go into a detailed description of my observation, here is my setup:
>
> I have set up the following system in a virtual environment:
> I have 3 ZooKeeper nodes.
> I have 6 ActiveMQ brokers running 5.10.0.
> All nodes (ZK, AMQ) run CentOS 6.5 64-bit with the latest OpenJDK.
> Each group of three brokers forms an active/passive cluster using a
> replicated LevelDB store.
> I have installed native LevelDB 1.7.0 and access it via the JNI driver.
> The two clusters form a network of brokers.
> The "networkConnectors" are defined in the activemq.xml files of only one
> cluster, as duplex connections.
>
> Here is my test case and the situation:
> Let's name the six brokers amq1 - amq6. The first active/passive cluster
> is amq1, amq2, and amq3, with amq1 active and the other two passive. The
> second cluster consists of amq4, amq5, and amq6, with amq4 as the active
> node.
> I have one producer connecting to amq1, publishing messages to a
> VirtualTopic "VirtualTopic.Test", and a consumer connecting to amq4,
> reading those messages from the corresponding queue
> "Consumer.A.VirtualTopic.Test".
> In my test, I send 100,000 text messages, each with a body of 1,024 random
> characters, to the VirtualTopic at maximum speed.
> However, after about 25,000 to 27,000 messages, the consumer on amq4 times
> out after about 10 seconds without receiving any more messages, although
> the producer has already sent all 100,000 messages to amq1.
> When looking at the log file of amq4, I see the following messages:
> ....
> 2014-10-16 12:53:12,556 | INFO  | Started responder end of duplex bridge
> link2@ID:uromahn-amq1-9110-48269-1413463991773-0:1 |
> org.apache.activemq.broker.TransportConnection | ActiveMQ Transport:
> tcp:///xx.xx.xx.xx:58959@61616
> 2014-10-16 12:53:12,559 | INFO  | Started responder end of duplex bridge
> link1@ID:uromahn-amq1-9110-48269-1413463991773-0:1 |
> org.apache.activemq.broker.TransportConnection | ActiveMQ Transport:
> tcp:///xx.xx.xx.xx:58958@61616
> 2014-10-16 12:53:12,591 | INFO  | Network connection between
> vm://brokergrp2#2 and tcp:///xx.xx.xx.xx:58958@61616 (brokergrp1) has been
> established. | org.apache.activemq.network.DemandForwardingBridgeSupport |
> triggerStartAsyncNetworkBridgeCreation:
> remoteBroker=tcp:///xx.xx.xx.xx:58958@61616, localBroker=
> vm://brokergrp2#2
> 2014-10-16 12:53:12,591 | INFO  | Network connection between
> vm://brokergrp2#0 and tcp:///10.64.253.198:58959@61616 (brokergrp1) has
> been
> established. | org.apache.activemq.network.DemandForwardingBridgeSupport |
> triggerStartAsyncNetworkBridgeCreation:
> remoteBroker=tcp:///xx.xx.xx.xx:58959@61616, localBroker=
> vm://brokergrp2#0
> 2014-10-16 13:00:10,470 | INFO  | Client session timed out, have not heard
> from server in 4071ms for sessionid 0x14918fd40f00007, closing socket
> connection and attempting reconnect | org.apache.zookeeper.ClientCnxn |
> main-SendThread(uromahn-zk1-9208:2181)
> 2014-10-16 13:00:10,575 | INFO  | Demoted to slave |
> org.apache.activemq.leveldb.replicated.MasterElector | ZooKeeper state
> change dispatcher thread
> 2014-10-16 13:00:10,582 | INFO  | Apache ActiveMQ 5.10.0 (brokergrp2,
> ID:uromahn-amq4-9175-46383-1413463934439-0:1) is shutting down |
> org.apache.activemq.broker.BrokerService | ActiveMQ
> BrokerService[brokergrp2] Task-8
> 2014-10-16 13:00:10,594 | WARN  | Transport Connection to:
> tcp://zz.zz.zz.zz:34737 failed: java.io.IOException: Unexpected error
> occured: org.apache.activemq.broker.BrokerStoppedException: Broker
> BrokerService[brokergrp2] is being stopped |
> org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ
> Transport: tcp:///zz.zz.zz.zz:34737@61616
> ....
>
> After that, the log file contains a large number of dumped messages for
> which the broker could not send an acknowledgement back to amq1.
>
> Looking at the log file of the newly promoted master amq5 I see the
> following warning:
> ....
> 2014-10-16 13:00:18,422 | INFO  | Network connection between
> vm://brokergrp2#0 and tcp:///yy.yy.yy.yy:58256@61616 (brokergrp1) has been
> established. | org.apache.activemq.network.DemandForwardingBridgeSupport |
> triggerStartAsyncNetworkBridgeCreation:
> remoteBroker=tcp:///yy.yy.yy.yy:58256@61616, localBroker=
> vm://brokergrp2#0
> 2014-10-16 13:32:20,285 | WARN  | Unexpected session error:
> java.io.IOException: Connection reset by peer |
> org.apache.activemq.leveldb.replicated.MasterLevelDBStore |
> hawtdispatch-DEFAULT-2
> 2014-10-16 13:32:20,286 | INFO  | Slave has disconnected:
> e21da7b3-65a4-4756-a20b-bd77f8eb7607 |
> org.apache.activemq.leveldb.replicated.MasterLevelDBStore |
> hawtdispatch-DEFAULT-2
> ....
>
> After that, no more messages are transmitted from amq1 to the second
> cluster in the network (neither amq4 nor amq5). This can be seen in the
> corresponding tab of the amq1 web console. Also, when I leave all brokers
> running, I get a message on amq4 saying that the broker is waiting for the
> replicated store to synchronize one record (or something to that effect -
> unfortunately I have already deleted that log :( ).
>
> So, here is what appears to happen:
> Somehow ZooKeeper thinks that amq4 is "dead" and demotes it to slave in
> the middle of the message transfer. This could be because amq4 is too busy
> to respond to heartbeat requests. However, something then goes wrong in
> the LevelDB replication and the new master amq5 can't pick up. This seems
> to lead to a "dead-lock" situation, causing the client to time out.
> NOTE: all clients use the failover transport to connect to the cluster.
> The connector between the two clusters is also defined using the failover
> transport.
> Another indication of something "bad" is the fact that the client reports
> an exception that the connection to amq4 was reset and that it is trying
> to reconnect. However, that reconnect does not seem to complete within the
> 10-second internal timeout of my message listener.
> I have double- and triple-checked the configuration and everything looks
> OK. Also, this doesn't happen when both producer and consumer connect to
> the same cluster, e.g. amq1 or amq4, so the individual clusters seem to
> work. The issue only seems to appear when transmitting messages between
> clusters in a network of brokers.
>
> Any idea whether this is a bug or just a SUE (stupid user error)? :)
>
> -Uli

Re: Potential Bug in Master-Slave with Replicated LevelDB Store

uromahn
CONTENTS DELETED
The author has deleted this message.

Re: Potential Bug in Master-Slave with Replicated LevelDB Store

uromahn
CONTENTS DELETED
The author has deleted this message.

Re: Potential Bug in Master-Slave with Replicated LevelDB Store

Tim Bain
masterslave: is an alias for failover: with maxReconnects=0.  (There might
be another URI option included in the alias; I don't remember - I think the
details are in the JIRA where Gary added the failover transport, if you're
curious.)  So there's no need to try using failover explicitly, since the
configuration you're using already uses the URI option I was concerned
about.
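
In other words, a connector written with the alias, roughly like this
(broker names again placeholders):

  <!-- shorthand form; equivalent to the explicit failover URI with reconnects disabled -->
  <networkConnector name="cluster1-to-cluster2" duplex="true"
      uri="masterslave:(tcp://amq4:61616,tcp://amq5:61616,tcp://amq6:61616)"/>

already behaves like the spelled-out failover form discussed earlier.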

That's strange that you're seeing different throughput (based on different
numbers of pending messages) depending on which direction messages flow
between the clusters.  It might be due to the network issues you
referenced; if not, then hopefully you can figure out which link is the
slow one by finding the last point where messages are piling up.  Is there
a non-trivial amount of latency (more than 15ms, let's say) on any of the
links between brokers or the links between clients and brokers?  I've had
to do quite a bit of tuning to get ActiveMQ to run efficiently across a
high-latency WAN, so if you have a bad network link in your setup you may
need to make some adjustments to improve throughput.
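
For what it's worth, two of the knobs that commonly matter on high-latency
links are the network connector's prefetchSize and the TCP transport's
socketBufferSize. A sketch with purely illustrative values - not
recommendations for this particular setup:

  <!-- illustrative values only; tune against measured latency and message size -->
  <networkConnector name="cluster1-to-cluster2" duplex="true"
      prefetchSize="2000"
      uri="static:(tcp://amq4:61616?socketBufferSize=131072)"/>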

Also, just to confirm: were you comparing pending queue sizes based on
which role (producer-side or consumer-side) each cluster played in your
test?  (So comparing amq1-3 in your first setup with amq4-6 in your second
setup, and vice versa.)  Make sure your comparisons were apples to apples
between the tests; otherwise the conclusion of lower throughput might not
be valid.

On Fri, Oct 17, 2014 at 4:20 AM, uromahn <[hidden email]> wrote:

> Quick update...
>
> I re-ran my tests as suggested.
>
> First, my producer connected to amq4 and the consumer to amq1. That setup
> worked quite well, without any error or timeout.
> Then I re-configured it again, with messages being sent to amq1 and
> consumed from amq4. To my surprise it worked this time (I had re-run the
> test three times yesterday and all three runs failed the same way!).
> However, I noticed that when transmitting messages from amq1 to amq4, the
> bridge appears to be slower than the other way around, since I saw on
> average 10,000 pending messages in the queues - the other way around,
> there were on average fewer than 1,500 messages pending.
>
> On another note: I mentioned that I set up this environment in our private
> cloud infrastructure. This morning I saw a note from our infrastructure
> team that we are having some network issues in the data center hosting
> this environment. It is certainly possible that my issues yesterday were a
> side effect of the network problems.
>
> So, I will keep testing, but for now my suspicion is that this is likely a
> network problem and not an issue within ActiveMQ.
>
> I will follow up in case the issue shows up again.
>
> Until then, sorry for potentially raising a false alarm.

Re: Potential Bug in Master-Slave with Replicated LevelDB Store

uromahn
CONTENTS DELETED
The author has deleted this message.

Re: Potential Bug in Master-Slave with Replicated LevelDB Store

Tim Bain
Take a look at whether the JVM is doing a full garbage collection at the
time the failover occurs.  Our team has observed clients failing over to an
alternate broker at times that corresponded to a full GC, and it might be
that the same thing is happening here (but the failover isn't happening
gracefully).  If that's what's going on, you should be able to work around
the problem by tuning your JVM heap and/or your GC strategy, though it
still sounds like there's a bug related to the failover that should be
tracked down and fixed as well.
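
Since the demotions in the logs are all triggered by a ZooKeeper session
timeout ("have not heard from server in 4071ms"), another setting worth
knowing about alongside the GC tuning is the replicated store's
zkSessionTimeout. A minimal sketch of the persistence adapter with that
attribute - the addresses, path, and timeout value below are placeholders,
not the configuration actually used in this thread:

  <persistenceAdapter>
    <!-- placeholders: adjust directory, addresses, timeout, and hostname per broker -->
    <replicatedLevelDB directory="${activemq.data}/leveldb"
        replicas="3"
        bind="tcp://0.0.0.0:61619"
        zkAddress="zk1:2181,zk2:2181,zk3:2181"
        zkPath="/activemq/leveldb-stores"
        zkSessionTimeout="4s"
        hostname="amq4"/>
  </persistenceAdapter>

A longer session timeout only papers over short pauses, of course; it does
not address the replication bug being discussed.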

On Mon, Oct 20, 2014 at 7:36 AM, uromahn <[hidden email]> wrote:

> Ok, looks like the issue is back again.
>
> The network issues have been fixed.
> It is *not* a slow network - pings between VMs are less than 1ms.
>
> I have not investigated the throughput difference but wanted to focus on
> the reliability of the replicated message store.
>
> I made some configuration changes to the network connectors: I defined
> five connectors per node (amq1-3).
>
> Here is what I observed:
> * When I launch one producer connecting to amq1 and one consumer
> connecting to amq4 and send 100,000 messages, everything works fine.
> * When I launch five producers connecting to amq1 and five consumers
> connecting to amq4 and send 100,000 messages, everything is still fine.
> * When I launch ten producers connecting to amq1 and ten consumers
> connecting to amq4 and send 100,000 messages, I see the following:
>   1. The number of pending messages in the queue on amq1 is slowly but
> steadily increasing; the consumers on amq4 are still reading messages.
>   2. After about 70,000 to 80,000 messages, amq1 suddenly stops working
> and amq2 gets promoted to master. amq4 is still reading messages.
>   3. From that time on, the log of amq4 fills up with the following
> exceptions: 2014-10-20 13:11:43,227 | ERROR | Exception:
> org.apache.activemq.transport.InactivityIOException: Cannot send, channel
> has already failed: null on duplex forward of: ActiveMQTextMessage ...
> <dump of message comes here>
>
> Here is an excerpt of the log from amq1 at the time it got "demoted" to
> slave:
> 2014-10-20 12:56:44,007 | INFO  | Slave has now caught up:
> 2607dbe5-e42a-44bf-8f90-6edf8caa8d87 |
> org.apache.activemq.leveldb.replicated.MasterLevelDBStore |
> hawtdispatch-DEFAULT-1
> 2014-10-20 13:11:42,535 | INFO  | Client session timed out, have not heard
> from server in 2763ms for sessionid 0x2492d8210c30003, closing socket
> connection and attempting reconnect | org.apache.zookeeper.ClientCnxn |
> main-SendThread(uromahn-zk2-9775:2181)
> 2014-10-20 13:11:42,639 | INFO  | Demoted to slave |
> org.apache.activemq.leveldb.replicated.MasterElector | ZooKeeper state
> change dispatcher thread
>
> (NOTE: 12:56 was the time the broker cluster was started. Between that
> time and 13:11, I was running the various tests.)
>
> After that I can see a ton of exceptions and error messages saying that
> the replicated store has stopped, and similar. After some time, it looks
> like broker amq1 has re-stabilized itself and reports having started as a
> slave.
>
> I don't know exactly what is going on, but it appears that something is
> wrong with the replicated LevelDB store, which needs more investigation.

Re: Potential Bug in Master-Slave with Replicated LevelDB Store

uromahn
CONTENTS DELETED
The author has deleted this message.

Re: Potential Bug in Master-Slave with Replicated LevelDB Store

uromahn
CONTENTS DELETED
The author has deleted this message.

Re: Potential Bug in Master-Slave with Replicated LevelDB Store

uromahn
CONTENTS DELETED
The author has deleted this message.

Re: Potential Bug in Master-Slave with Replicated LevelDB Store

Tim Bain
G1GC is great for reducing the duration of any single stop-the-world GC
(and hence minimizing latency of any individual operation as well as
avoiding timeouts), but the total time spent performing GCs (and hence the
total amount of time the brokers are paused) is several times that of the
parallel GC algorithm, based on some articles I read a couple weeks back.
So although G1GC should work for a wide range (possibly all) of ActiveMQ
memory usage patterns and may be the right option for you based on how your
broker is used, you may get better overall throughput from sticking with
ParallelGC but adjusting the ratio of YoungGen to OldGen to favor YoungGen
(increasing the odds that a message gets GC'ed before it gets to OldGen)
and the ratio of Eden to Survivor within YoungGen to favor Survivor (to
increase the odds that a message can stick around in YoungGen long enough
to die before it gets promoted to OldGen).  But you have to be confident
that your usage patterns won't let OldGen fill up over your broker's uptime
(whether that's hours or years); otherwise you'll end up doing a long full
GC, and you'd probably have been better off going with G1GC.

For our broker, we expire undelivered messages quickly (under a minute), so
in theory expanding both YoungGen and Survivor might prevent anything from
getting into OldGen and thus prevent long full GCs.  I'm actually going to
be doing this tuning this week, so I'll report out what I find, though
obviously YMMV since everyone's message usage patterns are different.

On Tue, Oct 21, 2014 at 5:25 AM, uromahn <[hidden email]> wrote:

> Another update:
>
> I ran the broker with the pure-Java LevelDB implementation and found that
> I am still seeing the warnings in the log file as reported before.
>
> However, to my surprise the broker seems to perform better and even
> slightly faster! I always thought the native LevelDB should be faster,
> but I guess access via JNI may be less optimal than using an embedded
> Java (or Scala) engine.

Re: Potential Bug in Master-Slave with Replicated LevelDB Store

Tim Bain
I spent time last week trying to tune the parallel GC to prevent any
objects from reaching OldGen once the broker was up and running in a steady
state, to try to avoid expensive full GCs.  My goal was zero full GCs for a
broker with 3-6 months of uptime, to prevent clients and other brokers from
failing over from one broker to another.

I increased the size of NewGen relative to OldGen, I increased the size of
Survivor relative to Eden, and I tweaked a few other settings, but I was
never able to stop a slow stream of objects from making it into OldGen -
objects that were already dead by the time a full GC happened (usually
because I triggered one manually).  I was able to reduce the rate of object
promotion by about half, and full GCs should be less painful with OldGen at
only 5-10% of the total heap, so the changes should have made full GCs less
frequent and less painful, but I wasn't able to eliminate them entirely.

So I've given up on the parallel GC and I'm now tweaking G1 to make it
behave as we'd like, and so far the results are far more promising than
with the parallel GC.  So I second Ulrich's recommendation to use G1 rather
than parallel GC, even though the overhead of G1 is several times that of
the parallel GC, if you're more interested in avoiding occasional lengthy
pauses due to full GCs than in getting the highest possible throughput from
your broker.
