ActiveMQ stability issues in 3 node configuration

p__Z
Hello,

We have a problem establishing stable connections within a network of 3
brokers that run in an active-active duplex setup. Whenever all three
nodes are connected, the connection between the brokers drops from one of
the nodes, with some odd behaviour that we can see in the ActiveMQ Web
Console and in the ActiveMQ logs.
Every time I refresh the web console, the number of connected brokers
changes.
Sometimes we see two connected brokers (on broker A, I see the
connections to brokers B and C).
Other times there is only one connection (to broker B). Sometimes,
according to the web console, there are zero connections between the
brokers. It can change with every refresh of the web console.

Each broker has a network connector with configuration like:
<amq:networkConnectors>
    <amq:networkConnector name="${amq.connector.name}"
                          userName="${amq.username}"
                          password="${amq.password}"
                          uri="${amq.broker.network.connector.uri}"
                          networkTTL="2"
                          duplex="true"/>
</amq:networkConnectors>

As you can see, each broker has its own connector, meaning that:
Broker A has amq.broker.network.connector.uri=static:(tcp://brokerB:61616,tcp://brokerC:61616)
Broker B has amq.broker.network.connector.uri=static:(tcp://brokerA:61616,tcp://brokerC:61616)
Broker C has amq.broker.network.connector.uri=static:(tcp://brokerA:61616,tcp://brokerB:61616)

Do you think it might be a problem that duplex=true is set on both sides
of each connection?

KahaDB configuration:
<amq:kahaDB directory="${amq.database.dir}"
        journalMaxFileLength="${amq.journal.max.file.length}"
        checksumJournalFiles="true"
        checkForCorruptJournalFiles="true"
        cleanupInterval="5000"
        checkpointInterval="1000"
        useLock="false">
</amq:kahaDB>

Connector configuration:
<amq:transportConnectors>
    <amq:transportConnector name="Connector" uri="${amq.broker.connector.uri}"/>
</amq:transportConnectors>

where
amq.broker.connector.uri = tcp://0.0.0.0:61616

At first, I noticed that we had a bug in the broker configuration: all
brokers had the same name (something like "broker"). I have since changed
this so that each name is suffixed with the node number, so the brokers
are now named broker0, broker1 and broker2. Is that a correct approach?

Below are some interesting exceptions that we see in the logs:

2019-09-13 09:02:45,804 ERROR [ActiveMQ BrokerService[brokerA] Task-3587]
o.a.a.n.DemandForwardingBridgeSupport - Exception:
org.apache.activemq.transport.InactivityIOException: Cannot send, channel
has already failed: tcp://172.18.0.54:35706 on duplex forward of:
ActiveMQTextMessage...

2019-09-13 09:03:16,840 TRACE [ActiveMQ Transport:
tcp:///172.18.0.53:33396@61616]
o.a.a.n.DemandForwardingBridgeSupport - serviceLocalException: disposed
true ex
org.apache.activemq.transport.TransportDisposedIOException: Disposed due to
prior exception
        at org.apache.activemq.transport.ResponseCorrelator.onException(ResponseCorrelator.java:125)
Caused by: java.io.EOFException: null
        at java.io.DataInputStream.readInt(DataInputStream.java:392)

2019-09-13 09:02:45,783 INFO [ActiveMQ BrokerService[brokerA] Task-3578]
o.a.a.n.DemandForwardingBridgeSupport - Network connection between
vm://brokerA#36640 and tcp:///172.18.0.54:35706@61616 shutdown due to a
local error: {}
java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:115)

Sometimes we can also see correct log messages like:

2019-09-13 09:02:45,146 INFO [triggerStartAsyncNetworkBridgeCreation:
remoteBroker=tcp:///172.18.0.54:35706@61616, localBroker=
vm://brokerA#36640] o.a.a.n.DemandForwardingBridgeSupport - Network
connection between vm://brokerA#36640 and tcp:///172.18.0.54:35706@61616
(brokerC) has been established.

2019-09-13 09:02:46,711 INFO [triggerStartAsyncNetworkBridgeCreation:
remoteBroker=tcp:///172.18.0.53:33252@61616, localBroker=
vm://brokerA#36644] o.a.a.n.DemandForwardingBridgeSupport - Network
connection between vm://brokerA#36644 and tcp:///172.18.0.53:33252@61616
(brokeB) has been established.

172.18.0.53 is the IP address of Broker B
172.18.0.54 is the IP address of Broker C

Also, we have noticed that the number of messages on the DLQ sometimes
drops below zero. (We don't use any policy to clean up the messages; we
remove them through our consumer, which prints the messages to the logs.)

What is more, the environment works properly with 2 nodes. Adding the
third one makes it unstable.
We use ActiveMQ 5.15.0 with Java 8.
My question is: do you have any tips on how to approach this issue? Maybe
you have encountered a similar problem in the past. I'd be glad for any
tips that you can provide.
Also let me know if you need further details or clarifications.
Re: ActiveMQ stability issues in 3 node configuration

Tim Bain
Piotr,

I've not run across this behavior in my own experience and I can't recall
anyone describing it in previous posts on the mailing list. So I'll say
that having two duplex connections between each broker pair *shouldn't*
cause this behavior, though anything is possible. I've used dual duplex
connections in the past when prototyping and everything appeared to work as
expected, but that was with two brokers rather than three (which you also
said worked), so I can't say for sure that that's not the problem. If you
convert all of the inter-broker connections to duplex="false", does the
problem disappear?
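
A minimal sketch of that test, assuming the same connector definition and
property placeholders as in the original post, with only the duplex
attribute changed. Since each broker already lists the other two in its
static URI, every broker pair would still have one bridge in each
direction:

```xml
<!-- Sketch only: the connector from the original post with duplex switched off. -->
<amq:networkConnectors>
    <amq:networkConnector name="${amq.connector.name}"
                          userName="${amq.username}"
                          password="${amq.password}"
                          uri="${amq.broker.network.connector.uri}"
                          networkTTL="2"
                          duplex="false"/>
</amq:networkConnectors>
```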

The log messages you quoted all seem to indicate that the code in question
is discovering that the connection has been closed and is reacting to that
fact, but none of those lines seem to indicate why the connection is being
closed. Are there no log lines that would indicate why the connections are
being closed? Perhaps something related to the inactivity monitor?
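
If the inactivity monitor does turn out to be involved, one way to test
that is to change its timeout on the transport. The sketch below assumes
the listener URI from the original post and uses the OpenWire option
wireFormat.maxInactivityDuration (in milliseconds, 30000 by default, 0
disables the monitor). The value 60000 is only an illustration, and the
same option would probably also need to go on the tcp:// URIs inside each
network connector's static:(...) list:

```xml
<!-- Illustrative only: the 60000 ms value is an assumption, not a recommendation. -->
<amq:transportConnectors>
    <amq:transportConnector name="Connector"
                            uri="tcp://0.0.0.0:61616?wireFormat.maxInactivityDuration=60000"/>
</amq:transportConnectors>
```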

Also, how long do the connections stay up? Are they cycling many times a
second? Once a minute? Once an hour, or a day? Understanding the frequency
of the disconnections and reconnections might help determine what's going
on.

As for the other questions you asked... Using unique broker names isn't
required, but it's a good idea since it can make troubleshooting via logs
or JMX easier. And as for the stats on the number of messages in the DLQ,
I've not seen anyone mention that before, though if you can isolate any
way of reproducing the problem reliably, please submit a bug in JIRA.
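
On the broker naming point, a sketch of how each node could get its own
name through the brokerName attribute. The amq.broker.name property is a
hypothetical placeholder (it does not appear in the original post) that
would be set to broker0, broker1 or broker2 on the respective node:

```xml
<!-- amq.broker.name is a hypothetical property, e.g. broker0 on the first node. -->
<amq:broker brokerName="${amq.broker.name}">
    <!-- networkConnectors, persistence adapter and transportConnectors as before -->
</amq:broker>
```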

Tim

On Sun, Sep 15, 2019 at 11:39 PM Piotr Złotowski <[hidden email]>
wrote:
