Network of brokers: consumers not synchronized

jochenw
Hello,

I have two back office brokers connected via a broker network, and multiple moving devices, each with its own broker. The mobile brokers connect to the back office brokers via a network connector in failover mode, secured via SSL (<networkConnector uri="static:(failover:(ssl://${backofficeBroker1}:61617,ssl://${backofficeBroker2}:61617)?maxReconnectAttempts=0&randomize=true)" ...). The moving devices are connected via a mobile radio network, and connectivity is disrupted frequently.

Rather frequently, I observe that after a reconnection the network connection from the "mobile" brokers to the back office broker looks correct (the log file says that the network connection has been established), but the consumers on queues in the back office are not synchronized to the mobile broker, i.e. messages created there are never consumed. After a restart of the broker on the moving device, everything works correctly again.

Has anybody observed such behavior, or - even better - has a solution for this?

My broker versions: back office: 5.13.0 (running standalone), moving device: 5.13.3 (running in Karaf).

By the way: it looks like with an older broker version on the mobile device (5.11.1), but otherwise the same configuration, this problem doesn't occur.

Regards,
Jochen

Re: Network of brokers: consumers not synchronized

jochenw
Hi,

in the meantime, I have dug a bit deeper into this. I can see from the logs that when the mobile radio network connectivity breaks, both brokers close the connection after 30 seconds and the network bridge is stopped. Then the network connector tries to reconnect. Most of the time, after the mobile radio network is back again, the "mobile broker" reconnects successfully to the backoffice broker. Both broker logs show an entry, and in the Web console of the backoffice broker, I can see the ssl connection.

In some cases, however, a "Successfully connected to ssl://..." appears in the mobile broker log, so this broker thinks that it has a successful connection, but no entry appears in the backoffice broker log, and the connection cannot be seen in the Web console of that broker either. Producers/consumers are also not synchronized. Unfortunately, it looks like the mobile broker never detects that the broker network is not working and doesn't try to reconnect.

I'm pretty sure that I did not observe this while still using 5.11.1 for the onboard broker. With 5.13.3 and 5.14.1, the problem appears. I don't know about other versions.

Best Regards,
Jochen

Re: Network of brokers: consumers not synchronized

Tim Bain
In reply to this post by jochenw
Also, can you please give the couple of lines that precede the one you
quoted in the bad case, to help us see how the broker got to that point?

Tim

Re: Network of brokers: consumers not synchronized

Tim Bain
In reply to this post by jochenw
Just to be clear: you get "Successfully
connected to ssl://..." each time the connection fails, and never when it
succeeds.  Right?

Can you please provide the filename and line number from that log line?

Tim

Re: Network of brokers: consumers not synchronized

jochenw
Hi Tim,

first the respective lines from karaf.log (the mobile broker is running in Karaf). You can see the successful connection at 04:46:01,955. Then, a bit before 05:37:11, the mobile radio connection was obviously lost (I can also see this from a process which monitors the ppp connection; it reported a loss of connectivity at ~ 05:36:40). Then there are several lines (some skipped here since they always look the same) where reconnection of the broker failed (INetAddress lookup failure, since the DNS was not reachable), the last time at 05:37:27. Then, at 05:37:51,444, the mobile broker logs a successful connection.

2016-11-14 04:46:00,189 | INFO  | ActiveMQ Task-16 | DiscoveryNetworkConnector        | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Establishing network connection from vm://MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9?async=false&create=false to failover:(ssl://backoffice1.abc.com:61617,ssl://backoffice2.abc.com:61617)?maxReconnectAttempts=0&randomize=true
2016-11-14 04:46:01,955 | INFO  | ActiveMQ Task-1  | FailoverTransport                | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Successfully connected to ssl://backoffice1.abc.com:61617
2016-11-14 04:46:02,072 | INFO  | 9f6ce7f804a9#384 | DemandForwardingBridgeSupport    | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Network connection between vm://MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9#384 and ssl://backoffice1.abc.com:61617 (Backoffice_Broker_1) has been established.
2016-11-14 05:37:11,992 | WARN  | tyMonitor Worker | FailoverTransport                | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Transport (ssl://backoffice1.abc.com:61617) failed , not attempting to automatically reconnect: org.apache.activemq.transport.InactivityIOException: Channel was inactive for too (>30000) long: tcp://aaa.bbb.ccc.ddd:61617
2016-11-14 05:37:11,995 | INFO  | tyMonitor Worker | DemandForwardingBridgeSupport    | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Network connection between vm://MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9#384 and unconnected shutdown due to a local error: org.apache.activemq.transport.TransportDisposedIOException: Disposed due to prior exception
2016-11-14 05:37:12,073 | INFO  | 7f804a9] Task-69 | DemandForwardingBridgeSupport    | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9 bridge to Backoffice_Broker_1 stopped
2016-11-14 05:37:13,006 | INFO  | ActiveMQ Task-17 | DiscoveryNetworkConnector        | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Establishing network connection from vm://MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9?async=false&create=false to failover:(ssl://backoffice1.abc.com:61617,ssl://backoffice2.abc.com:61617)?maxReconnectAttempts=0&randomize=true
2016-11-14 05:37:13,044 | ERROR | ActiveMQ Task-17 | FailoverTransport                | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Failed to Lookup INetAddress for URI[ssl://backoffice2.abc.com:61617] : java.net.UnknownHostException: backoffice2.abc.com: unknown error
2016-11-14 05:37:13,278 | ERROR | ActiveMQ Task-1  | FailoverTransport                | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Failed to connect to [ssl://backoffice1.abc.com:61617, ssl://backoffice2.abc.com:61617] after: 1 attempt(s)
2016-11-14 05:37:13,280 | WARN  | 9f6ce7f804a9#388 | DemandForwardingBridgeSupport    | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Network connection between vm://MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9#388 and unconnected shutdown due to a remote error: java.util.concurrent.TimeoutException
2016-11-14 05:37:13,310 | INFO  | 7f804a9] Task-69 | DemandForwardingBridgeSupport    | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9 bridge to Unknown stopped
....
2016-11-14 05:37:27,525 | INFO  | ActiveMQ Task-17 | DiscoveryNetworkConnector        | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Establishing network connection from vm://MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9?async=false&create=false to failover:(ssl://backoffice1.abc.com:61617,ssl://backoffice2.abc.com:61617)?maxReconnectAttempts=0&randomize=true
2016-11-14 05:37:27,534 | ERROR | ActiveMQ Task-17 | FailoverTransport                | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Failed to Lookup INetAddress for URI[ssl://backoffice2.abc.com:61617] : java.net.UnknownHostException: backoffice2.abc.com: unknown error
2016-11-14 05:37:27,674 | ERROR | ActiveMQ Task-1  | FailoverTransport                | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Failed to connect to [ssl://backoffice1.abc.com:61617, ssl://backoffice2.abc.com:61617] after: 1 attempt(s)
2016-11-14 05:37:27,675 | WARN  | 9f6ce7f804a9#400 | DemandForwardingBridgeSupport    | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Network connection between vm://MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9#400 and unconnected shutdown due to a remote error: java.util.concurrent.TimeoutException
2016-11-14 05:37:27,679 | INFO  | 7f804a9] Task-72 | DemandForwardingBridgeSupport    | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9 bridge to Unknown stopped
2016-11-14 05:37:43,678 | INFO  | ActiveMQ Task-17 | DiscoveryNetworkConnector        | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Establishing network connection from vm://MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9?async=false&create=false to failover:(ssl://backoffice1.abc.com:61617,ssl://backoffice2.abc.com:61617)?maxReconnectAttempts=0&randomize=true
2016-11-14 05:37:51,444 | INFO  | ActiveMQ Task-1  | FailoverTransport                | 62 - org.apache.activemq.activemq-osgi - 5.14.1 | Successfully connected to ssl://backoffice1.abc.com:61617


Next, activemq.log from the backoffice broker backoffice1 (there is also a backoffice2; both are connected via a broker network in the backoffice, and in this example the successful connections were made between the mobile broker and backoffice1). Again you can see the successful connection at 04:46:01,397 and the loss of connection at 05:37:11,154. In between, there are entries related to other mobile brokers (omitted here). After 05:37:11,154, there are no more entries for MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9 in the backoffice activemq.log, although the log of the mobile broker (karaf.log, see above) shows a successful reconnection.

2016-11-14 04:46:01,390 | INFO  | Started responder end of duplex bridge 94170c6d-3af8-4d7f-af53-9f6ce7f804a9_backoffice@ID:GERBVGTRAM1527-57289-1479027266276-0:1 | org.apache.activemq.broker.TransportConnection | ActiveMQ Transport: ssl:///aaa.bbb.ccc.ddd:46403
2016-11-14 04:46:01,397 | INFO  | Network connection between vm://Backoffice_Broker_1#3036 and ssl:///aaa.bbb.ccc.ddd:46403 (MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9) has been established. | org.apache.activemq.network.DemandForwardingBridgeSupport | triggerStartAsyncNetworkBridgeCreation: remoteBroker=ssl:///aaa.bbb.ccc.ddd:46403, localBroker= vm://Backoffice_Broker_1#3036
...
2016-11-14 05:37:11,146 | WARN  | Network connection between vm://Backoffice_Broker_1#3036 and ssl:///aaa.bbb.ccc.ddd:46403 shutdown due to a remote error: org.apache.activemq.transport.InactivityIOException: Channel was inactive for too (>30000) long: tcp://aaa.bbb.ccc.ddd:46403 | org.apache.activemq.network.DemandForwardingBridgeSupport | ActiveMQ InactivityMonitor Worker
2016-11-14 05:37:11,154 | INFO  | Backoffice_Broker_1 bridge to MOBILE_MessageBroker_94170c6d-3af8-4d7f-af53-9f6ce7f804a9 stopped | org.apache.activemq.network.DemandForwardingBridgeSupport | ActiveMQ BrokerService[Backoffice_Broker_1] Task-6927

In the log of the second backoffice broker, there was no entry for this mobile broker in activemq.log in that time frame. For completeness, here is the network connector definition on the mobile broker:

<networkConnectors>
        <networkConnector uri="static:(failover:(ssl://${backofficeBroker1}:61617,ssl://${backofficeBroker2}:61617)?maxReconnectAttempts=0&amp;randomize=true)"
                name="${deviceId}_backoffice"
                dynamicOnly="true"
                networkTTL="3"
                duplex="true"
                conduitSubscriptions="false"
                decreaseNetworkConsumerPriority="false">
                <dynamicallyIncludedDestinations>
                        <queue physicalName="queue.platform.mobileToBackoffice.${queuePostfix}"/>
                        <queue physicalName="*.*.*.${deviceId}"/>
                        <topic physicalName="topic.platform.backofficeToMobile.all"/>
                </dynamicallyIncludedDestinations>
        </networkConnector>
</networkConnectors>

On the backoffice side, there is a simple ssl connector. As already mentioned, it works most of the time; this behavior appears only sometimes. The bad thing is that the mobile broker seems to think that everything is OK, and afterwards it apparently no longer gets a signal that connectivity is lost, so it stays in this state. In the Karaf console, activemq:dstat shows that there are no backoffice consumers on the queues.
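
The stalled state described above can be spotted in karaf.log mechanically: a "Successfully connected to" line from FailoverTransport that is never followed by a "has been established" line from DemandForwardingBridgeSupport. A minimal Python sketch, assuming the log layout shown in the excerpt above (timestamp first, fields separated by "|"); the function names and the 30-second grace period are illustrative assumptions, not part of ActiveMQ:

```python
from datetime import datetime, timedelta

CONNECTED = "Successfully connected to"   # logged by FailoverTransport
ESTABLISHED = "has been established"      # logged by DemandForwardingBridgeSupport


def parse_timestamp(line):
    """Return the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp, or None."""
    try:
        return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None


def find_stalled_connects(lines, grace=timedelta(seconds=30)):
    """Return timestamps of connects that were not followed by a bridge.

    A connect counts as stalled if no 'has been established' line shows up
    within `grace`, or if another connect is logged first. A connect that is
    still pending at the end of the input is reported as well; the caller
    has to decide whether enough time has passed since then.
    """
    stalled = []
    pending = None  # timestamp of a connect still waiting for its bridge
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue  # continuation line or elided "...." marker
        if pending is not None and ts - pending > grace:
            stalled.append(pending)
            pending = None
        if CONNECTED in line:
            if pending is not None:
                stalled.append(pending)
            pending = ts
        elif ESTABLISHED in line:
            pending = None
    if pending is not None:
        stalled.append(pending)
    return stalled
```

Run against the karaf.log excerpt above, this would be expected to flag only the connect at 05:37:51,444, since the one at 04:46:01,955 is followed by an established bridge about 0.1 seconds later.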

Best Regards,
Jochen

Re: Network of brokers: consumers not synchronized

Tim Bain
When the problem occurs, do you see an outbound network connection from the
mobile broker to the backoffice broker?  Does it go to the right host (i.e.
backoffice1) at the right IP address?  And do you see a corresponding
inbound connection on the backoffice1 broker?

Also, is it expected that the mobile broker would be unable to resolve the
hostname of the backoffice broker during a period of disconnection?  That
might be expected or unexpected depending on your network configuration, so
it might be a red herring, but I want to make sure we don't overlook it if
it's actually a problem.

Tim

Re: Network of brokers: consumers not synchronized

jochenw
Hi Tim,

how can I see the outbound connection on the mobile broker? I only have the activemq commands in the Karaf shell, and neither activemq:bstat nor activemq:dstat shows the connections. On the backoffice broker, I'm not sure, but I think that no connection was shown (I will have to wait until the problem occurs again; currently it is not reproducible on a test setup by simply bringing the network interface up and down repeatedly). I also assume that it is the right IP address, since after restarting Karaf (and with it the mobile broker), the connection is established successfully.

It is expected that without network connectivity, the hostname of the backoffice broker cannot be resolved, since no DNS is then available to the mobile system. Most times, after the network connection is back, reconnection of the brokers works correctly.

By the way: sometimes I also get an InstanceAlreadyExistsException on the mobile broker for one of the queues. But this looks like another error - in this case the consumers are there, so the basic network connection exists, but there is some problem with synchronization of the queues.

Regards,
Jochen

Re: Network of brokers: consumers not synchronized

jochenw
Hello,

found the right command: activemq:query | grep -A 10 -B 10 networkConnectors.

And the problem occurred again, so I could check this. The network connector is present on the mobile broker, but the web console of the backoffice broker doesn't show a connection.

Regards,
Jochen

Re: Network of brokers: consumers not synchronized

Tim Bain
So I was thinking about something more like netstat, to answer the question
at the TCP layer. I've never used Karaf, and its documentation doesn't seem
to provide a way to do that from within the container, but at a minimum you
can run netstat on the host on which Karaf is running.

I'm wondering whether your mobile broker is connecting to a different
broker (not your backoffice one), which could explain why the mobile broker
says it's connected but the backoffice one says it's not.  Knowing where
the TCP connection is going would help with that.

Tim

Re: Network of brokers: consumers not synchronized

jochenw
I'll check this the next time the issue shows up. But I'm pretty sure that if the connection is there at the TCP level, it can only be to one of the two backoffice brokers, since only these are defined in the connection URI for the network connection. Anyway, it is a good idea to check whether the TCP connection is there at all.

Regards,
Jochen

Re: Network of brokers: consumers not synchronized

jochenw
Saw it again.

- The backoffice broker had shut down the connection (Channel was inactive for too (>30000) long).
- The mobile broker logged "Transport failed, not attempting to automatically reconnect; java.io.EOFException" and "bridge to .... stopped"
- One second later, the mobile broker tries reconnection, "Establishing network connection from ... to ..."
- 16 seconds after this, the mobile broker logs "Successfully connected to ..."
- The backoffice broker has no log entry, and also doesn't show the connection in the Web console
- On tcp level, the connection is there (netstat -an | grep 616):
     on mobile side: tcp <ip mobile>:45997 <ip backoffice>:61617 ESTABLISHED
     on backoffice side: tcp6 <ip backoffice>:61617 <ip mobile>:45997 ESTABLISHED
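
The netstat check in the last point can be scripted so that it can run next to the broker. A minimal Python sketch, assuming Linux-style `netstat -an` output in which the last three columns are local address, foreign address and state; the function name and the default port argument are illustrative:

```python
def established_on_port(netstat_output, port=61617):
    """Return (local, remote) address pairs of ESTABLISHED TCP connections
    where either endpoint uses `port`."""
    wanted = str(port)
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and sockets in any other state.
        if len(fields) < 3 or fields[-1] != "ESTABLISHED":
            continue
        local, remote = fields[-3], fields[-2]
        # rsplit keeps IPv6-mapped addresses such as ::ffff:a.b.c.d:61617 intact.
        ports = (local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1])
        if wanted in ports:
            matches.append((local, remote))
    return matches
```

A non-empty result on both hosts, combined with no bridge at the broker level, corresponds to the situation reported here: the TCP connection exists, but the bridge was never started on top of it.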

Regards,
Jochen

Re: Network of brokers: consumers not synchronized

jochenw
Some news on that: in the meantime, we have switched to using only static bridges in our project for connecting the mobile brokers to the back office. It is much more stable now.

However, there is still a problem which shows up now and then: sometimes the back office broker "forgets" to start the responder end of the duplex="true" bridge. In this case there is no consumer on the respective queues in the backoffice broker, and the network connector on the mobile broker doesn't get an active bridge, although the connection is there (and the duplex="false" bridge for the direction mobile->back office works absolutely fine). This happens in about one out of 500 reconnections. I have tested this by bringing the network interface up and down in an endless loop on one of the mobile target systems.

So there still seems to be some instability when building the duplex bridge, even for static bridges. I have a "workaround" solution by monitoring the mobile broker via JMX beans, checking whether the duplex network connector has an active bridge when the connection is there. If not, the activemq-osgi bundle is restarted, which heals the problem.
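
The decision logic of such a watchdog can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the actual implementation from this post: the three callables (`connection_up`, `bridge_active`, `restart_bundle`) are hypothetical placeholders for the JMX queries and the bundle restart, and the requirement of several consecutive bad polls is an added assumption meant to avoid restarting while a bridge is still being set up.

```python
class BridgeWatchdog:
    """Restart the broker bundle when the link is up but no bridge is active."""

    def __init__(self, connection_up, bridge_active, restart_bundle,
                 failures_needed=3):
        # All three callables are placeholders supplied by the caller,
        # e.g. wrappers around JMX lookups and a bundle restart.
        self.connection_up = connection_up
        self.bridge_active = bridge_active
        self.restart_bundle = restart_bundle
        self.failures_needed = failures_needed
        self.failures = 0

    def poll(self):
        """Run one check; return True if a restart was triggered."""
        if self.connection_up() and not self.bridge_active():
            self.failures += 1
        else:
            # Either there is no connectivity (nothing to expect) or the
            # bridge is fine; in both cases start counting from zero.
            self.failures = 0
        if self.failures >= self.failures_needed:
            self.failures = 0
            self.restart_bundle()
            return True
        return False
```

Calling `poll()` at a fixed interval (for example every few seconds) would restart the bundle only after the bad state has persisted for `failures_needed` consecutive checks.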

I'm not sure whether it makes sense to create a Jira issue for that.

Regards,
Jochen