MultiCast Discovery and refusal of connection

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

MultiCast Discovery and refusal of connection

Eric-AWL
Hi everybody

Is there a reason that can explain that a Multicast Discovery network connector refuses to connect to a distant broker (even after the reconnect delay) ?

I verified with sniffer that (in principle) correct multicast frame was present on the subnetwork.

Thank you in advance for your answer
Eric-AWL
Reply | Threaded
Open this post in threaded view
|

Re: MultiCast Discovery and refusal of connection

joe_fernandez
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: MultiCast Discovery and refusal of connection

Eric-AWL
Hi Joe

My distant transport connector looks like

<transportConnector name="NOCP5-DEFAULT-IN"
                          uri="tcp://tpnocp06v-bus:13080?useLocalHost=false"
                          discoveryUri="multicast://default?group=NOCP5-DEFAULT"/>

And my corresponding network connector looks like

<networkConnector name="NOCP5-DEFAULT-OUT"
                        uri="multicast://default?group=NOCP5-DEFAULT"
                        networkTTL="1"
                        conduitSubscriptions="false"
                        dynamicOnly="true"
                        duplex="false"/>

Some of my brokers have a "duplex" connection without the transport connector. Some others have a "not duplex" connection with the corresponding network connector in which case, they are TCP connected each others.

It is a complex network of brokers architecture.

The problem is that, sometimes, when active network connections are normally broken (network fault for example), the broker doesn't succeed in re-establishing them when the error is corrected.
If I start again the faulty broker, some faulty connections are up again, and some others (that were previously correctly up) can't be established .... very strange.
The number of active connections is not always the same.

I don't understand. We tried with the ActiveMQ 5.2 version and the fuse-5.3-0.6 version.

I just wonder if there can be a "normal" reason which can explain that ActiveMQ refuses to connect to a distant discovered broker.


We have an own external mechanism which transform multicast frames to TCP frames, and TCP frames to multicast frames back. (A gateway between different sub-networks). Perhaps the problem is here ....

Thank you for your answer.
Eric-AWL



Joe Fernandez wrote
What does the transport connector for your 'distant' broker look like? Reason I ask is that the wildcard (0.0.0.0) vs localhost IP address issue has been biting lots of folks.

https://issues.apache.org/activemq/browse/AMQ-2094

Can you telnet to your 'distant' broker?

Joe
http://www.ttmsolutions.com
Reply | Threaded
Open this post in threaded view
|

Re: MultiCast Discovery and refusal of connection

Eric-AWL
Hi


With 5.2.0, I have some explicit log traces which can help to find the origin of the problem. I just opened the AMQ-2774 case.

The problem appears when
- the multicast discovery process discovered a distant broker.
   a line "Establishing ..."  appears in the log file
- But the connection is not established (for any reason)
   no line "has been established" appears in the log file
- the connection is broken
   a line "bridge to Unknown stopped" appears in the log file

Then, even if the distant broker is restarted, the current broker doesn't try to establish a connection with it (no new "Establishing ..." line appears)

It seems to be the same in fuse 5.3.0.6 version.

Eric-AWL
Reply | Threaded
Open this post in threaded view
|

CountDownLatch problem . MultiCast Discovery and refusal of connection on DUPLEX connection (Fuse 5.3.0.5) .

Eric-AWL
Hi

I'm sure that I identified a Latch problem in Multicast Network Discovery mechanism on Duplex connection

The multicast notifier thread is blocked. here the trace

"Notifier-MulticastDiscoveryAgent-listener:DiscoveryNetworkConnector:NOCSupervisorP5-ADMIN-OUT-IN:BrokerService[SIBBusModule-NOCP5-tpnocp08s-bus]" daemon prio=10 tid=0x0000000044ff2400 nid=0x1389 waiting on condition [0x0000000044c26000..0x0000000044c26b90]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aaab3dd66f0> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
        at org.apache.activemq.network.DemandForwardingBridgeSupport.start(DemandForwardingBridgeSupport.java:231)
        at org.apache.activemq.network.DiscoveryNetworkConnector.onServiceAdd(DiscoveryNetworkConnector.java:114)
        at org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent$2.run(MulticastDiscoveryAgent.java:484)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

The problem appears when the network is quickly and alternatively on/off between the two components.
The bridge is created in one direction, but the answer can not be received.

The thread is blocked on the CountDownLatch. Even if multicast frames are received, the component can not establish a new network connection.

Here are an corresponding activemq trace

When it is OK :
2010-06-22 22:56:24,500 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector      - Establishing network connection from vm://SIBBusModule-NOCP5-tpnocp08s-bus to tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
2010-06-22 22:56:26,083 [nocp08s-bus#160] INFO  DemandForwardingBridge         - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#160 and tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus) has been established.

2010-06-22 22:57:34,807 [-tpnocp08s-bus]] INFO  DemandForwardingBridge         - SIBBusModule-NOCP5-tpnocp08s-bus bridge to SIBBusSupervisor-tpnocp11v-bus stopped

2010-06-22 22:57:34,811 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector      - Establishing network connection from vm://SIBBusModule-NOCP5-tpnocp08s-bus to tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
2010-06-22 22:57:39,064 [nocp08s-bus#162] INFO  DemandForwardingBridge         - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#162 and tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus) has been established.

2010-06-22 22:58:42,578 [-tpnocp08s-bus]] INFO  DemandForwardingBridge         - SIBBusModule-NOCP5-tpnocp08s-bus bridge to SIBBusSupervisor-tpnocp11v-bus stopped

When it is KO : "Unknown"

2010-06-22 22:58:42,648 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector      - Establishing network connection from vm://SIBBusModule-NOCP5-tpnocp08s-bus to tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
2010-06-22 22:59:18,031 [18.126.30:14101] WARN  DemandForwardingBridge         - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#164 and tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101 shutdown due to a remote error: java.net.SocketException: Connection reset
2010-06-22 22:59:18,033 [NetworkBridge  ] INFO  DemandForwardingBridge         - SIBBusModule-NOCP5-tpnocp08s-bus bridge to Unknown stopped


Here is the other side corresponding activemq trace

activemq-server.log:2010-06-22 22:55:44,295 [26.190.27:40517] INFO  TransportConnection            - Created Duplex Bridge back to SIBBusModule-NOCP5-tpnocp08s-bus

activemq-server.log:2010-06-22 22:56:24,438 [26.190.27:40517] INFO  DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to SIBBusModule-NOCP5-tpnocp08s-bus stopped

activemq-server.log:2010-06-22 22:56:26,135 [26.190.27:40518] INFO  TransportConnection            - Created Duplex Bridge back to SIBBusModule-NOCP5-tpnocp08s-bus
activemq-server.log:2010-06-22 22:56:26,135 [ocp11v-bus#1770] INFO  DemandForwardingBridge         - Network connection between vm://SIBBusSupervisor-tpnocp11v-bus#1770 and tcp:///10.26.190.27:40518(SIBBusModule-NOCP5-tpnocp08s-bus) has been established.

activemq-server.log:2010-06-22 22:57:34,818 [26.190.27:40518] INFO  DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to SIBBusModule-NOCP5-tpnocp08s-bus stopped

activemq-server.log:2010-06-22 22:57:39,153 [26.190.27:40519] INFO  TransportConnection            - Created Duplex Bridge back to SIBBusModule-NOCP5-tpnocp08s-bus
activemq-server.log:2010-06-22 22:57:39,153 [ocp11v-bus#1806] INFO  DemandForwardingBridge         - Network connection between vm://SIBBusSupervisor-tpnocp11v-bus#1806 and tcp:///10.26.190.27:40519(SIBBusModule-NOCP5-tpnocp08s-bus) has been established.

activemq-server.log:2010-06-22 22:58:44,328 [26.190.27:40519] INFO  DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to SIBBusModule-NOCP5-tpnocp08s-bus stopped


Eric-AWL
Reply | Threaded
Open this post in threaded view
|

CountDownLatch problem . MultiCast Discovery and refusal of connection on DUPLEX connection (Fuse 5.3.0.5) .

Eric-AWL
What could happen if we add

         if (configuration.isDuplex() && duplexInitiatingConnection == null) {
                // initiator side of duplex network
                remoteBrokerNameKnownLatch.countDown();
            }

into the stop() method of DemandForwardingBridgeSupport class ?

Eric-AWL

Eric-AWL wrote
Hi

I'm sure that I identified a Latch problem in Multicast Network Discovery mechanism on Duplex connection

The multicast notifier thread is blocked. here the trace

"Notifier-MulticastDiscoveryAgent-listener:DiscoveryNetworkConnector:NOCSupervisorP5-ADMIN-OUT-IN:BrokerService[SIBBusModule-NOCP5-tpnocp08s-bus]" daemon prio=10 tid=0x0000000044ff2400 nid=0x1389 waiting on condition [0x0000000044c26000..0x0000000044c26b90]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aaab3dd66f0> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
        at org.apache.activemq.network.DemandForwardingBridgeSupport.start(DemandForwardingBridgeSupport.java:231)
        at org.apache.activemq.network.DiscoveryNetworkConnector.onServiceAdd(DiscoveryNetworkConnector.java:114)
        at org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent$2.run(MulticastDiscoveryAgent.java:484)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

The problem appears when the network is quickly and alternatively on/off between the two components.
The bridge is created in one direction, but the answer can not be received.

The thread is blocked on the CountDownLatch. Even if multicast frames are received, the component can not establish a new network connection.

Here are an corresponding activemq trace

When it is OK :
2010-06-22 22:56:24,500 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector      - Establishing network connection from vm://SIBBusModule-NOCP5-tpnocp08s-bus to tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
2010-06-22 22:56:26,083 [nocp08s-bus#160] INFO  DemandForwardingBridge         - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#160 and tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus) has been established.

2010-06-22 22:57:34,807 [-tpnocp08s-bus]] INFO  DemandForwardingBridge         - SIBBusModule-NOCP5-tpnocp08s-bus bridge to SIBBusSupervisor-tpnocp11v-bus stopped

2010-06-22 22:57:34,811 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector      - Establishing network connection from vm://SIBBusModule-NOCP5-tpnocp08s-bus to tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
2010-06-22 22:57:39,064 [nocp08s-bus#162] INFO  DemandForwardingBridge         - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#162 and tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus) has been established.

2010-06-22 22:58:42,578 [-tpnocp08s-bus]] INFO  DemandForwardingBridge         - SIBBusModule-NOCP5-tpnocp08s-bus bridge to SIBBusSupervisor-tpnocp11v-bus stopped

When it is KO : "Unknown"

2010-06-22 22:58:42,648 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector      - Establishing network connection from vm://SIBBusModule-NOCP5-tpnocp08s-bus to tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
2010-06-22 22:59:18,031 [18.126.30:14101] WARN  DemandForwardingBridge         - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#164 and tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101 shutdown due to a remote error: java.net.SocketException: Connection reset
2010-06-22 22:59:18,033 [NetworkBridge  ] INFO  DemandForwardingBridge         - SIBBusModule-NOCP5-tpnocp08s-bus bridge to Unknown stopped


Here is the other side corresponding activemq trace

activemq-server.log:2010-06-22 22:55:44,295 [26.190.27:40517] INFO  TransportConnection            - Created Duplex Bridge back to SIBBusModule-NOCP5-tpnocp08s-bus

activemq-server.log:2010-06-22 22:56:24,438 [26.190.27:40517] INFO  DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to SIBBusModule-NOCP5-tpnocp08s-bus stopped

activemq-server.log:2010-06-22 22:56:26,135 [26.190.27:40518] INFO  TransportConnection            - Created Duplex Bridge back to SIBBusModule-NOCP5-tpnocp08s-bus
activemq-server.log:2010-06-22 22:56:26,135 [ocp11v-bus#1770] INFO  DemandForwardingBridge         - Network connection between vm://SIBBusSupervisor-tpnocp11v-bus#1770 and tcp:///10.26.190.27:40518(SIBBusModule-NOCP5-tpnocp08s-bus) has been established.

activemq-server.log:2010-06-22 22:57:34,818 [26.190.27:40518] INFO  DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to SIBBusModule-NOCP5-tpnocp08s-bus stopped

activemq-server.log:2010-06-22 22:57:39,153 [26.190.27:40519] INFO  TransportConnection            - Created Duplex Bridge back to SIBBusModule-NOCP5-tpnocp08s-bus
activemq-server.log:2010-06-22 22:57:39,153 [ocp11v-bus#1806] INFO  DemandForwardingBridge         - Network connection between vm://SIBBusSupervisor-tpnocp11v-bus#1806 and tcp:///10.26.190.27:40519(SIBBusModule-NOCP5-tpnocp08s-bus) has been established.

activemq-server.log:2010-06-22 22:58:44,328 [26.190.27:40519] INFO  DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to SIBBusModule-NOCP5-tpnocp08s-bus stopped


Eric-AWL
Reply | Threaded
Open this post in threaded view
|

Re: CountDownLatch problem . MultiCast Discovery and refusal of connection on DUPLEX connection (Fuse 5.3.0.5) .

gtully
that seems reasonable. want to submit a patch against trunk?

On 6 July 2010 12:10, Eric-AWL <[hidden email]> wrote:

>
> What could happen if we add
>
>         if (configuration.isDuplex() && duplexInitiatingConnection == null)
> {
>                // initiator side of duplex network
>                remoteBrokerNameKnownLatch.countDown();
>            }
>
> into the stop() method of DemandForwardingBridgeSupport class ?
>
> Eric-AWL
>
>
> Eric-AWL wrote:
>>
>> Hi
>>
>> I'm sure that I identified a Latch problem in Multicast Network Discovery
>> mechanism on Duplex connection
>>
>> The multicast notifier thread is blocked. here the trace
>>
>> "Notifier-MulticastDiscoveryAgent-listener:DiscoveryNetworkConnector:NOCSupervisorP5-ADMIN-OUT-IN:BrokerService[SIBBusModule-NOCP5-tpnocp08s-bus]"
>> daemon prio=10 tid=0x0000000044ff2400 nid=0x1389 waiting on condition
>> [0x0000000044c26000..0x0000000044c26b90]
>>    java.lang.Thread.State: WAITING (parking)
>>       at sun.misc.Unsafe.park(Native Method)
>>       - parking to wait for  <0x00002aaab3dd66f0> (a
>> java.util.concurrent.CountDownLatch$Sync)
>>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>       at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>>       at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
>>       at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
>>       at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
>>       at
>> org.apache.activemq.network.DemandForwardingBridgeSupport.start(DemandForwardingBridgeSupport.java:231)
>>       at
>> org.apache.activemq.network.DiscoveryNetworkConnector.onServiceAdd(DiscoveryNetworkConnector.java:114)
>>       at
>> org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent$2.run(MulticastDiscoveryAgent.java:484)
>>       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>       at java.lang.Thread.run(Thread.java:619)
>>
>> The problem appears when the network is quickly and alternatively on/off
>> between the two components.
>> The bridge is created in one direction, but the answer can not be
>> received.
>>
>> The thread is blocked on the CountDownLatch. Even if multicast frames are
>> received, the component can not establish a new network connection.
>>
>> Here are an corresponding activemq trace
>>
>> When it is OK :
>> 2010-06-22 22:56:24,500 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector
>> - Establishing network connection from
>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>> 2010-06-22 22:56:26,083 [nocp08s-bus#160] INFO  DemandForwardingBridge
>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#160 and
>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
>> has been established.
>>
>> 2010-06-22 22:57:34,807 [-tpnocp08s-bus]] INFO  DemandForwardingBridge
>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to
>> SIBBusSupervisor-tpnocp11v-bus stopped
>>
>> 2010-06-22 22:57:34,811 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector
>> - Establishing network connection from
>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>> 2010-06-22 22:57:39,064 [nocp08s-bus#162] INFO  DemandForwardingBridge
>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#162 and
>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
>> has been established.
>>
>> 2010-06-22 22:58:42,578 [-tpnocp08s-bus]] INFO  DemandForwardingBridge
>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to
>> SIBBusSupervisor-tpnocp11v-bus stopped
>>
>> When it is KO : "Unknown"
>>
>> 2010-06-22 22:58:42,648 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector
>> - Establishing network connection from
>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>> 2010-06-22 22:59:18,031 [18.126.30:14101] WARN  DemandForwardingBridge
>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#164 and
>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101 shutdown due to a
>> remote error: java.net.SocketException: Connection reset
>> 2010-06-22 22:59:18,033 [NetworkBridge  ] INFO  DemandForwardingBridge
>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to Unknown stopped
>>
>>
>> Here is the other side corresponding activemq trace
>>
>> activemq-server.log:2010-06-22 22:55:44,295 [26.190.27:40517] INFO
>> TransportConnection            - Created Duplex Bridge back to
>> SIBBusModule-NOCP5-tpnocp08s-bus
>>
>> activemq-server.log:2010-06-22 22:56:24,438 [26.190.27:40517] INFO
>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to
>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>
>> activemq-server.log:2010-06-22 22:56:26,135 [26.190.27:40518] INFO
>> TransportConnection            - Created Duplex Bridge back to
>> SIBBusModule-NOCP5-tpnocp08s-bus
>> activemq-server.log:2010-06-22 22:56:26,135 [ocp11v-bus#1770] INFO
>> DemandForwardingBridge         - Network connection between
>> vm://SIBBusSupervisor-tpnocp11v-bus#1770 and
>> tcp:///10.26.190.27:40518(SIBBusModule-NOCP5-tpnocp08s-bus) has been
>> established.
>>
>> activemq-server.log:2010-06-22 22:57:34,818 [26.190.27:40518] INFO
>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to
>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>
>> activemq-server.log:2010-06-22 22:57:39,153 [26.190.27:40519] INFO
>> TransportConnection            - Created Duplex Bridge back to
>> SIBBusModule-NOCP5-tpnocp08s-bus
>> activemq-server.log:2010-06-22 22:57:39,153 [ocp11v-bus#1806] INFO
>> DemandForwardingBridge         - Network connection between
>> vm://SIBBusSupervisor-tpnocp11v-bus#1806 and
>> tcp:///10.26.190.27:40519(SIBBusModule-NOCP5-tpnocp08s-bus) has been
>> established.
>>
>> activemq-server.log:2010-06-22 22:58:44,328 [26.190.27:40519] INFO
>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to
>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>
>>
>> Eric-AWL
>>
>
> --
> View this message in context: http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084235.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
>



--
http://blog.garytully.com

Open Source Integration
http://fusesource.com
Reply | Threaded
Open this post in threaded view
|

Re: CountDownLatch problem . MultiCast Discovery and refusal of connection on DUPLEX connection (Fuse 5.3.0.5) .

Eric-AWL
I wonder if it could not have some undesirable effects on both side of the duplex connection ....

perhaps we should test the started AtomicBoolean, in the start() method after the corresponding "await" and shouldn't execute the end of the start method ?

            if (configuration.isDuplex() && duplexInitiatingConnection == null) {
                // initiator side of duplex network
                remoteBrokerNameKnownLatch.await();
            }

HERE ??? (if started.get()) { ???

            try {
                triggerRemoteStartBridge();
            } catch (IOException e) {
                LOG.warn("Caught exception from remote start", e);
            }
            NetworkBridgeListener l = this.networkBridgeListener;
            if (l != null) {
                l.onStart(this);
            }

It's the first big problem I have with ActiveMQ complex configuration, it happens when network is faulty (that happens not very often), and I don't know ActiveMQ source code very well ....

Who could help me to identify potential effects of this change, before I try to modify it ? (I can't do that on my production system without some tests and expert validation)

Eric-AWL

Gary Tully wrote
that seems reasonable. want to submit a patch against trunk?

On 6 July 2010 12:10, Eric-AWL <eric.vincent@atosorigin.com> wrote:
>
> What could happen if we add
>
>         if (configuration.isDuplex() && duplexInitiatingConnection == null)
> {
>                // initiator side of duplex network
>                remoteBrokerNameKnownLatch.countDown();
>            }
>
> into the stop() method of DemandForwardingBridgeSupport class ?
>
> Eric-AWL
>
>
> Eric-AWL wrote:
>>
>> Hi
>>
>> I'm sure that I identified a Latch problem in Multicast Network Discovery
>> mechanism on Duplex connection
>>
>> The multicast notifier thread is blocked. here the trace
>>
>> "Notifier-MulticastDiscoveryAgent-listener:DiscoveryNetworkConnector:NOCSupervisorP5-ADMIN-OUT-IN:BrokerService[SIBBusModule-NOCP5-tpnocp08s-bus]"
>> daemon prio=10 tid=0x0000000044ff2400 nid=0x1389 waiting on condition
>> [0x0000000044c26000..0x0000000044c26b90]
>>    java.lang.Thread.State: WAITING (parking)
>>       at sun.misc.Unsafe.park(Native Method)
>>       - parking to wait for  <0x00002aaab3dd66f0> (a
>> java.util.concurrent.CountDownLatch$Sync)
>>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>       at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>>       at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
>>       at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
>>       at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
>>       at
>> org.apache.activemq.network.DemandForwardingBridgeSupport.start(DemandForwardingBridgeSupport.java:231)
>>       at
>> org.apache.activemq.network.DiscoveryNetworkConnector.onServiceAdd(DiscoveryNetworkConnector.java:114)
>>       at
>> org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent$2.run(MulticastDiscoveryAgent.java:484)
>>       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>       at java.lang.Thread.run(Thread.java:619)
>>
>> The problem appears when the network is quickly and alternatively on/off
>> between the two components.
>> The bridge is created in one direction, but the answer can not be
>> received.
>>
>> The thread is blocked on the CountDownLatch. Even if multicast frames are
>> received, the component can not establish a new network connection.
>>
>> Here are an corresponding activemq trace
>>
>> When it is OK :
>> 2010-06-22 22:56:24,500 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector
>> - Establishing network connection from
>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>> 2010-06-22 22:56:26,083 [nocp08s-bus#160] INFO  DemandForwardingBridge
>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#160 and
>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
>> has been established.
>>
>> 2010-06-22 22:57:34,807 [-tpnocp08s-bus]] INFO  DemandForwardingBridge
>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to
>> SIBBusSupervisor-tpnocp11v-bus stopped
>>
>> 2010-06-22 22:57:34,811 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector
>> - Establishing network connection from
>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>> 2010-06-22 22:57:39,064 [nocp08s-bus#162] INFO  DemandForwardingBridge
>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#162 and
>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
>> has been established.
>>
>> 2010-06-22 22:58:42,578 [-tpnocp08s-bus]] INFO  DemandForwardingBridge
>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to
>> SIBBusSupervisor-tpnocp11v-bus stopped
>>
>> When it is KO : "Unknown"
>>
>> 2010-06-22 22:58:42,648 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector
>> - Establishing network connection from
>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>> 2010-06-22 22:59:18,031 [18.126.30:14101] WARN  DemandForwardingBridge
>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#164 and
>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101 shutdown due to a
>> remote error: java.net.SocketException: Connection reset
>> 2010-06-22 22:59:18,033 [NetworkBridge  ] INFO  DemandForwardingBridge
>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to Unknown stopped
>>
>>
>> Here is the other side corresponding activemq trace
>>
>> activemq-server.log:2010-06-22 22:55:44,295 [26.190.27:40517] INFO
>> TransportConnection            - Created Duplex Bridge back to
>> SIBBusModule-NOCP5-tpnocp08s-bus
>>
>> activemq-server.log:2010-06-22 22:56:24,438 [26.190.27:40517] INFO
>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to
>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>
>> activemq-server.log:2010-06-22 22:56:26,135 [26.190.27:40518] INFO
>> TransportConnection            - Created Duplex Bridge back to
>> SIBBusModule-NOCP5-tpnocp08s-bus
>> activemq-server.log:2010-06-22 22:56:26,135 [ocp11v-bus#1770] INFO
>> DemandForwardingBridge         - Network connection between
>> vm://SIBBusSupervisor-tpnocp11v-bus#1770 and
>> tcp:///10.26.190.27:40518(SIBBusModule-NOCP5-tpnocp08s-bus) has been
>> established.
>>
>> activemq-server.log:2010-06-22 22:57:34,818 [26.190.27:40518] INFO
>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to
>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>
>> activemq-server.log:2010-06-22 22:57:39,153 [26.190.27:40519] INFO
>> TransportConnection            - Created Duplex Bridge back to
>> SIBBusModule-NOCP5-tpnocp08s-bus
>> activemq-server.log:2010-06-22 22:57:39,153 [ocp11v-bus#1806] INFO
>> DemandForwardingBridge         - Network connection between
>> vm://SIBBusSupervisor-tpnocp11v-bus#1806 and
>> tcp:///10.26.190.27:40519(SIBBusModule-NOCP5-tpnocp08s-bus) has been
>> established.
>>
>> activemq-server.log:2010-06-22 22:58:44,328 [26.190.27:40519] INFO
>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to
>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>
>>
>> Eric-AWL
>>
>
> --
> View this message in context: http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084235.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
>



--
http://blog.garytully.com

Open Source Integration
http://fusesource.com
Reply | Threaded
Open this post in threaded view
|

Re: CountDownLatch problem . MultiCast Discovery and refusal of connection on DUPLEX connection (Fuse 5.3.0.5) .

gtully
Hi, as you can see, this is a complicated area of the code. The best
approach is to try and produce a test case for your scenario. Take a
look at the test: BrokerQueueNetworkWithDisconnectTest in
activemq-core. This can simulate network failures and can use
multicast (bridgeAllBrokers). Getting a reproducible test case is the
best way to validate your changes and protect them into the future.

The only other alternative is to keep adding your suggestions to the
jira issue (https://issues.apache.org/activemq/browse/AMQ-2774) and
with a bit of luck I (or some one else) will have a change to look at
it before 5.4 .


On 6 July 2010 12:37, Eric-AWL <[hidden email]> wrote:

>
> I wonder if it could not have some undesirable effects on both side of the
> duplex connection ....
>
> perhaps we should test the started AtomicBoolean, in the start() method
> after the corresponding "await" and shouldn't execute the end of the start
> method ?
>
>            if (configuration.isDuplex() && duplexInitiatingConnection ==
> null) {
>                // initiator side of duplex network
>                remoteBrokerNameKnownLatch.await();
>            }
>
> HERE ??? (if started.get()) { ???
>
>            try {
>                triggerRemoteStartBridge();
>            } catch (IOException e) {
>                LOG.warn("Caught exception from remote start", e);
>            }
>            NetworkBridgeListener l = this.networkBridgeListener;
>            if (l != null) {
>                l.onStart(this);
>            }
>
> It's the first big problem I have with ActiveMQ complex configuration, it
> happens when network is faulty (that happens not very often), and I don't
> know ActiveMQ source code very well ....
>
> Who could help me to identify potential effects of this change, before I try
> to modify it ? (I can't do that on my production system without some tests
> and expert validation)
>
> Eric-AWL
>
>
> Gary Tully wrote:
>>
>> that seems reasonable. want to submit a patch against trunk?
>>
>> On 6 July 2010 12:10, Eric-AWL <[hidden email]> wrote:
>>>
>>> What could happen if we add
>>>
>>>         if (configuration.isDuplex() && duplexInitiatingConnection ==
>>> null)
>>> {
>>>                // initiator side of duplex network
>>>                remoteBrokerNameKnownLatch.countDown();
>>>            }
>>>
>>> into the stop() method of DemandForwardingBridgeSupport class ?
>>>
>>> Eric-AWL
>>>
>>>
>>> Eric-AWL wrote:
>>>>
>>>> Hi
>>>>
>>>> I'm sure that I identified a Latch problem in Multicast Network
>>>> Discovery
>>>> mechanism on Duplex connection
>>>>
>>>> The multicast notifier thread is blocked. here the trace
>>>>
>>>> "Notifier-MulticastDiscoveryAgent-listener:DiscoveryNetworkConnector:NOCSupervisorP5-ADMIN-OUT-IN:BrokerService[SIBBusModule-NOCP5-tpnocp08s-bus]"
>>>> daemon prio=10 tid=0x0000000044ff2400 nid=0x1389 waiting on condition
>>>> [0x0000000044c26000..0x0000000044c26b90]
>>>>    java.lang.Thread.State: WAITING (parking)
>>>>       at sun.misc.Unsafe.park(Native Method)
>>>>       - parking to wait for  <0x00002aaab3dd66f0> (a
>>>> java.util.concurrent.CountDownLatch$Sync)
>>>>       at
>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>       at
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>>>>       at
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
>>>>       at
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
>>>>       at
>>>> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
>>>>       at
>>>> org.apache.activemq.network.DemandForwardingBridgeSupport.start(DemandForwardingBridgeSupport.java:231)
>>>>       at
>>>> org.apache.activemq.network.DiscoveryNetworkConnector.onServiceAdd(DiscoveryNetworkConnector.java:114)
>>>>       at
>>>> org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent$2.run(MulticastDiscoveryAgent.java:484)
>>>>       at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>       at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>       at java.lang.Thread.run(Thread.java:619)
>>>>
>>>> The problem appears when the network is quickly and alternatively on/off
>>>> between the two components.
>>>> The bridge is created in one direction, but the answer can not be
>>>> received.
>>>>
>>>> The thread is blocked on the CountDownLatch. Even if multicast frames
>>>> are
>>>> received, the component can not establish a new network connection.
>>>>
>>>> Here are an corresponding activemq trace
>>>>
>>>> When it is OK :
>>>> 2010-06-22 22:56:24,500 [-tpnocp08s-bus]] INFO
>>>>  DiscoveryNetworkConnector
>>>> - Establishing network connection from
>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>> 2010-06-22 22:56:26,083 [nocp08s-bus#160] INFO  DemandForwardingBridge
>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#160
>>>> and
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
>>>> has been established.
>>>>
>>>> 2010-06-22 22:57:34,807 [-tpnocp08s-bus]] INFO  DemandForwardingBridge
>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to
>>>> SIBBusSupervisor-tpnocp11v-bus stopped
>>>>
>>>> 2010-06-22 22:57:34,811 [-tpnocp08s-bus]] INFO
>>>>  DiscoveryNetworkConnector
>>>> - Establishing network connection from
>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>> 2010-06-22 22:57:39,064 [nocp08s-bus#162] INFO  DemandForwardingBridge
>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#162
>>>> and
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
>>>> has been established.
>>>>
>>>> 2010-06-22 22:58:42,578 [-tpnocp08s-bus]] INFO  DemandForwardingBridge
>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to
>>>> SIBBusSupervisor-tpnocp11v-bus stopped
>>>>
>>>> When it is KO : "Unknown"
>>>>
>>>> 2010-06-22 22:58:42,648 [-tpnocp08s-bus]] INFO
>>>>  DiscoveryNetworkConnector
>>>> - Establishing network connection from
>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>> 2010-06-22 22:59:18,031 [18.126.30:14101] WARN  DemandForwardingBridge
>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#164
>>>> and
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101 shutdown due to
>>>> a
>>>> remote error: java.net.SocketException: Connection reset
>>>> 2010-06-22 22:59:18,033 [NetworkBridge  ] INFO  DemandForwardingBridge
>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to Unknown stopped
>>>>
>>>>
>>>> Here is the other side corresponding activemq trace
>>>>
>>>> activemq-server.log:2010-06-22 22:55:44,295 [26.190.27:40517] INFO
>>>> TransportConnection            - Created Duplex Bridge back to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus
>>>>
>>>> activemq-server.log:2010-06-22 22:56:24,438 [26.190.27:40517] INFO
>>>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge
>>>> to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>
>>>> activemq-server.log:2010-06-22 22:56:26,135 [26.190.27:40518] INFO
>>>> TransportConnection            - Created Duplex Bridge back to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus
>>>> activemq-server.log:2010-06-22 22:56:26,135 [ocp11v-bus#1770] INFO
>>>> DemandForwardingBridge         - Network connection between
>>>> vm://SIBBusSupervisor-tpnocp11v-bus#1770 and
>>>> tcp:///10.26.190.27:40518(SIBBusModule-NOCP5-tpnocp08s-bus) has been
>>>> established.
>>>>
>>>> activemq-server.log:2010-06-22 22:57:34,818 [26.190.27:40518] INFO
>>>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge
>>>> to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>
>>>> activemq-server.log:2010-06-22 22:57:39,153 [26.190.27:40519] INFO
>>>> TransportConnection            - Created Duplex Bridge back to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus
>>>> activemq-server.log:2010-06-22 22:57:39,153 [ocp11v-bus#1806] INFO
>>>> DemandForwardingBridge         - Network connection between
>>>> vm://SIBBusSupervisor-tpnocp11v-bus#1806 and
>>>> tcp:///10.26.190.27:40519(SIBBusModule-NOCP5-tpnocp08s-bus) has been
>>>> established.
>>>>
>>>> activemq-server.log:2010-06-22 22:58:44,328 [26.190.27:40519] INFO
>>>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge
>>>> to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>
>>>>
>>>> Eric-AWL
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084235.html
>>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> http://blog.garytully.com
>>
>> Open Source Integration
>> http://fusesource.com
>>
>>
>
> --
> View this message in context: http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084410.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
>



--
http://blog.garytully.com

Open Source Integration
http://fusesource.com
Reply | Threaded
Open this post in threaded view
|

Re: CountDownLatch problem . MultiCast Discovery and refusal of connection on DUPLEX connection (Fuse 5.3.0.5) .

Eric-AWL
We put a no-Duplex Configuration instead of a Duplex Configuration and it seemed to work better.... But
today during a network problem (alternatively on/off) our process doesn't resist ....

We have
- a thread dump which shows 85 StartLocalBridge Threads waiting for the same latch into the DemandForwardingBridgeSupport.StartLocalBridge method :

 protected void startLocalBridge() throws Exception {
        if (localBridgeStarted.compareAndSet(false, true)) {
            synchronized (this) {
                if (LOG.isTraceEnabled()) {
                    LOG.trace(configuration.getBrokerName() + " starting local Bridge, localBroker=" + localBroker);
                }
                remoteBrokerNameKnownLatch.await();
                ...
}

- 960 CLOSE_WAIT
- a file descriptor limit

Will the transport.closeAsync=false flag be helpful here ?

Eric-AWL


Gary Tully wrote
Hi, as you can see, this is a complicated area of the code. The best
approach is to try and produce a test case for your scenario. Take a
look at the test: BrokerQueueNetworkWithDisconnectTest in
activemq-core. This can simulate network failures and can use
multicast (bridgeAllBrokers). Getting a reproducible test case is the
best way to validate your changes and protect them into the future.

The only other alternative is to keep adding your suggestions to the
jira issue (https://issues.apache.org/activemq/browse/AMQ-2774) and
with a bit of luck I (or some one else) will have a change to look at
it before 5.4 .


On 6 July 2010 12:37, Eric-AWL <eric.vincent@atosorigin.com> wrote:
>
> I wonder if it could not have some undesirable effects on both side of the
> duplex connection ....
>
> perhaps we should test the started AtomicBoolean, in the start() method
> after the corresponding "await" and shouldn't execute the end of the start
> method ?
>
>            if (configuration.isDuplex() && duplexInitiatingConnection ==
> null) {
>                // initiator side of duplex network
>                remoteBrokerNameKnownLatch.await();
>            }
>
> HERE ??? (if started.get()) { ???
>
>            try {
>                triggerRemoteStartBridge();
>            } catch (IOException e) {
>                LOG.warn("Caught exception from remote start", e);
>            }
>            NetworkBridgeListener l = this.networkBridgeListener;
>            if (l != null) {
>                l.onStart(this);
>            }
>
> It's the first big problem I have with ActiveMQ complex configuration, it
> happens when network is faulty (that happens not very often), and I don't
> know ActiveMQ source code very well ....
>
> Who could help me to identify potential effects of this change, before I try
> to modify it ? (I can't do that on my production system without some tests
> and expert validation)
>
> Eric-AWL
>
>
> Gary Tully wrote:
>>
>> that seems reasonable. want to submit a patch against trunk?
>>
>> On 6 July 2010 12:10, Eric-AWL <eric.vincent@atosorigin.com> wrote:
>>>
>>> What could happen if we add
>>>
>>>         if (configuration.isDuplex() && duplexInitiatingConnection ==
>>> null)
>>> {
>>>                // initiator side of duplex network
>>>                remoteBrokerNameKnownLatch.countDown();
>>>            }
>>>
>>> into the stop() method of DemandForwardingBridgeSupport class ?
>>>
>>> Eric-AWL
>>>
>>>
>>> Eric-AWL wrote:
>>>>
>>>> Hi
>>>>
>>>> I'm sure that I identified a Latch problem in Multicast Network
>>>> Discovery
>>>> mechanism on Duplex connection
>>>>
>>>> The multicast notifier thread is blocked. here the trace
>>>>
>>>> "Notifier-MulticastDiscoveryAgent-listener:DiscoveryNetworkConnector:NOCSupervisorP5-ADMIN-OUT-IN:BrokerService[SIBBusModule-NOCP5-tpnocp08s-bus]"
>>>> daemon prio=10 tid=0x0000000044ff2400 nid=0x1389 waiting on condition
>>>> [0x0000000044c26000..0x0000000044c26b90]
>>>>    java.lang.Thread.State: WAITING (parking)
>>>>       at sun.misc.Unsafe.park(Native Method)
>>>>       - parking to wait for  <0x00002aaab3dd66f0> (a
>>>> java.util.concurrent.CountDownLatch$Sync)
>>>>       at
>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>       at
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>>>>       at
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
>>>>       at
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
>>>>       at
>>>> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
>>>>       at
>>>> org.apache.activemq.network.DemandForwardingBridgeSupport.start(DemandForwardingBridgeSupport.java:231)
>>>>       at
>>>> org.apache.activemq.network.DiscoveryNetworkConnector.onServiceAdd(DiscoveryNetworkConnector.java:114)
>>>>       at
>>>> org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent$2.run(MulticastDiscoveryAgent.java:484)
>>>>       at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>       at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>       at java.lang.Thread.run(Thread.java:619)
>>>>
>>>> The problem appears when the network is quickly and alternatively on/off
>>>> between the two components.
>>>> The bridge is created in one direction, but the answer can not be
>>>> received.
>>>>
>>>> The thread is blocked on the CountDownLatch. Even if multicast frames
>>>> are
>>>> received, the component can not establish a new network connection.
>>>>
>>>> Here are an corresponding activemq trace
>>>>
>>>> When it is OK :
>>>> 2010-06-22 22:56:24,500 [-tpnocp08s-bus]] INFO
>>>>  DiscoveryNetworkConnector
>>>> - Establishing network connection from
>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>> 2010-06-22 22:56:26,083 [nocp08s-bus#160] INFO  DemandForwardingBridge
>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#160
>>>> and
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
>>>> has been established.
>>>>
>>>> 2010-06-22 22:57:34,807 [-tpnocp08s-bus]] INFO  DemandForwardingBridge
>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to
>>>> SIBBusSupervisor-tpnocp11v-bus stopped
>>>>
>>>> 2010-06-22 22:57:34,811 [-tpnocp08s-bus]] INFO
>>>>  DiscoveryNetworkConnector
>>>> - Establishing network connection from
>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>> 2010-06-22 22:57:39,064 [nocp08s-bus#162] INFO  DemandForwardingBridge
>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#162
>>>> and
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
>>>> has been established.
>>>>
>>>> 2010-06-22 22:58:42,578 [-tpnocp08s-bus]] INFO  DemandForwardingBridge
>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to
>>>> SIBBusSupervisor-tpnocp11v-bus stopped
>>>>
>>>> When it is KO : "Unknown"
>>>>
>>>> 2010-06-22 22:58:42,648 [-tpnocp08s-bus]] INFO
>>>>  DiscoveryNetworkConnector
>>>> - Establishing network connection from
>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>> 2010-06-22 22:59:18,031 [18.126.30:14101] WARN  DemandForwardingBridge
>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#164
>>>> and
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101 shutdown due to
>>>> a
>>>> remote error: java.net.SocketException: Connection reset
>>>> 2010-06-22 22:59:18,033 [NetworkBridge  ] INFO  DemandForwardingBridge
>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to Unknown stopped
>>>>
>>>>
>>>> Here is the other side corresponding activemq trace
>>>>
>>>> activemq-server.log:2010-06-22 22:55:44,295 [26.190.27:40517] INFO
>>>> TransportConnection            - Created Duplex Bridge back to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus
>>>>
>>>> activemq-server.log:2010-06-22 22:56:24,438 [26.190.27:40517] INFO
>>>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge
>>>> to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>
>>>> activemq-server.log:2010-06-22 22:56:26,135 [26.190.27:40518] INFO
>>>> TransportConnection            - Created Duplex Bridge back to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus
>>>> activemq-server.log:2010-06-22 22:56:26,135 [ocp11v-bus#1770] INFO
>>>> DemandForwardingBridge         - Network connection between
>>>> vm://SIBBusSupervisor-tpnocp11v-bus#1770 and
>>>> tcp:///10.26.190.27:40518(SIBBusModule-NOCP5-tpnocp08s-bus) has been
>>>> established.
>>>>
>>>> activemq-server.log:2010-06-22 22:57:34,818 [26.190.27:40518] INFO
>>>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge
>>>> to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>
>>>> activemq-server.log:2010-06-22 22:57:39,153 [26.190.27:40519] INFO
>>>> TransportConnection            - Created Duplex Bridge back to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus
>>>> activemq-server.log:2010-06-22 22:57:39,153 [ocp11v-bus#1806] INFO
>>>> DemandForwardingBridge         - Network connection between
>>>> vm://SIBBusSupervisor-tpnocp11v-bus#1806 and
>>>> tcp:///10.26.190.27:40519(SIBBusModule-NOCP5-tpnocp08s-bus) has been
>>>> established.
>>>>
>>>> activemq-server.log:2010-06-22 22:58:44,328 [26.190.27:40519] INFO
>>>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge
>>>> to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>
>>>>
>>>> Eric-AWL
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084235.html
>>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> http://blog.garytully.com
>>
>> Open Source Integration
>> http://fusesource.com
>>
>>
>
> --
> View this message in context: http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084410.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
>



--
http://blog.garytully.com

Open Source Integration
http://fusesource.com
Reply | Threaded
Open this post in threaded view
|

Re: CountDownLatch problem . MultiCast Discovery and refusal of connection on DUPLEX connection (Fuse 5.3.0.5) .

Eric-AWL
I added a fix in the Jira AMQ-2774 thread.

Eric-AWL

Eric-AWL wrote
We put a no-Duplex Configuration instead of a Duplex Configuration and it seemed to work better.... But
today during a network problem (alternatively on/off) our process doesn't resist ....

We have
- a thread dump which shows 85 StartLocalBridge Threads waiting for the same latch into the DemandForwardingBridgeSupport.StartLocalBridge method :

 protected void startLocalBridge() throws Exception {
        if (localBridgeStarted.compareAndSet(false, true)) {
            synchronized (this) {
                if (LOG.isTraceEnabled()) {
                    LOG.trace(configuration.getBrokerName() + " starting local Bridge, localBroker=" + localBroker);
                }
                remoteBrokerNameKnownLatch.await();
                ...
}

- 960 CLOSE_WAIT
- a file descriptor limit

Will the transport.closeAsync=false flag be helpful here ?

Eric-AWL


Gary Tully wrote
Hi, as you can see, this is a complicated area of the code. The best
approach is to try and produce a test case for your scenario. Take a
look at the test: BrokerQueueNetworkWithDisconnectTest in
activemq-core. This can simulate network failures and can use
multicast (bridgeAllBrokers). Getting a reproducible test case is the
best way to validate your changes and protect them into the future.

The only other alternative is to keep adding your suggestions to the
jira issue (https://issues.apache.org/activemq/browse/AMQ-2774) and
with a bit of luck I (or some one else) will have a change to look at
it before 5.4 .


On 6 July 2010 12:37, Eric-AWL <eric.vincent@atosorigin.com> wrote:
>
> I wonder if it could not have some undesirable effects on both side of the
> duplex connection ....
>
> perhaps we should test the started AtomicBoolean, in the start() method
> after the corresponding "await" and shouldn't execute the end of the start
> method ?
>
>            if (configuration.isDuplex() && duplexInitiatingConnection ==
> null) {
>                // initiator side of duplex network
>                remoteBrokerNameKnownLatch.await();
>            }
>
> HERE ??? (if started.get()) { ???
>
>            try {
>                triggerRemoteStartBridge();
>            } catch (IOException e) {
>                LOG.warn("Caught exception from remote start", e);
>            }
>            NetworkBridgeListener l = this.networkBridgeListener;
>            if (l != null) {
>                l.onStart(this);
>            }
>
> It's the first big problem I have with ActiveMQ complex configuration, it
> happens when network is faulty (that happens not very often), and I don't
> know ActiveMQ source code very well ....
>
> Who could help me to identify potential effects of this change, before I try
> to modify it ? (I can't do that on my production system without some tests
> and expert validation)
>
> Eric-AWL
>
>
> Gary Tully wrote:
>>
>> that seems reasonable. want to submit a patch against trunk?
>>
>> On 6 July 2010 12:10, Eric-AWL <eric.vincent@atosorigin.com> wrote:
>>>
>>> What could happen if we add
>>>
>>>         if (configuration.isDuplex() && duplexInitiatingConnection ==
>>> null)
>>> {
>>>                // initiator side of duplex network
>>>                remoteBrokerNameKnownLatch.countDown();
>>>            }
>>>
>>> into the stop() method of DemandForwardingBridgeSupport class ?
>>>
>>> Eric-AWL
>>>
>>>
>>> Eric-AWL wrote:
>>>>
>>>> Hi
>>>>
>>>> I'm sure that I identified a Latch problem in Multicast Network
>>>> Discovery
>>>> mechanism on Duplex connection
>>>>
>>>> The multicast notifier thread is blocked. here the trace
>>>>
>>>> "Notifier-MulticastDiscoveryAgent-listener:DiscoveryNetworkConnector:NOCSupervisorP5-ADMIN-OUT-IN:BrokerService[SIBBusModule-NOCP5-tpnocp08s-bus]"
>>>> daemon prio=10 tid=0x0000000044ff2400 nid=0x1389 waiting on condition
>>>> [0x0000000044c26000..0x0000000044c26b90]
>>>>    java.lang.Thread.State: WAITING (parking)
>>>>       at sun.misc.Unsafe.park(Native Method)
>>>>       - parking to wait for  <0x00002aaab3dd66f0> (a
>>>> java.util.concurrent.CountDownLatch$Sync)
>>>>       at
>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>       at
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>>>>       at
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
>>>>       at
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
>>>>       at
>>>> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
>>>>       at
>>>> org.apache.activemq.network.DemandForwardingBridgeSupport.start(DemandForwardingBridgeSupport.java:231)
>>>>       at
>>>> org.apache.activemq.network.DiscoveryNetworkConnector.onServiceAdd(DiscoveryNetworkConnector.java:114)
>>>>       at
>>>> org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent$2.run(MulticastDiscoveryAgent.java:484)
>>>>       at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>       at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>       at java.lang.Thread.run(Thread.java:619)
>>>>
>>>> The problem appears when the network is quickly and alternatively on/off
>>>> between the two components.
>>>> The bridge is created in one direction, but the answer can not be
>>>> received.
>>>>
>>>> The thread is blocked on the CountDownLatch. Even if multicast frames
>>>> are
>>>> received, the component can not establish a new network connection.
>>>>
>>>> Here are an corresponding activemq trace
>>>>
>>>> When it is OK :
>>>> 2010-06-22 22:56:24,500 [-tpnocp08s-bus]] INFO
>>>>  DiscoveryNetworkConnector
>>>> - Establishing network connection from
>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>> 2010-06-22 22:56:26,083 [nocp08s-bus#160] INFO  DemandForwardingBridge
>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#160
>>>> and
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
>>>> has been established.
>>>>
>>>> 2010-06-22 22:57:34,807 [-tpnocp08s-bus]] INFO  DemandForwardingBridge
>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to
>>>> SIBBusSupervisor-tpnocp11v-bus stopped
>>>>
>>>> 2010-06-22 22:57:34,811 [-tpnocp08s-bus]] INFO
>>>>  DiscoveryNetworkConnector
>>>> - Establishing network connection from
>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>> 2010-06-22 22:57:39,064 [nocp08s-bus#162] INFO  DemandForwardingBridge
>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#162
>>>> and
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
>>>> has been established.
>>>>
>>>> 2010-06-22 22:58:42,578 [-tpnocp08s-bus]] INFO  DemandForwardingBridge
>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to
>>>> SIBBusSupervisor-tpnocp11v-bus stopped
>>>>
>>>> When it is KO : "Unknown"
>>>>
>>>> 2010-06-22 22:58:42,648 [-tpnocp08s-bus]] INFO
>>>>  DiscoveryNetworkConnector
>>>> - Establishing network connection from
>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>> 2010-06-22 22:59:18,031 [18.126.30:14101] WARN  DemandForwardingBridge
>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#164
>>>> and
>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101 shutdown due to
>>>> a
>>>> remote error: java.net.SocketException: Connection reset
>>>> 2010-06-22 22:59:18,033 [NetworkBridge  ] INFO  DemandForwardingBridge
>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to Unknown stopped
>>>>
>>>>
>>>> Here is the other side corresponding activemq trace
>>>>
>>>> activemq-server.log:2010-06-22 22:55:44,295 [26.190.27:40517] INFO
>>>> TransportConnection            - Created Duplex Bridge back to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus
>>>>
>>>> activemq-server.log:2010-06-22 22:56:24,438 [26.190.27:40517] INFO
>>>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge
>>>> to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>
>>>> activemq-server.log:2010-06-22 22:56:26,135 [26.190.27:40518] INFO
>>>> TransportConnection            - Created Duplex Bridge back to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus
>>>> activemq-server.log:2010-06-22 22:56:26,135 [ocp11v-bus#1770] INFO
>>>> DemandForwardingBridge         - Network connection between
>>>> vm://SIBBusSupervisor-tpnocp11v-bus#1770 and
>>>> tcp:///10.26.190.27:40518(SIBBusModule-NOCP5-tpnocp08s-bus) has been
>>>> established.
>>>>
>>>> activemq-server.log:2010-06-22 22:57:34,818 [26.190.27:40518] INFO
>>>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge
>>>> to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>
>>>> activemq-server.log:2010-06-22 22:57:39,153 [26.190.27:40519] INFO
>>>> TransportConnection            - Created Duplex Bridge back to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus
>>>> activemq-server.log:2010-06-22 22:57:39,153 [ocp11v-bus#1806] INFO
>>>> DemandForwardingBridge         - Network connection between
>>>> vm://SIBBusSupervisor-tpnocp11v-bus#1806 and
>>>> tcp:///10.26.190.27:40519(SIBBusModule-NOCP5-tpnocp08s-bus) has been
>>>> established.
>>>>
>>>> activemq-server.log:2010-06-22 22:58:44,328 [26.190.27:40519] INFO
>>>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge
>>>> to
>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>
>>>>
>>>> Eric-AWL
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084235.html
>>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> http://blog.garytully.com
>>
>> Open Source Integration
>> http://fusesource.com
>>
>>
>
> --
> View this message in context: http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084410.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
>



--
http://blog.garytully.com

Open Source Integration
http://fusesource.com