Calling end on TransactionContext hangs during failover when using master slave

JordanC
I have a clustered J2EE application that starts up a broker on each node, using the failover protocol and a shared data directory.

Node 1: Starts up a broker and creates a transport connector at tcp://<host>:61616. Consumer and producer connect via a broker url of failover:(tcp://<host>:61616,tcp://<host>:61617)

Node 2: Starts up a broker and creates a transport connector at tcp://<host>:61617. Consumer and producer connect via a broker url of failover:(tcp://<host>:61616,tcp://<host>:61617)
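
As a minimal sketch of how the broker URL shared by both nodes can be assembled, assuming only the JDK: the class `FailoverUrl` and its `build` method are hypothetical helpers, and `localhost` stands in for the `<host>` placeholder. The `randomize` option shown in `main` is a failover transport option; whether it has any effect on the hang described in this thread is untested.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of ActiveMQ: builds the failover URL used by both nodes.
final class FailoverUrl {

    // Joins tcp://host:port endpoints into failover:(a,b) and appends
    // any transport options as a query string.
    static String build(String host, List<Integer> ports, Map<String, String> options) {
        String endpoints = ports.stream()
                .map(port -> "tcp://" + host + ":" + port)
                .collect(Collectors.joining(","));
        String url = "failover:(" + endpoints + ")";
        if (options.isEmpty()) {
            return url;
        }
        String query = options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return url + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        // Assumption: try the listed URIs in order instead of picking one at random.
        options.put("randomize", "false");
        // "localhost" is a placeholder for <host>.
        System.out.println(build("localhost", List.of(61616, 61617), options));
    }
}
```

The resulting string would be passed as the broker URL when creating the connection factory on each node.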

The simple use case that is failing for me is the following:
1) Start up node 1 first so it is the master. Start up node 2.
2) Send 4 messages on node 1, with a processing delay so that the node can be killed before the messages finish processing. Two messages are processed on node 1 and two on node 2.
3) Forcefully kill node 1 while messages are being processed.

The two threads on node 2 that were consuming messages both hung after calling TransactionContext#end. They go into ResponseCorrelator#request and send a TransactionInfo command. The TransactionInfo command is consumed and produces a response command, which is sent correctly. The problem seems to be that this response command is never read in TcpTransport#doRun. As a result, ResponseCorrelator#request blocks in response.getResult() and never returns.
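
The blocking pattern described above can be modelled with plain JDK classes. This is a simplified sketch, not ActiveMQ's ResponseCorrelator: `CorrelatorModel`, `send`, `onResponse` and `await` are hypothetical names. It shows why a caller waits forever when the reader never delivers the response, and how a bounded wait turns that silent hang into something observable.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of a request/response correlator (all names hypothetical).
final class CorrelatorModel {
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    // Registers a pending request and returns its correlation id.
    // The command is not actually transmitted in this model.
    int send(String command) {
        int id = nextId.incrementAndGet();
        pending.put(id, new CompletableFuture<>());
        return id;
    }

    // Stands in for the transport's reader thread handing over a response.
    void onResponse(int id, String result) {
        CompletableFuture<String> future = pending.get(id);
        if (future != null) {
            future.complete(result);
        }
    }

    // Waits up to timeoutMillis for the response; empty means nothing was read in time.
    Optional<String> await(int id, long timeoutMillis) {
        CompletableFuture<String> future = pending.get(id);
        if (future == null) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pending.remove(id);
        }
    }

    public static void main(String[] args) {
        CorrelatorModel correlator = new CorrelatorModel();

        int answered = correlator.send("TransactionInfo");
        correlator.onResponse(answered, "ok");            // reader delivers the response
        System.out.println(correlator.await(answered, 100));

        int lost = correlator.send("TransactionInfo");    // response is never read
        System.out.println(correlator.await(lost, 100));
    }
}
```

With an unbounded wait in place of the timeout, the second request in `main` would block indefinitely, which matches the symptom seen on node 2.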

The transactions for the 2 messages being processed on node 2 block so they are never committed.

If I modify my test to only send 2 messages so that each node is processing 1 message, everything runs without any problems. The second node is able to end the transaction successfully by going through the exact same code path except that the response command is consumed. After that it processes the message that was being consumed by node 1 correctly as well. Once I send 4 or more messages, this issue will occur.

Does anyone have any insight as to what might be happening? I haven't been able to figure out why the response command doesn't get consumed in the unsuccessful case. There are no exceptions either and the response command seems to be sent successfully.

Re: Calling end on TransactionContext hangs during failover when using master slave

JordanC
I'm able to reproduce this issue in a much simpler way. Instead of starting up brokers from within my application, I am using a clustered ActiveMQ setup with 2 instances as follows. Both instances share the same data directory so that only one broker is active at a time.

broker-1 has the following transport connector in activemq.xml:
<transportConnectors>
    <transportConnector name="openwire" discoveryUri="multicast://default" updateClusterClients="true" uri="tcp://0.0.0.0:61616"/>
</transportConnectors>

broker-2 has the following transport connector in activemq.xml:
<transportConnectors>
    <transportConnector name="openwire" discoveryUri="multicast://default" updateClusterClients="true" uri="tcp://0.0.0.0:61617"/>
</transportConnectors>

My J2EE application creates a connection with a broker url of failover:(tcp://<host>:61616,tcp://<host>:61617)

The use case that fails is as follows:
1) I start broker-1 then broker-2. broker-1 is the master broker and holds the lock on the data directory. I start my J2EE application and it creates a connection to broker-1. I send 2 messages using that connection.
2) While the messages are being processed, I shut down broker-1.
3) broker-2 becomes the new master broker. Through the failover transport, my connection is now connected to broker-2 at port 61617.
4) The messages continue to be consumed after the connection switched over to broker-2 but when the transactions are being committed, TransactionContext#end hangs indefinitely.

Debugging this, I found that end leads to ResponseCorrelator#request, which sends a TransactionInfo command. The TransactionInfo command is consumed and produces a response command, which is sent correctly. The problem seems to be that this response command is never read in TcpTransport#doRun. As a result, ResponseCorrelator#request blocks in response.getResult() and never returns.
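
A hang like this should also be visible without a debugger, since the stuck threads keep a frame in ResponseCorrelator#request on their stack. The following is a minimal sketch using only the JDK; `BlockedThreadFinder` and `threadsIn` are hypothetical names, and the code has to run inside the same JVM as the consumers (for example from a diagnostic servlet or a scheduled task).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper: lists threads that currently sit in a given method.
final class BlockedThreadFinder {

    // Returns the names of live threads with a stack frame whose class name
    // ends with classSuffix and whose method name equals methodName.
    static List<String> threadsIn(String classSuffix, String methodName) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().endsWith(classSuffix)
                        && frame.getMethodName().equals(methodName)) {
                    names.add(entry.getKey().getName());
                    break; // one match per thread is enough
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // In the scenario from this thread, the hung consumer threads would be listed here.
        System.out.println(threadsIn("ResponseCorrelator", "request"));
    }
}
```

Calling this periodically and logging threads that stay in the list across several samples gives a rough signal that a request is waiting for a response that was never read.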

Does anyone have any insight into this? This issue is blocking me from using master/slave with a shared directory, which I need in order to ensure high availability and immediate failover.