Consumer hanging after a producer fails for being inactive

errobin
We are testing ActiveMQ 5.9 with Oracle JDK 1.7.

We created an embedded broker with persistent set to false and a single queue.
We gave it two transportConnectors:
 openwire with uri="tcp://0.0.0.0:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600"
 vm with uri="vm://localhost"

We have a producer sending messages (50 messages per second) to the queue using the openwire transport over a VPN connection.
The producer is using the following URL: failover:(tcp://192.168.100.63:61616)?jms.alwaysSyncSend=true

We have a single consumer using the vm transport with a MessageListener that just logs that we received a message.
The consumer is using the following url=vm://localhost

We then have a script that cycles the VPN connection: up for a minute, then down for a minute.

After about 10-15 minutes we see this in the broker logs:

Transport - Transport Connection to: tcp://192.168.100.22:36705 failed: org.apache.activemq.transport.InactivityIOException: Channel was inactive for too (>30000) long: tcp://192.168.100.22:36705

Once that error appears in the logs, the consumer stops receiving messages. When the producer reconnects and starts sending data again, the queue just grows.

The consumer code is wrapped in a catch of Throwable to try to catch any exceptions. I also replaced my own simple message consumer with a Camel JMS consumer, and it hangs the same way.

I am using JConsole to watch the queue grow. The consumer count stays at 1, and the producer count changes between 0 and 1 depending on whether the VPN is up.
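
For anyone who wants the same numbers without JConsole, here is a minimal sketch that reads them over JMX from inside the broker JVM. The ObjectName layout and the queue name TEST.QUEUE are assumptions (check the exact name in the JConsole MBeans tab for your version); QueueSize, ConsumerCount and ProducerCount are the attribute names JConsole shows on the queue MBean.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class QueueStats {
    /** Builds the queue MBean name. The key layout is an assumption to verify in JConsole. */
    public static ObjectName queueName(String brokerName, String queue) {
        try {
            return new ObjectName("org.apache.activemq:type=Broker,brokerName=" + brokerName
                    + ",destinationType=Queue,destinationName=" + queue);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Reads one numeric attribute (for example QueueSize) from a registered MBean. */
    public static long readLong(MBeanServer server, ObjectName name, String attribute) {
        try {
            return ((Number) server.getAttribute(name, attribute)).longValue();
        } catch (JMException e) {
            throw new IllegalStateException("Cannot read " + attribute + " from " + name, e);
        }
    }

    public static void main(String[] args) {
        // An embedded broker registers its MBeans in the platform MBean server of its own JVM.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = queueName("localhost", "TEST.QUEUE");  // hypothetical queue name
        if (!server.isRegistered(name)) {
            System.out.println("No MBean registered under " + name);
            return;
        }
        for (String attribute : new String[] {"QueueSize", "ConsumerCount", "ProducerCount"}) {
            System.out.println(attribute + "=" + readLong(server, name, attribute));
        }
    }
}
```

Logging these values on a timer next to the broker log makes it easier to line up the moment the queue starts growing with the InactivityIOException.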

So is there anything else I can look at to see why the consumer is not receiving messages?

Thanks,
Eric

Re: Consumer hanging after a producer fails for being inactive

artnaseef
That's a curious problem.  Have you tried with 5.10?  I know there has been some effort around the area of the inactivity timeouts on connections.

Also, if you could produce a minimal test program, that would certainly help.

Re: Consumer hanging after a producer fails for being inactive

errobin
I have been able to reproduce the issue with 5.10.
I am currently walking through the ActiveMQ 5.9 code to figure out what's going on.
In my log files I am seeing
DEBUG Queue - Incoming toPageIn: 200, Inflight: 0, pagedInMessages.size 0
In org.apache.activemq.broker.region.Queue at line 1876 it's doing
while (messages.hasNext() && count < toPageIn)
but it never enters the while loop.

messages.hasNext() ends up in the class
org.apache.activemq.broker.region.cursors.AbstractStoreCursor, which calls fillBatch, but the variable storeHasMessages is always false, so it does nothing and hasNext always returns false.
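
To make the symptom concrete, here is a toy model of the behaviour described above. It is not the ActiveMQ source: ToyStoreCursor and its add(message, flagStore) method are made up for illustration, and only the names hasNext, fillBatch, storeHasMessages and toPageIn come from the description. It shows how a cursor can hold messages while the page-in loop still never runs, as long as the "store has messages" hint stays false.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy stand-in for a store-backed cursor. NOT the ActiveMQ implementation. */
class ToyStoreCursor {
    private final Deque<String> store = new ArrayDeque<String>();  // messages waiting in the "store"
    private final Deque<String> batch = new ArrayDeque<String>();  // messages loaded for dispatch
    private boolean storeHasMessages;                              // hint that a store read is worthwhile

    /** Adds a message; flagStore=false simulates the hint not being raised. */
    void add(String message, boolean flagStore) {
        store.add(message);
        if (flagStore) {
            storeHasMessages = true;
        }
    }

    int size() {
        return store.size() + batch.size();
    }

    boolean hasNext() {
        if (batch.isEmpty()) {
            fillBatch();
        }
        return !batch.isEmpty();
    }

    String next() {
        return batch.poll();
    }

    private void fillBatch() {
        if (!storeHasMessages) {
            return;  // the hint says the store is empty, so nothing is loaded
        }
        batch.addAll(store);
        store.clear();
        storeHasMessages = false;
    }

    /** Same loop shape as the condition quoted from Queue above. */
    static List<String> pageIn(ToyStoreCursor messages, int toPageIn) {
        List<String> pagedIn = new ArrayList<String>();
        int count = 0;
        while (messages.hasNext() && count < toPageIn) {
            pagedIn.add(messages.next());
            count++;
        }
        return pagedIn;
    }
}
```

In this model, adding messages with flagStore=false leaves size() growing while pageIn(cursor, 200) keeps returning an empty list, which is the same shape as a growing queue with nothing dispatched.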

Re: Consumer hanging after a producer fails for being inactive

artnaseef
Sounds like this is easy to reproduce.  Can you write a small (well, as small as possible) junit that reproduces the problem?

It may be necessary to get creative in forcing the network connection failure.

Re: Consumer hanging after a producer fails for being inactive

errobin
I will work on a way to force the network connection failure without relying on OpenVPN as the trigger.

Re: Consumer hanging after a producer fails for being inactive

errobin
So I was unable to produce a JUnit test that causes this issue to occur.

My first attempt was to use the SocketProxy in the test package of ActiveMQ.
I had the broker, the consumer, and the producer all running in the same JVM.
The consumer was using the VM transport and the producer was connecting to the SocketProxy, which was connecting to the broker's openwire transport. I had a task running every 120 seconds that alternated between calling pause and goOn on the SocketProxy. This did not reproduce the issue, so I tried close and reopen on the SocketProxy, and that did not reproduce it either.
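
For reference, the alternating task can be sketched roughly like this. The Link interface is a hypothetical stand-in for whatever interrupts the traffic (it borrows the pause and goOn names from above, but it is not the SocketProxy class itself), and the link is assumed to start in the "up" state.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class LinkCycler implements Runnable {
    /** Hypothetical abstraction over the thing being interrupted (a proxy, a VPN script, ...). */
    interface Link {
        void pause();
        void goOn();
    }

    private final Link link;
    private boolean up = true;  // assumption: the link starts out passing traffic

    LinkCycler(Link link) {
        this.link = link;
    }

    /** Flips the link: pauses it if it is up, resumes it if it is down. */
    @Override
    public synchronized void run() {
        if (up) {
            link.pause();
        } else {
            link.goOn();
        }
        up = !up;
    }

    synchronized boolean isUp() {
        return up;
    }

    /** Flips the link every periodSeconds until the returned scheduler is shut down. */
    static ScheduledExecutorService start(Link link, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new LinkCycler(link), periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Calling LinkCycler.start(link, 120) gives the 120 second rotation; note that scheduleAtFixedRate stops rescheduling if the task throws, so a real Link implementation should handle its own errors.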

My second attempt was to use Runtime.getRuntime().exec("ifconfig eth1 up"); with the same setup from my first attempt, but without the SocketProxy. This also failed to reproduce the issue.

My third attempt was to remove the producer from the same machine and put it on another machine on the local network, but this also was unable to reproduce the issue.

My final attempt was to use the same configuration as the third attempt, but with the producer on a machine on a different network that requires VPN access to reach my workstation. In this configuration I was able to reproduce the consumers hanging. So I believe it might have something to do with latency, but at this point I am not sure.

So I am going to upload my test files if anyone wants to try to reproduce it.

Hub.java

Sender.java

Re: Consumer hanging after a producer fails for being inactive

errobin
It looks like the issue with the consumer hanging has something to do with duplicate messages after a producer comes back from a failover.

I see the producer reconnect after the network is restored.
The broker receives an ActiveMQMapMessage.
The ProducerBrokerExchange sets the last stored sequence id to -1.
The AbstractStoreCursor reports that it received a duplicate message.

After that the consumer stops receiving data, the queue size keeps going up, and
the logs show toPageIn going up for each message received.
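
As background on why a duplicate can show up at all: a sync send whose acknowledgement is lost when the link drops can be sent again once the failover transport reconnects, so the broker may see the same message twice. The sketch below is a toy duplicate check keyed by producer id and sequence number. ToyDuplicateAudit and the "producer-1" id are made up for illustration; this is not ActiveMQ's audit code and it does not reproduce the hang itself.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy duplicate check keyed by producer id and sequence number. NOT ActiveMQ's audit. */
class ToyDuplicateAudit {
    private final Set<String> seen = new HashSet<String>();

    /** Records the pair and returns true if it had already been recorded. */
    boolean isDuplicate(String producerId, long sequence) {
        return !seen.add(producerId + ":" + sequence);
    }

    /** Scenario: the ack for sequence 3 is lost, so the producer resends it after reconnecting. */
    static boolean resendIsFlagged() {
        ToyDuplicateAudit audit = new ToyDuplicateAudit();
        for (long sequence = 1; sequence <= 3; sequence++) {
            audit.isDuplicate("producer-1", sequence);  // first delivery, recorded as new
        }
        return audit.isDuplicate("producer-1", 3);      // resend of the unacknowledged message
    }
}
```

The interesting question for this thread is what the broker does after flagging the duplicate, which the toy does not model.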

Thanks,
Eric

Re: Consumer hanging after a producer fails for being inactive

errobin
We are currently testing a workaround for this issue.
Our broker has persistent set to false, but our producers use the default settings, which causes them to send their messages with persistent = true. We are changing the producer to set persistent = false, which seems to fix the issue so far. With persistent = false on the producer, messages take a different path through ActiveMQ and use the PendingMessageCursor instead of the QueueStorePrefetch. If this solves the issue we will add a BrokerFilter that sets persistent = false on all messages in the send method.
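
On the producer side, the standard JMS way to do this is MessageProducer.setDeliveryMode(DeliveryMode.NON_PERSISTENT). The broker-side idea (intercept every send and clear the flag before passing the message on) is sketched below with hypothetical stand-in types. ToyMessage, ToyBroker and ForceNonPersistentFilter are made up so the example runs on its own; they are not the ActiveMQ BrokerFilter API.

```java
/** Hypothetical stand-in for a broker message. NOT an ActiveMQ class. */
class ToyMessage {
    private boolean persistent = true;  // mirrors the JMS default delivery mode (persistent)

    boolean isPersistent() {
        return persistent;
    }

    void setPersistent(boolean persistent) {
        this.persistent = persistent;
    }
}

/** Hypothetical stand-in for the next element in the broker's interceptor chain. */
interface ToyBroker {
    void send(ToyMessage message);
}

/** Forces every message to non-persistent before handing it to the next broker in the chain. */
class ForceNonPersistentFilter implements ToyBroker {
    private final ToyBroker next;

    ForceNonPersistentFilter(ToyBroker next) {
        this.next = next;
    }

    @Override
    public void send(ToyMessage message) {
        message.setPersistent(false);
        next.send(message);
    }
}
```

Doing it in the broker has the advantage that producers with default settings do not need to be changed, at the cost of silently overriding what the producer asked for.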

Re: Consumer hanging after a producer fails for being inactive

artnaseef
Hmm, check the producer connections to the broker after the VPN restart using netstat.

When the consumers appear to hang, do you see evidence on the broker that the producer messages are actually being received in the broker?  For example, the QueueSize for the queue -- is it increasing over time?

What does jstack show the consumer threads doing?

Note that with certain configurations on the producers, the ActiveMQ client library may use asynchronous sends to push the messages across the transport to the broker, so the producers may appear to be actively sending even though the messages aren't making it to the broker.
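
If attaching jstack is awkward, roughly the same information can be pulled from inside the JVM, which works here because the vm:// consumer runs in the broker's process. This is a minimal sketch using only the JDK; ThreadStates and the name filter are illustrative, and the thread names to filter on depend on your setup.

```java
import java.util.Map;

class ThreadStates {
    /** Returns one line per live thread whose name contains the filter: name, state, top frame. */
    public static String dump(String nameFilter) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (!thread.getName().contains(nameFilter)) {
                continue;
            }
            StackTraceElement[] stack = entry.getValue();
            out.append(thread.getName()).append(' ').append(thread.getState());
            if (stack.length > 0) {
                out.append(" at ").append(stack[0]);  // innermost frame, i.e. where the thread is parked
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

A consumer that is simply idle and one that is starved by the broker will both show up as waiting, so the top frame matters more than the state.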

Re: Consumer hanging after a producer fails for being inactive

errobin
The producer connection is just fine. When it sends a message it gets a response back confirming the message was received. We have trace enabled on the transport.

The consumers are blocked waiting for messages. They are blocked at the same spot as if there were no messages on the queue.

When ActiveMQ is in this state I can see the queue size growing using JConsole. I can also see the queue cursor size growing, both from the log statements and by setting breakpoints in the ActiveMQ code, but I believe a flag is set that says there are no messages in the cursor, so it does not pass the messages on to the consumers.

I have been running for a few days now with the producer sending non-persistent messages and it has been running fine.

I believe there is a bug in how ActiveMQ handles a persistent message sent to a broker that has persistence disabled when it receives a duplicate message as soon as a failover transport comes back.

Re: Consumer hanging after a producer fails for being inactive

Tim Bain
5.11.0 had a couple fixes for bugs related to duplicate messages being
delivered after failovers; have you tested with 5.11.1 to see if those
fixes address what you're seeing?
(In reply to errobin's post of Jan 16, 2015, 8:47 AM.)