Transport failed, please helpT_T

Cadmean
I have built a broker network with 4 brokers and 5000 clients. (I have changed the maximum connections of each broker to 2500 in activemq.xml.) Every broker uses failover, retrying forever, to build the connection.
The problem is that, after an unpredictable amount of time, some clients (about 20, running on operating systems including AIX and SUSE Linux) start to show the following logs and keep trying to reconnect to the broker. Also, I can see those clients repeatedly going online and offline when listening to the system topic TOPIC://ActiveMQ.Advisory.Connection.

[2015-06-10 09:05:51,412 [WARN][ActiveMQ Transport: tcp://83.28.33.224:61616@47664]--Transport (tcp://83.28.33.224:61616@47664) failed, reason: , attempting to automatically reconnect
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:386)
at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:258)
at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221)
at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213)
at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
at java.lang.Thread.run(Thread.java:735)

Environment:
SUSE Linux 11 SP2, JDK 1.7
The JVM memory is set to 2 GB.
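
For readers unfamiliar with the setup described in this post, a failover connection URI that retries forever might be assembled as in the minimal sketch below. The class name, method name, and broker host names are hypothetical. `maxReconnectAttempts=-1` is the failover transport option that, as far as I know, means "retry forever" in ActiveMQ 5.6 and later. The sketch only builds the URI string; it does not open a connection.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: builds a failover URI string for a list of brokers.
class FailoverUriSketch {

    // brokers are "host:port" pairs; maxReconnectAttempts = -1 is assumed
    // to mean "retry forever" (ActiveMQ 5.6+ failover transport).
    static String failoverUri(List<String> brokers, int maxReconnectAttempts) {
        StringJoiner hosts = new StringJoiner(",", "failover:(", ")");
        for (String broker : brokers) {
            hosts.add("tcp://" + broker);
        }
        return hosts.toString() + "?maxReconnectAttempts=" + maxReconnectAttempts;
    }

    public static void main(String[] args) {
        // Host names are placeholders, not the poster's brokers.
        System.out.println(failoverUri(
                Arrays.asList("broker-a:61616", "broker-b:61616"), -1));
    }
}
```

The resulting string would typically be passed to the client's connection factory; how the poster's clients actually do this is not shown in the thread.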

Re: Transport failed, please helpT_T

christopher.l.shannon
I don't think there is enough information here to really solve the issue.
The EOFException just means that the clients have been disconnected but it
doesn't say why.  It could be a number of reasons including network issues
causing the disconnect or exceptions on message receive, etc.  Can you try
turning up the logging to debug to see if any more useful information shows
up in the logs?
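
To illustrate why the EOFException carries so little information, here is a minimal, self-contained sketch using only the JDK. It assumes, based on the stack trace in the original post, that the transport reads a 4-byte integer with `DataInputStream.readInt` at the start of each frame; the class and method names are hypothetical. When the stream ends before four bytes arrive, `readInt` throws `EOFException` regardless of why the peer went away.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical demo class: shows when DataInputStream.readInt hits end-of-stream.
class EofDemo {

    // Returns true if reading one int from the given bytes ends in EOFException.
    static boolean readIntHitsEof(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readInt();      // needs 4 bytes
            return false;      // a full int was available
        } catch (EOFException e) {
            return true;       // stream ended early, as when a socket is closed
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readIntHitsEof(new byte[0]));            // closed before any data
        System.out.println(readIntHitsEof(new byte[] {0, 0}));      // closed mid-int
        System.out.println(readIntHitsEof(new byte[] {0, 0, 0, 7})); // complete int
    }
}
```

The exception is the same in the first two cases, which is why the cause has to be found elsewhere (broker logs, debug logging, or a packet capture).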

On Fri, Jul 3, 2015 at 1:56 AM, Cadmean <[hidden email]> wrote:

> [...]

Re: Transport failed, please helpT_T

artnaseef
First thing I would look at here is diagnostics from the network level itself.  WireShark or tcpdump can be used to get a better understanding of why the connections are dropping.

If the network between the client and brokers is unreliable, this will happen a lot and it will significantly interfere with the messaging.

Also check the broker log files for any indications of causes of the dropped connections.

With all of that said, with the failover transport, these failures should be short-lived and all of the applications should continue to operate normally.  The impact of greatest concern coming to mind is the increased probability of message redeliveries, but that is a normal occurrence with JMS (in other words, applications need to handle this possibility with or without these dropped connections).

Re: Transport failed, please helpT_T

Cadmean
1. Since all the clients are on the intranet, I don't think the network could be the problem, but I will check it anyway.

2. Right now, I haven't started producing messages. All the clients are just consumers that are not receiving any messages, so I don't think message redeliveries can be the cause of the problem.

The next thing I will try is enabling debug logging to see if there is any helpful information.

Thanks a lot. :D

Re: Transport failed, please helpT_T

Tim Bain
Assuming that intranet == "stable network without any firewalls,
misconfigurations, or hiccups" sounds like a huge mistake to me, and even
more so when you've posted a question indicating that your logs are full of
messages indicating that you have connection problems.  That's not to say
that there can't be bugs in the ActiveMQ code that could cause this
behavior, but it's far from the only possible cause for what you're
seeing.  And I second what Art said: if your security department will allow
it, you want to use a network sniffer such as WireShark or tcpdump (but
WireShark is generally preferred) to figure out what's going on at a
network level; trying to piece it together from only debug logs is likely
to be difficult.

Also, to clarify: are you saying that for those 20ish clients who start
experiencing connection problems, they experience those connection problems
continually?  Or do they recover after a few failures, only to have other
clients fail later?
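
One rough way to answer that question from a client log is to count the "Transport ... failed" warnings per broker endpoint. The sketch below is only an illustration: it assumes the log line format shown in the original post, and the class name, method name, and regular expression are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts transport failures per broker in client log lines.
class ReconnectCounter {

    // Matches "Transport (tcp://host:port@localPort) failed" and captures tcp://host:port.
    private static final Pattern FAILED =
            Pattern.compile("Transport \\((tcp://[^@)]+)@\\d+\\) failed");

    static Map<String, Integer> failuresPerBroker(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = FAILED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines copied from the log excerpt in the original post.
        List<String> sample = Arrays.asList(
                "[2015-06-10 09:05:51,412 [WARN][ActiveMQ Transport: tcp://83.28.33.224:61616@47664]"
                        + "--Transport (tcp://83.28.33.224:61616@47664) failed, reason: , "
                        + "attempting to automatically reconnect",
                "java.io.EOFException");
        System.out.println(failuresPerBroker(sample));
    }
}
```

Running this over each affected client's log at intervals would show whether the same clients keep failing or whether the failures move around.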

One last thing: the version of ActiveMQ you're using is ALWAYS relevant
information, and should be included in any post to this mailing list asking
for help.  How are we supposed to help figure out what's going on (or if
it's a known bug that's been fixed in a later version) if you don't tell us
what version you're using?  For example,
https://issues.apache.org/jira/browse/AMQ-5241 is fixed in 5.10.1 and
5.11.0, but I have no idea whether you're running a version that has that
fix.
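
As a small illustration of why the exact version matters, the sketch below compares dotted version strings numerically, so that 5.9.x sorts before 5.10.x. It assumes, for simplicity, that any release at or above 5.10.1 carries the AMQ-5241 fix (the thread only states that it is fixed in 5.10.1 and 5.11.0); the class and method names are hypothetical.

```java
// Hypothetical helper: numeric comparison of dotted version strings.
class VersionCheck {

    // Returns a negative, zero, or positive value, like Comparator.compare.
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0; // missing parts count as 0
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Assumption: every version >= 5.10.1 includes the AMQ-5241 fix.
    static boolean hasAmq5241Fix(String version) {
        return compare(version, "5.10.1") >= 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"5.9.1", "5.10.0", "5.10.1", "5.11.0"}) {
            System.out.println(v + " has fix: " + hasAmq5241Fix(v));
        }
    }
}
```

A plain string comparison would wrongly rank "5.9.1" above "5.10.0", which is why the parts are parsed as integers.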

Tim

On Tue, Jul 7, 2015 at 6:32 PM, Cadmean <[hidden email]> wrote:

> [...]

Re: Transport failed, please helpT_T

Cadmean
Thank you very much for your reply. It is very helpful.

1. You are right. I should not have been so arrogant as to say that the intranet cannot be the problem. I will ask the network department for help next week.

2. For now, the 20ish clients experience those connection problems continually. When I checked today, I found 10 more machines experiencing the same problem.

3. The version I use is 5.10.0; sorry for leaving that out.

Re: Transport failed, please helpT_T

Tim Bain
I wouldn't call it arrogance, but it's definitely a bad assumption (and my
experience has been that even at large companies, intranets are generally
less stable and reliable than the Internet as a whole, so assuming that
your networking department can't possibly do something wrong gives them far
too much credit).  Either way, using WireShark to dig into what's going on
at a network level is still your best starting place; let us know what you
find from that and we may be able to help you from there.

Are you seeing any warnings in the broker logs related to closing
connections due to inactivity?  That's one thing (of many) that could
explain EOFExceptions...
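
A quick way to check for this is to scan the broker log for inactivity-related entries. The sketch below is an assumption-laden illustration: it assumes such entries mention the exception class `InactivityIOException` (a class in `org.apache.activemq.transport`), and the sample log lines, class name, and method name are hypothetical rather than copied from a real broker log.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: counts broker log lines that mention inactivity timeouts.
class InactivityScan {

    // Assumption: inactivity drops are logged with the InactivityIOException class name.
    static long countInactivityDrops(List<String> brokerLogLines) {
        return brokerLogLines.stream()
                .filter(line -> line.contains("InactivityIOException"))
                .count();
    }

    public static void main(String[] args) {
        // Illustrative lines only; real broker output may be worded differently.
        List<String> sample = Arrays.asList(
                "WARN | Transport Connection failed: InactivityIOException: channel inactive",
                "INFO | Connector openwire started");
        System.out.println(countInactivityDrops(sample));
    }
}
```

A non-zero count would point toward the inactivity monitor rather than the network as the party closing the connections.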

Since the version you're using doesn't have the bug fix I referenced, it's
possible that upgrading to 5.10.2 or 5.11.1 would fix this.  Do you have
the ability to try one of those versions in a test environment to see if it
eliminates the problem?

Also, what technology are you using for your client code?  Java?  C++?
Perl?

Tim

On Wed, Jul 8, 2015 at 6:41 PM, Cadmean <[hidden email]> wrote:

> [...]

Re: Transport failed, please helpT_T

Cadmean
The problem has been solved.

After using WireShark, I found that both the [SYN] and [FIN] packets looked fine. I then examined the code that builds and closes connections, and found that someone had changed it in a way that caused the problem.

Thanks for your advice.
