Client Reconnect/failover problem via Core API with HA Configuration when configured with SSL

Client Reconnect/failover problem via Core API with HA Configuration when configured with SSL

funkyjive
I have Artemis 1.4.0 embedded in a Java application, acting as the queuing server for 2 to N WildFly 10 nodes connected as clients. WildFly 10 connects only as a client to this external Artemis instance, and it uses the JMS interface, since that is what WildFly supports.

The Java application with Artemis embedded in it also connects to itself through the JMS client libraries, using the ActiveMQInitialContext created through jndi.properties. (Very nice idea and implementation, by the way. I was able to migrate this code from a different JMS provider very cleanly because of this.)

I then augment the above installation by adding a separate application with embedded Artemis as a standby server. I adjust the JNDI properties and the broker.xml on both nodes and, without much trouble, I have a failover/failback configuration: if I take down one of the embedded Artemis servers, all of the WildFly nodes fail over to the secondary, and when I bring the primary back up, they fail back because failback is enabled. Everything appears to work well thus far.

I also have an application that was originally written for HornetQ using the Core API, which I switched to Artemis's Core API. For purposes of this discussion, this application is also only a client of the Artemis server. I have updated it to support the HA configuration of the Artemis server by calling the HA variant of createServerLocator:

      serverLocator = ActiveMQClient.createServerLocatorWithHA(
              new TransportConfiguration(NettyConnectorFactory.class.getName(), getTransportParams(host1, pUseSSL)),
              new TransportConfiguration(NettyConnectorFactory.class.getName(), getTransportParams(host2, pUseSSL)));
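The post does not show the body of `getTransportParams`. A minimal sketch of what such a helper could look like, assuming it fills in the standard Netty connector keys as plain strings (`host`, `port`, `sslEnabled`, `trustStorePath`, `trustStorePassword`, which are the values of Artemis's `TransportConstants`). The extra parameters (port, truststore path and password) are hypothetical additions. In real code the returned map would be passed to `new TransportConfiguration(NettyConnectorFactory.class.getName(), params)`, and the `TransportConstants` fields are preferable to literals; literals are used here only so the sketch compiles without Artemis on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical body for the getTransportParams(host, useSSL) helper used in the post.
// Keys are the string values of Artemis's TransportConstants; the port and
// truststore arguments are assumed extras, not part of the original code.
class TransportParamsSketch {

    static Map<String, Object> getTransportParams(String host, int port, boolean useSsl,
                                                  String trustStorePath, String trustStorePassword) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("host", host);
        params.put("port", port);
        if (useSsl) {
            params.put("sslEnabled", true);
            // The client's truststore must trust every broker it may fail over to.
            params.put("trustStorePath", trustStorePath);
            params.put("trustStorePassword", trustStorePassword);
        }
        return params;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        System.out.println(getTransportParams("host1", 61617, true,
                "/path/to/client-truststore.jks", "changeit"));
    }
}
```

Because both `TransportConfiguration`s are built by the same helper, the primary and backup connectors end up with identical SSL settings, so the one truststore they share has to cover both brokers.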

I then indicate to the serverLocator that it should automatically reconnect by setting the reconnect attempts to -1:

    serverLocator.setReconnectAttempts(-1);
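Setting the reconnect attempts to -1 asks the locator to retry indefinitely. How often it retries is governed by the related `ServerLocator` setters `setRetryInterval`, `setRetryIntervalMultiplier` and `setMaxRetryInterval`. The following is an illustrative model of how these settings combine (an interval multiplied after each failed attempt and capped at the maximum). It is not Artemis's own code, and the numbers in `main` are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only, not Artemis source: how reconnectAttempts, retryInterval,
// retryIntervalMultiplier and maxRetryInterval combine during reconnection.
class ReconnectScheduleSketch {

    /** True if another attempt is allowed; -1 means "retry forever". */
    static boolean shouldRetry(int attemptsSoFar, int reconnectAttempts) {
        return reconnectAttempts == -1 || attemptsSoFar < reconnectAttempts;
    }

    /** Wait (ms) before each of the first n attempts: multiplied each time, capped at maxRetryInterval. */
    static List<Long> delays(int n, long retryInterval, double multiplier, long maxRetryInterval) {
        List<Long> result = new ArrayList<>();
        double next = retryInterval;
        for (int i = 0; i < n; i++) {
            result.add((long) Math.min(next, maxRetryInterval));
            next = next * multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(1_000_000, -1));
        System.out.println(delays(5, 1000, 2.0, 10_000));
    }
}
```

With an interval of 1000 ms, a multiplier of 2.0 and a cap of 10000 ms, this model yields waits of 1000, 2000, 4000, 8000 and then 10000 ms. With -1 attempts the locator keeps retrying, so an SSL handshake failure on every attempt shows up as a repeating error rather than a final give-up.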

I then switch all of the configuration over to SSL (WildFly, the Artemis broker.xml, jndi.properties and this application). Everything works just fine on the initial connect: if the primary node is up, the client connects to it, and if the primary node is down, it connects just fine to the secondary node. No problem.

A problem occurs, though, whenever there is any kind of failover. Whether the primary node is switched off (causing a connect to the secondary) or brought back online (causing a failback to the primary in our configuration), all of the other connections (jndi.properties-based connections, Artemis-to-Artemis HA cluster connections and WildFly connections) reconnect without issue. The client app using the Artemis Core API, however, has trouble reconnecting, and the problem appears to be SSL-related:

Jan 31, 2017 9:18:10 AM org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector createConnection
ERROR: AMQ214016: Failed to create netty connection
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
        at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1431)
        at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
        at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:813)
        at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
        at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1094)
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:966)
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:900)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:345)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:572)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:513)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:427)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:399)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
        at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
        at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:304)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1506)
        at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
        at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
        at sun.security.ssl.Handshaker$1.run(Handshaker.java:919)
        at sun.security.ssl.Handshaker$1.run(Handshaker.java:916)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1369)
        at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1120)
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1005)
        ... 17 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
        at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
        at sun.security.validator.Validator.validate(Validator.java:260)
        at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
        at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:281)
        at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1493)
        ... 25 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:146)
        at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:131)
        at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
        at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
        ... 31 more

To me it almost seems as if some SSL connection information that is valid for the node we are currently connected to gets carried over to the connection to the secondary node, which then fails while negotiating the connection.
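The innermost cause in the trace, `PKIX path building failed ... unable to find valid certification path to requested target`, is what JSSE reports when the server's certificate does not chain to any certificate the client trusts. One way to check what the client actually trusts is a small diagnostic like the one below, a sketch using only the JDK `KeyStore` API; the truststore path, password and type are command-line arguments you would supply.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Diagnostic sketch: list every certificate in a client truststore so you can check
// that each broker's certificate (or the CA that issued it) is present.
class TrustStoreInspector {

    static List<String> describeEntries(KeyStore trustStore) throws KeyStoreException {
        List<String> lines = new ArrayList<>();
        for (String alias : Collections.list(trustStore.aliases())) {
            // For private-key entries, getCertificate returns the first cert of the chain.
            Certificate cert = trustStore.getCertificate(alias);
            if (cert instanceof X509Certificate) {
                X509Certificate x509 = (X509Certificate) cert;
                String kind = trustStore.isCertificateEntry(alias) ? "trustedCert" : "keyEntry";
                lines.add(alias + " [" + kind + "] subject=" + x509.getSubjectX500Principal().getName()
                        + " issuer=" + x509.getIssuerX500Principal().getName()
                        + " notAfter=" + x509.getNotAfter());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: TrustStoreInspector <truststore-path> <password> [type, default JKS]");
            return;
        }
        KeyStore ks = KeyStore.getInstance(args.length > 2 ? args[2] : "JKS");
        try (InputStream in = new FileInputStream(args[0])) {
            ks.load(in, args[1].toCharArray());
        }
        describeEntries(ks).forEach(System.out::println);
    }
}
```

Running it against the truststore the Core API client is configured with, and comparing the subjects and issuers with each broker's certificate, shows whether the secondary's certificate is simply missing.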

Any advice or direction anyone can offer? My plan is to attempt an upgrade from 1.4.0 to 1.5.2 and see if anything changes, but this will take some time on my part.

Thanks in advance,

David.

Re: Client Reconnect/failover problem via Core API with HA Configuration when configured with SSL

Justin Bertram-2
I just set up an example to test this and everything worked fine.  I essentially merged the "ssl-enabled" and "stop-server-failover" examples together.

Could you perhaps work up a simple reproducer (e.g. using an existing example)?


Justin

----- Original Message -----
From: "funkyjive" <[hidden email]>
To: [hidden email]
Sent: Tuesday, January 31, 2017 11:02:13 AM
Subject: Client Reconnect/failover problem via Core API with HA Configuration when configured with SSL

--
View this message in context: http://activemq.2283324.n4.nabble.com/Client-Reconnect-failover-problem-via-Core-API-with-HA-Configuration-when-configured-with-SSL-tp4721435.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Client Reconnect/failover problem via Core API with HA Configuration when configured with SSL

funkyjive
Did you try this with the latest release, 1.5.2, or with 1.4.0? If it "just worked" with 1.5.2, maybe I'll try upgrading first.

It will take a bit of work to extract enough of this to reproduce it and get it down to its smallest set.

A couple of important questions: Did you actually use two different machines for the SSL test? And were there two different certificates in your truststore?

I think this may be essential, because if the test was done entirely on one machine with the same certificate, just using a different port, there is every reason to expect that identical SSL context settings would work for both connections.

If you did it on one machine and it worked for that reason, that would be consistent with my theory that the SSL settings are copied over and, while valid for the initial machine, are invalid for the second connection (whereas with the same machine and the same certificate, they would simply work).

Let me see what it will take to do a simple reproduction.
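Whether identical SSL settings can serve two different brokers can be illustrated with plain JSSE (this is a sketch, not Artemis's internal code): a single `SSLContext` built from one truststore can create client engines for any host, and each server certificate is checked against all trusted entries in that store. In this setup the trust decision depends on the truststore contents, not on which host the context was first used for. The host names and port in the usage are placeholders, and hostname verification is not enabled.

```java
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Plain-JSSE illustration: one client context, one truststore, engines for several hosts.
class SharedSslContextSketch {

    static TrustManager[] trustManagers(KeyStore trustStore) throws Exception {
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        return tmf.getTrustManagers();
    }

    /** Client-side context with no key material; trust decisions come from trustStore. */
    static SSLContext clientContext(KeyStore trustStore) throws Exception {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, trustManagers(trustStore), null);
        return ctx;
    }

    /** Engines for different brokers share the context and therefore its trust managers. */
    static SSLEngine engineFor(SSLContext ctx, String host, int port) {
        SSLEngine engine = ctx.createSSLEngine(host, port);
        engine.setUseClientMode(true);
        // Hostname verification is not enabled in this sketch.
        return engine;
    }

    /** Number of certificates the trust manager will accept as issuers. */
    static int trustedIssuerCount(KeyStore trustStore) throws Exception {
        int count = 0;
        for (TrustManager tm : trustManagers(trustStore)) {
            if (tm instanceof X509TrustManager) {
                count += ((X509TrustManager) tm).getAcceptedIssuers().length;
            }
        }
        return count;
    }
}
```

Under these assumptions, reusing the same settings for the failover target should only fail the PKIX check if the backup broker's certificate (or its CA) is absent from the shared truststore.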

Re: Client Reconnect/failover problem via Core API with HA Configuration when configured with SSL

Justin Bertram-2
I tried this on master, so it would be beyond 1.5.2. That said, not much has changed in the SSL implementation in quite some time, so I wouldn't expect the version to make a difference.

As far as getting things down to the smallest set, I don't see any way around that.  Until you can eliminate all non-essential elements you can't really understand the problem.

I didn't use 2 different physical machines, as the example just runs on a single box. However, 2 different Artemis instances were used, and each had its own unique certificate, which was in turn imported into the client's truststore. Therefore I think it is a valid test of your assumption.
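Importing each broker's certificate into the client's truststore is usually done with `keytool -importcert`, once per broker. A programmatic sketch of the same idea, assuming each broker's certificate has been exported to a PEM or DER file; the aliases, file paths and store type are hypothetical inputs supplied by the caller.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.KeyStore;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.Map;

// Programmatic analogue of running `keytool -importcert` once per broker.
class CombinedTrustStoreSketch {

    /**
     * @param storeType       keystore type the client expects, e.g. "JKS" or "PKCS12"
     * @param brokerCertFiles alias -> path of that broker's exported certificate (PEM or DER)
     */
    static KeyStore build(String storeType, Map<String, String> brokerCertFiles) throws Exception {
        KeyStore ks = KeyStore.getInstance(storeType);
        ks.load(null, null); // initialize an empty store
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        for (Map.Entry<String, String> entry : brokerCertFiles.entrySet()) {
            try (InputStream in = new FileInputStream(entry.getValue())) {
                X509Certificate cert = (X509Certificate) cf.generateCertificate(in);
                ks.setCertificateEntry(entry.getKey(), cert); // trusted-certificate entry
            }
        }
        return ks;
    }

    /** Writes the store to disk, protected by the given password. */
    static void save(KeyStore ks, String path, char[] password) throws Exception {
        try (OutputStream out = new FileOutputStream(path)) {
            ks.store(out, password);
        }
    }
}
```

For example, mapping aliases such as "primary" and "backup" to each broker's exported certificate and then calling `save` produces a single file that the client's `trustStorePath` parameter could point to, so the same SSL settings are valid for either broker.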


Justin

----- Original Message -----
From: "funkyjive" <[hidden email]>
To: [hidden email]
Sent: Tuesday, January 31, 2017 6:00:22 PM
Subject: Re: Client Reconnect/failover problem via Core API with HA Configuration when configured with SSL

--
View this message in context: http://activemq.2283324.n4.nabble.com/Client-Reconnect-failover-problem-via-Core-API-with-HA-Configuration-when-configured-with-SSL-tp4721435p4721450.html

Re: Client Reconnect/failover problem via Core API with HA Configuration when configured with SSL

Justin Bertram
For what it's worth, I just tested with 1.4.0 and the result was the same as before (i.e. everything worked).


Justin

----- Original Message -----
From: "Justin Bertram" <[hidden email]>
To: [hidden email]
Sent: Tuesday, January 31, 2017 6:47:07 PM
Subject: Re: Client Reconnect/failover problem via Core API with HA Configuration when configured with SSL
