Artemis rejects the clientid after client disconnection

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Artemis rejects the clientid after client disconnection

francesco81
Hi,
we're facing another issue with Artemis.
Often it happens that after a client disconnects not gracefully (could be for many reasons: energy loss, network loss, etc.), Artemis doesn't accept anymore a connection with the same client id. This results, of course, in a situation where the client cannot connect any more (unless I restart Artemis). I seems like Artemis doesn't "really" disconnect the client.
Inside the log (set on TRACE level) I see all the activities that I expect:
1) client connects
2) client sends and receives messages correctly
3) client makes PINGREQ and receives PINGRESP correctly
4) Artemis checks the TTL of the specific connection id related to that client
5) When client goes down, after <keepaliveSetAtClientSide> * 1,5 seconds (which in our case result in 90 seconds) Artemis realizes that the specific connection id is broken, and it disconnects it.
But at this point the client cannot connect any more.
Every time I try to connect by using that specific client id, it is rejected. And I have no error inside the log. It seems simply as the broker "think" that a connection with that client id is still up and running. If I connect with another client id and try to send a message to a topic where the broken client id was subscribed to, I see inside the log that Artemis try to send the message to that client id:
MQTT(<clientid>): OUT >> PUBLISH(1) EXACTLY_ONCE
...but really there's no connection for that client id. Not more, anyway.
Unfortunately I'm not yet able to reproduce the bug sistematically. But I'm hard working to do it, because it's pretty much blocker to us.

Thanks

Francesco
Reply | Threaded
Open this post in threaded view
|

Re: Artemis rejects the clientid after client disconnection

Martyn Taylor
Hi Francesco,

What version of Artemis are you using?

From your description it does look like the connection is not cleaned up
after the TTL.  I'll attempt to reproduce locally.

Thanks


On Thu, Jun 15, 2017 at 11:53 AM, francesco81 <[hidden email]
> wrote:

> Hi,
> we're facing another issue with Artemis.
> Often it happens that after a client disconnects not gracefully (could be
> for many reasons: energy loss, network loss, etc.), Artemis doesn't accept
> anymore a connection with the same client id. This results, of course, in a
> situation where the client cannot connect any more (unless I restart
> Artemis). I seems like Artemis doesn't "really" disconnect the client.
> Inside the log (set on TRACE level) I see all the activities that I expect:
> 1) client connects
> 2) client sends and receives messages correctly
> 3) client makes PINGREQ and receives PINGRESP correctly
> 4) Artemis checks the TTL of the specific connection id related to that
> client
> 5) When client goes down, after <keepaliveSetAtClientSide> * 1,5 seconds
> (which in our case result in 90 seconds) Artemis realizes that the specific
> connection id is broken, and it disconnects it.
> But at this point the client cannot connect any more.
> Every time I try to connect by using that specific client id, it is
> rejected. And I have no error inside the log. It seems simply as the broker
> "think" that a connection with that client id is still up and running. If I
> connect with another client id and try to send a message to a topic where
> the broken client id was subscribed to, I see inside the log that Artemis
> try to send the message to that client id:
> MQTT(<clientid>): OUT >> PUBLISH(1) EXACTLY_ONCE
> ...but really there's no connection for that client id. Not more, anyway.
> Unfortunately I'm not yet able to reproduce the bug sistematically. But I'm
> hard working to do it, because it's pretty much blocker to us.
>
> Thanks
>
> Francesco
>
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.
> nabble.com/Artemis-rejects-the-clientid-after-client-
> disconnection-tp4727426.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
Reply | Threaded
Open this post in threaded view
|

Re: Artemis rejects the clientid after client disconnection

francesco81
Hi Martyn,
thanks so much for your answer.
I'm using the official version 2.1.0, downloaded from the website (https://www.apache.org/dyn/closer.cgi?filename=activemq/activemq-artemis/2.1.0/apache-artemis-2.1.0-bin.zip&action=download).
As soon as I'm able to reproduce it in a systematic way, I'll let you know.

Attached there's my broker.xml (maybe can help you).
broker.xml

Francesco
Reply | Threaded
Open this post in threaded view
|

Re: Artemis rejects the clientid after client disconnection

jbertram
This may be related to ARTEMIS-1218 [1].


Justin

[1] https://issues.apache.org/jira/browse/ARTEMIS-1218

----- Original Message -----
From: "francesco81" <[hidden email]>
To: [hidden email]
Sent: Thursday, June 15, 2017 6:50:20 AM
Subject: Re: Artemis rejects the clientid after client disconnection

Hi Martyn,
thanks so much for your answer.
I'm using the official version 2.1.0, downloaded from the website
(https://www.apache.org/dyn/closer.cgi?filename=activemq/activemq-artemis/2.1.0/apache-artemis-2.1.0-bin.zip&action=download).
As soon as I'm able to reproduce it in a systematic way, I'll let you know.

Attached there's my broker.xml (maybe can help you).
broker.xml <http://activemq.2283324.n4.nabble.com/file/n4727430/broker.xml>  

Francesco



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4727430.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.
Reply | Threaded
Open this post in threaded view
|

Re: Artemis rejects the clientid after client disconnection

francesco81
In reply to this post by francesco81
Just to give you all possible information:
when the client goes down (as I wrote above: not gracefully) Artemis does NOT send any lwt message. Actuallly the client, on connection, indicates an lwt topic and message. And in case of gracefully disconnection Artemis sends it... In this case, instead, Artemis does NOT send it. I don't know if it's cause, consequence or simply part of the problem ...anyway it's related. Hope can help.

Francesco
Reply | Threaded
Open this post in threaded view
|

Re: Artemis rejects the clientid after client disconnection

francesco81
An update.
We have re-build "artemis-commons-2.1.0.jar", "artemis-mqtt-protocol-2.1.0.jar" and "artemis-server-2.1.0.jar" in order to improve the logs. It seems that there is a problem with the thread which has in charge the activities to free the clientid, send lwt message, etc..
For a few time all works good. Broken connection are correctly cleaned and I see all the logs related to the clientid cleaning and lwt sending.
Then, at a certain point (we've not yet undestood why...) connections are no more cleaned correctly. And there are no more logs about the "cleaner" thread. And there's no error. It seems like the connection.fail method simply is not called.

logger.debug("connection id " + conn.getID() + " calling fail executor thread ");                     flushExecutor.execute(new Runnable() {                                                public void run() {                           logger.debug("starting thread for removing connection "+ pair.getA());                           conn.fail(ActiveMQMessageBundle.BUNDLE.clientExited(conn.getRemoteAddress(), pair.getB()));                           logger.debug("thread for connection "+ pair.getA() + " ended correctly");                        }

...I can't see the log above. Have you any idea about the cause?

Thanks.

Francesco
Reply | Threaded
Open this post in threaded view
|

Re: Artemis rejects the clientid after client disconnection

MichaelAndrePearce
This would suggest an uncaught exception could be killing the executor thread.

If you have a local build, Can you wrap the conn.fail method with a try catch (throwable t)

See if that helps at all.



Sent from my iPhone

> On 28 Jun 2017, at 16:19, francesco81 <[hidden email]> wrote:
>
> An update.
> We have re-build "artemis-commons-2.1.0.jar",
> "artemis-mqtt-protocol-2.1.0.jar" and "artemis-server-2.1.0.jar" in order to
> improve the logs. It seems that there is a problem with the thread which has
> in charge the activities to free the clientid, send lwt message, etc..
> For a few time all works good. Broken connection are correctly cleaned and I
> see all the logs related to the clientid cleaning and lwt sending.
> Then, at a certain point (we've not yet undestood why...) connections are no
> more cleaned correctly. And there are no more logs about the "cleaner"
> thread. And there's no error. It seems like the connection.fail method
> simply is not called.
>
> logger.debug("connection id " + conn.getID() + " calling fail executor
> thread ");                     flushExecutor.execute(new Runnable() {                                              
> public void run() {                           logger.debug("starting thread
> for removing connection "+ pair.getA());                          
> conn.fail(ActiveMQMessageBundle.BUNDLE.clientExited(conn.getRemoteAddress(),
> pair.getB()));                           logger.debug("thread for connection
> "+ pair.getA() + " ended correctly");                        }
>
> ...I can't see the log above. Have you any idea about the cause?
>
> Thanks.
>
> Francesco
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4728012.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
Reply | Threaded
Open this post in threaded view
|

Re: Artemis rejects the clientid after client disconnection

MichaelAndrePearce
I mean to add the try catch within the runnable run method.

Sent from my iPhone

> On 28 Jun 2017, at 16:48, Michael André Pearce <[hidden email]> wrote:
>
> This would suggest an uncaught exception could be killing the executor thread.
>
> If you have a local build, Can you wrap the conn.fail method with a try catch (throwable t)
>
> See if that helps at all.
>
>
>
> Sent from my iPhone
>
>> On 28 Jun 2017, at 16:19, francesco81 <[hidden email]> wrote:
>>
>> An update.
>> We have re-build "artemis-commons-2.1.0.jar",
>> "artemis-mqtt-protocol-2.1.0.jar" and "artemis-server-2.1.0.jar" in order to
>> improve the logs. It seems that there is a problem with the thread which has
>> in charge the activities to free the clientid, send lwt message, etc..
>> For a few time all works good. Broken connection are correctly cleaned and I
>> see all the logs related to the clientid cleaning and lwt sending.
>> Then, at a certain point (we've not yet undestood why...) connections are no
>> more cleaned correctly. And there are no more logs about the "cleaner"
>> thread. And there's no error. It seems like the connection.fail method
>> simply is not called.
>>
>> logger.debug("connection id " + conn.getID() + " calling fail executor
>> thread ");                     flushExecutor.execute(new Runnable() {                                              
>> public void run() {                           logger.debug("starting thread
>> for removing connection "+ pair.getA());                          
>> conn.fail(ActiveMQMessageBundle.BUNDLE.clientExited(conn.getRemoteAddress(),
>> pair.getB()));                           logger.debug("thread for connection
>> "+ pair.getA() + " ended correctly");                        }
>>
>> ...I can't see the log above. Have you any idea about the cause?
>>
>> Thanks.
>>
>> Francesco
>>
>>
>>
>> --
>> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4728012.html
>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
Reply | Threaded
Open this post in threaded view
|

Re: Artemis rejects the clientid after client disconnection

MichaelAndrePearce
In reply to this post by MichaelAndrePearce
Also could you do a stack trace once it has failed the other option is the executors thread could blocked/deadlocked.

Sent from my iPhone

> On 28 Jun 2017, at 16:48, Michael André Pearce <[hidden email]> wrote:
>
> This would suggest an uncaught exception could be killing the executor thread.
>
> If you have a local build, Can you wrap the conn.fail method with a try catch (throwable t)
>
> See if that helps at all.
>
>
>
> Sent from my iPhone
>
>> On 28 Jun 2017, at 16:19, francesco81 <[hidden email]> wrote:
>>
>> An update.
>> We have re-build "artemis-commons-2.1.0.jar",
>> "artemis-mqtt-protocol-2.1.0.jar" and "artemis-server-2.1.0.jar" in order to
>> improve the logs. It seems that there is a problem with the thread which has
>> in charge the activities to free the clientid, send lwt message, etc..
>> For a few time all works good. Broken connection are correctly cleaned and I
>> see all the logs related to the clientid cleaning and lwt sending.
>> Then, at a certain point (we've not yet undestood why...) connections are no
>> more cleaned correctly. And there are no more logs about the "cleaner"
>> thread. And there's no error. It seems like the connection.fail method
>> simply is not called.
>>
>> logger.debug("connection id " + conn.getID() + " calling fail executor
>> thread ");                     flushExecutor.execute(new Runnable() {                                              
>> public void run() {                           logger.debug("starting thread
>> for removing connection "+ pair.getA());                          
>> conn.fail(ActiveMQMessageBundle.BUNDLE.clientExited(conn.getRemoteAddress(),
>> pair.getB()));                           logger.debug("thread for connection
>> "+ pair.getA() + " ended correctly");                        }
>>
>> ...I can't see the log above. Have you any idea about the cause?
>>
>> Thanks.
>>
>> Francesco
>>
>>
>>
>> --
>> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4728012.html
>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
Reply | Threaded
Open this post in threaded view
|

Re: Artemis rejects the clientid after client disconnection

francesco81
Hi Michael,
I can confirm that the thread is scheduled, because I see the log "connection id 1643635981 calling fail executor thread".
Then I see "RemotingServiceImpl::removing connection ID 1643635981".
And then ....nothing more: no trace. Like if the thread never takes in charge the effective execution.

Francesco
Reply | Threaded
Open this post in threaded view
|

Re: Artemis rejects the clientid after client disconnection

francesco81
I'm wondering if it's related to this tkt: https://issues.apache.org/jira/browse/ARTEMIS-1078
I know actually it should had been fixed with version 2.1.0 (that it's the version I'm using), but anyway it seems very similar.

Francesco
Reply | Threaded
Open this post in threaded view
|

Re: Artemis rejects the clientid after client disconnection

MichaelAndrePearce
In reply to this post by francesco81
Supplying/Posting  thread dumps and log files would be advantageous, especially without any systematic way to reproduce via integration test. More info the better :)

Locally I'm struggling to reproduce the issue using your broker xml and killing clients manually, but not able to reproduce so flying blind really. Any of the above would help possible tracking down of the issue.

> On 29 Jun 2017, at 11:29, francesco81 <[hidden email]> wrote:
>
> Hi Michael,
> I can confirm that the thread is scheduled, because I see the log
> "connection id 1643635981 calling fail executor thread".
> Then I see "RemotingServiceImpl::removing connection ID 1643635981".
> And then ....nothing more: no trace. Like if the thread never takes in
> charge the effective execution.
>
> Francesco
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4728051.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.