Failover - send timeout not working

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Failover - send timeout not working

Martin Lichtin
I'm running into a situation with the Failover Transport not respecting the timeout that's been set.
The symptom is endless messages of this kind:

2017-04-29 09:48:26,128 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
2017-04-29 09:48:26,228 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
2017-04-29 09:48:26,329 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
2017-04-29 09:48:26,429 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
2017-04-29 09:48:26,530 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
...
2017-04-29 09:48:33,270 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
2017-04-29 09:48:33,371 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}


It seems to never get out of this loop:

     while (transport == null && !disposed && connectionFailure == null
             && !Thread.currentThread().isInterrupted() && willReconnect()) {

         LOG.trace("Waiting for transport to reconnect..: {}", command);
         long end = System.currentTimeMillis();
         if (command.isMessage() && timeout > 0 && (end - start > timeout)) {
             timedout = true;
             LOG.info("Failover timed out after {} ms", (end - start));
             break;
         }
         try {
             reconnectMutex.wait(100);
         } catch (InterruptedException e) {
             Thread.currentThread().interrupt();
             LOG.debug("Interupted:", e);
         }
         transport = connectedTransport.get();
     }

The timeout is set to 5000ms and should have hit a long time ago, but I guess "command.isMessage()" returns false in this situation?
Any reason this condition is needed? It seems to me anything that can't be sent within the timeout should cause the throw.

- Martin

Reply | Threaded
Open this post in threaded view
|

Re: Failover - send timeout not working

Tim Bain
It does seem like there needs to be another path out of that code block,
though I wonder if the expectation was that nothing that wasn't a message
would ever make it to that block. Either way, it seems to not be working
the way it should, so please submit a bug in JIRA for the problem.

Tim

On Apr 29, 2017 4:24 AM, "Martin Lichtin" <[hidden email]> wrote:

> I'm running into a situation with the Failover Transport not respecting
> the timeout that's been set.
> The symptom is endless messages of this kind:
>
> 2017-04-29 09:48:26,128 | TRACE | .engine.cfgengine.in]-11 |
> FailoverTransport                | sport.failover.FailoverTransport 615 |
> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
> transactionId = XID:[1096044365,globalId=63747
> 26c6366672d656e67696e653130333530323030303034,branchId=63747
> 26c6366672d656e67696e6531313036383134]}
> 2017-04-29 09:48:26,228 | TRACE | .engine.cfgengine.in]-11 |
> FailoverTransport                | sport.failover.FailoverTransport 615 |
> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
> transactionId = XID:[1096044365,globalId=63747
> 26c6366672d656e67696e653130333530323030303034,branchId=63747
> 26c6366672d656e67696e6531313036383134]}
> 2017-04-29 09:48:26,329 | TRACE | .engine.cfgengine.in]-11 |
> FailoverTransport                | sport.failover.FailoverTransport 615 |
> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
> transactionId = XID:[1096044365,globalId=63747
> 26c6366672d656e67696e653130333530323030303034,branchId=63747
> 26c6366672d656e67696e6531313036383134]}
> 2017-04-29 09:48:26,429 | TRACE | .engine.cfgengine.in]-11 |
> FailoverTransport                | sport.failover.FailoverTransport 615 |
> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
> transactionId = XID:[1096044365,globalId=63747
> 26c6366672d656e67696e653130333530323030303034,branchId=63747
> 26c6366672d656e67696e6531313036383134]}
> 2017-04-29 09:48:26,530 | TRACE | .engine.cfgengine.in]-11 |
> FailoverTransport                | sport.failover.FailoverTransport 615 |
> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
> transactionId = XID:[1096044365,globalId=63747
> 26c6366672d656e67696e653130333530323030303034,branchId=63747
> 26c6366672d656e67696e6531313036383134]}
> ...
> 2017-04-29 09:48:33,270 | TRACE | .engine.cfgengine.in]-11 |
> FailoverTransport                | sport.failover.FailoverTransport 615 |
> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
> transactionId = XID:[1096044365,globalId=63747
> 26c6366672d656e67696e653130333530323030303034,branchId=63747
> 26c6366672d656e67696e6531313036383134]}
> 2017-04-29 09:48:33,371 | TRACE | .engine.cfgengine.in]-11 |
> FailoverTransport                | sport.failover.FailoverTransport 615 |
> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
> transactionId = XID:[1096044365,globalId=63747
> 26c6366672d656e67696e653130333530323030303034,branchId=63747
> 26c6366672d656e67696e6531313036383134]}
>
>
> It seems to never get out of this loop:
>
>     while (transport == null && !disposed && connectionFailure == null
>             && !Thread.currentThread().isInterrupted() &&
> willReconnect()) {
>
>         LOG.trace("Waiting for transport to reconnect..: {}", command);
>         long end = System.currentTimeMillis();
>         if (command.isMessage() && timeout > 0 && (end - start > timeout))
> {
>             timedout = true;
>             LOG.info("Failover timed out after {} ms", (end - start));
>             break;
>         }
>         try {
>             reconnectMutex.wait(100);
>         } catch (InterruptedException e) {
>             Thread.currentThread().interrupt();
>             LOG.debug("Interupted:", e);
>         }
>         transport = connectedTransport.get();
>     }
>
> The timeout is set to 5000ms and should have hit a long time ago, but I
> guess "command.isMessage()" returns false in this situation?
> Any reason this condition is needed? It seems to me anything that can't be
> sent within the timeout should cause the throw.
>
> - Martin
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Re: Failover - send timeout not working

Martin Lichtin
JIRA: https://issues.apache.org/jira/browse/AMQ-6666

I'm running now with a patched version without the "command.isMessage()" condition and fail-over behaviour has substantially improved.
Can't really judge whether it is the correct fix, however.

- Martin


On 30.04.2017 18:50, Tim Bain wrote:

> It does seem like there needs to be another path out of that code block,
> though I wonder if the expectation was that nothing that wasn't a message
> would ever make it to that block. Either way, it seems to not be working
> the way it should, so please submit a bug in JIRA for the problem.
>
> Tim
>
> On Apr 29, 2017 4:24 AM, "Martin Lichtin" <[hidden email]> wrote:
>
>> I'm running into a situation with the Failover Transport not respecting
>> the timeout that's been set.
>> The symptom is endless messages of this kind:
>>
>> 2017-04-29 09:48:26,128 | TRACE | .engine.cfgengine.in]-11 |
>> FailoverTransport                | sport.failover.FailoverTransport 615 |
>> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
>> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
>> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
>> transactionId = XID:[1096044365,globalId=63747
>> 26c6366672d656e67696e653130333530323030303034,branchId=63747
>> 26c6366672d656e67696e6531313036383134]}
>> 2017-04-29 09:48:26,228 | TRACE | .engine.cfgengine.in]-11 |
>> FailoverTransport                | sport.failover.FailoverTransport 615 |
>> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
>> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
>> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
>> transactionId = XID:[1096044365,globalId=63747
>> 26c6366672d656e67696e653130333530323030303034,branchId=63747
>> 26c6366672d656e67696e6531313036383134]}
>> 2017-04-29 09:48:26,329 | TRACE | .engine.cfgengine.in]-11 |
>> FailoverTransport                | sport.failover.FailoverTransport 615 |
>> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
>> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
>> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
>> transactionId = XID:[1096044365,globalId=63747
>> 26c6366672d656e67696e653130333530323030303034,branchId=63747
>> 26c6366672d656e67696e6531313036383134]}
>> 2017-04-29 09:48:26,429 | TRACE | .engine.cfgengine.in]-11 |
>> FailoverTransport                | sport.failover.FailoverTransport 615 |
>> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
>> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
>> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
>> transactionId = XID:[1096044365,globalId=63747
>> 26c6366672d656e67696e653130333530323030303034,branchId=63747
>> 26c6366672d656e67696e6531313036383134]}
>> 2017-04-29 09:48:26,530 | TRACE | .engine.cfgengine.in]-11 |
>> FailoverTransport                | sport.failover.FailoverTransport 615 |
>> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
>> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
>> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
>> transactionId = XID:[1096044365,globalId=63747
>> 26c6366672d656e67696e653130333530323030303034,branchId=63747
>> 26c6366672d656e67696e6531313036383134]}
>> ...
>> 2017-04-29 09:48:33,270 | TRACE | .engine.cfgengine.in]-11 |
>> FailoverTransport                | sport.failover.FailoverTransport 615 |
>> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
>> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
>> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
>> transactionId = XID:[1096044365,globalId=63747
>> 26c6366672d656e67696e653130333530323030303034,branchId=63747
>> 26c6366672d656e67696e6531313036383134]}
>> 2017-04-29 09:48:33,371 | TRACE | .engine.cfgengine.in]-11 |
>> FailoverTransport                | sport.failover.FailoverTransport 615 |
>> 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport
>> to reconnect..: TransactionInfo {commandId = 127798, responseRequired =
>> true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3,
>> transactionId = XID:[1096044365,globalId=63747
>> 26c6366672d656e67696e653130333530323030303034,branchId=63747
>> 26c6366672d656e67696e6531313036383134]}
>>
>>
>> It seems to never get out of this loop:
>>
>>      while (transport == null && !disposed && connectionFailure == null
>>              && !Thread.currentThread().isInterrupted() &&
>> willReconnect()) {
>>
>>          LOG.trace("Waiting for transport to reconnect..: {}", command);
>>          long end = System.currentTimeMillis();
>>          if (command.isMessage() && timeout > 0 && (end - start > timeout))
>> {
>>              timedout = true;
>>              LOG.info("Failover timed out after {} ms", (end - start));
>>              break;
>>          }
>>          try {
>>              reconnectMutex.wait(100);
>>          } catch (InterruptedException e) {
>>              Thread.currentThread().interrupt();
>>              LOG.debug("Interupted:", e);
>>          }
>>          transport = connectedTransport.get();
>>      }
>>
>> The timeout is set to 5000ms and should have hit a long time ago, but I
>> guess "command.isMessage()" returns false in this situation?
>> Any reason this condition is needed? It seems to me anything that can't be
>> sent within the timeout should cause the throw.
>>
>> - Martin
>>
>>