[Artemis 2.6.2] AMQP messages going to expiry queue, redelivery off that queue is failing

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

[Artemis 2.6.2] AMQP messages going to expiry queue, redelivery off that queue is failing

Dan Langford
some of my users are attempting a pattern to deduplicate messages based on
a time window instead of a fixed amount of space (a duplicate id cache)

so far the concept has been working very well. So they send their AMQP
messages (qpid-jms-client) into a Last Value Queue with an appropriate
identifier in the _AMQ_LVQ_NAME. They also set a TimeToLive on the message
that is essentially the lag they will allow as they want to wait for
possible duplicates. If any duplicates come in the Last Value Queue
behavior is replacing the older message with the newer message until the
expiration. expired messages are delivered to the preconfigured expiry
queue where their application is listening. This is not perfect but its not
intended to be. Its just intended to reduce additional unnecessary
processing and they understand this is not a guarantee. It really helps
with a system that produces messages in a way that has flurries of
"notifications" about the same assetID over and over again.

BUT where we are seeing is a problem is when we are consuming from the
queue used to hold expired messages and we toss some exception and the
message needs to be redelivered. the first time or two the message is
redelivered it is delivered OK. But when the JMSXDeliveryCount is about 3
or 4 (we use redelivery delay and multipliers to spread these out) our
qpid-jms-client stops being able to read the messages.

we were only able to reproduce this when an AMQP message expired onto the
queue. (expired from a LVQ in case that is relevant). if we place the
message directly on a queue and test different exception and redelivery
scenarios we cannot reproduce this behavior.

i enable the qpid-jms-client frame logging (via env variable
PN_TRACE_FRM=true) and i saw that in the situation when the client code
cannot access the payload, even though the broker WAS still sending the
payload. so i thought it was some odd issue with the client. The Apache
Qpid team responded that the issue seems to be that the broker starts to
send some ill formed payloads in this scenario. i dont want to repeat the
stack traces and their response, you can read those here

https://lists.apache.org/thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E

would it be helpful if i tested that this happens if there is not a LVQ
involved? i could have a message in a non-LVQ expire to another queue and
see if redeliveries over their get messed up after a few attempts. For the
record this is AMQP for producing and consuming. i do notice the messages
waiting in the expiry queue have much more headers messages sent directly
to a queue from client code. they seem to be headers full of information
about the message as it left the previous queue. I tried to send a message
directly to the expiry queue with all these headers to determine if it was
the existence of one of these specifically that trigger the malformed frame
but was not able to fully set all those headers. the JMSDeliverCount (type
Long) was the one that the client would not let me set and as a result i
could not test. for clarity thought i dont know that the issue exists due
to a header that is just what i saw as a difference between messages be
delivered to the queue by client code versus messages expiring from one
queue to another.

please look over the linked thread on the qpid list and let me know if you
know why a message transfer fram would become malformed after a few failed
deliveries only if the message expired onto the current queue.

thanks so much
Reply | Threaded
Open this post in threaded view
|

Re: [Artemis 2.6.2] AMQP messages going to expiry queue, redelivery off that queue is failing

tabish121@gmail.com
On 08/13/2018 07:12 PM, Dan Langford wrote:

> some of my users are attempting a pattern to deduplicate messages based on
> a time window instead of a fixed amount of space (a duplicate id cache)
>
> so far the concept has been working very well. So they send their AMQP
> messages (qpid-jms-client) into a Last Value Queue with an appropriate
> identifier in the _AMQ_LVQ_NAME. They also set a TimeToLive on the message
> that is essentially the lag they will allow as they want to wait for
> possible duplicates. If any duplicates come in the Last Value Queue
> behavior is replacing the older message with the newer message until the
> expiration. expired messages are delivered to the preconfigured expiry
> queue where their application is listening. This is not perfect but its not
> intended to be. Its just intended to reduce additional unnecessary
> processing and they understand this is not a guarantee. It really helps
> with a system that produces messages in a way that has flurries of
> "notifications" about the same assetID over and over again.
>
> BUT where we are seeing is a problem is when we are consuming from the
> queue used to hold expired messages and we toss some exception and the
> message needs to be redelivered. the first time or two the message is
> redelivered it is delivered OK. But when the JMSXDeliveryCount is about 3
> or 4 (we use redelivery delay and multipliers to spread these out) our
> qpid-jms-client stops being able to read the messages.
>
> we were only able to reproduce this when an AMQP message expired onto the
> queue. (expired from a LVQ in case that is relevant). if we place the
> message directly on a queue and test different exception and redelivery
> scenarios we cannot reproduce this behavior.
>
> i enable the qpid-jms-client frame logging (via env variable
> PN_TRACE_FRM=true) and i saw that in the situation when the client code
> cannot access the payload, even though the broker WAS still sending the
> payload. so i thought it was some odd issue with the client. The Apache
> Qpid team responded that the issue seems to be that the broker starts to
> send some ill formed payloads in this scenario. i dont want to repeat the
> stack traces and their response, you can read those here
>
> https://lists.apache.org/thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E
>
> would it be helpful if i tested that this happens if there is not a LVQ
> involved? i could have a message in a non-LVQ expire to another queue and
> see if redeliveries over their get messed up after a few attempts. For the
> record this is AMQP for producing and consuming. i do notice the messages
> waiting in the expiry queue have much more headers messages sent directly
> to a queue from client code. they seem to be headers full of information
> about the message as it left the previous queue. I tried to send a message
> directly to the expiry queue with all these headers to determine if it was
> the existence of one of these specifically that trigger the malformed frame
> but was not able to fully set all those headers. the JMSDeliverCount (type
> Long) was the one that the client would not let me set and as a result i
> could not test. for clarity thought i dont know that the issue exists due
> to a header that is just what i saw as a difference between messages be
> delivered to the queue by client code versus messages expiring from one
> queue to another.
>
> please look over the linked thread on the qpid list and let me know if you
> know why a message transfer fram would become malformed after a few failed
> deliveries only if the message expired onto the current queue.
>
> thanks so much
>

A great place to start is to create a unit test that reproduces the
issue.  You can look at the Artemis unit tests for AMQP to get some
inspiration on how to set one up.  Then try and create the smallest
possible test that can reproduce the issue to make it easier to narrow
in on where the issue might be.

The AMQP tests in Artemis are located here:

https://github.com/apache/activemq-artemis/tree/master/tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp



--
Tim Bish

Reply | Threaded
Open this post in threaded view
|

Re: [Artemis 2.6.2] AMQP messages going to expiry queue, redelivery off that queue is failing

Dan Langford
ok i wrote 3 test files. I don't know the best way to get them to you
easily. hopefully a Gist is ok.
https://gist.github.com/danlangford/071e738225ec0c68dd470816b977499b

you can copy those 3 files straight to
./tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp

The test JMSTransactedRedeliveryTest::testAMQPProducerAMQPConsumer proves
that a transacted client can .rollback() a handful of times and still be
able to consume the redelivered message later on.

The test JMSTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer shows
that if a message had been expired and now a transacted client is
attempting to consume it the client only has 2 chances before the broker
starts sending the message in a way that will not parse correctly

The test JMSNonTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer shows
that if a message had been expired a non-transacted client has no troubles
reliably accessing the redelivered message from broker


as you can tell i am mostly concerned about AMQP->AMQP for my use case.
some of those other combos are failing some of these tests in other ways.
naturally you can address those as you see fit but for my client the
AMQP->AMQP is a roadblocker.


let me know if you can determine why the broker is sending an extra null
character in the payload on the third time the messages attempts delivery.
maybe we are doing something incorrectly.


This has been more of an issue than i thought due to the fact that Spring
default to enabling transactions. in all of my initial tests i couldn't
reproduce it because i prefer the straight simplified jms api from 2.0 and
that defaults to sessions not being transacted. that being said nearly all
of my clients prefer using Spring Boot autoconfigurer and other spring
pieces which happen to default to transacted sessions.  i can now have some
of them workaround but others of them are requiring the transaction.


also as a reminder and for context here is a link to the initial
conversation i had with the Qpid Jms Client devs who pointed out to me the
erroneous null character in the message transfer from the broker:
https://lists.apache.org/
thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E


thank you so much for your time

On Tue, Aug 14, 2018 at 1:19 PM Timothy Bish <[hidden email]> wrote:

> On 08/13/2018 07:12 PM, Dan Langford wrote:
> > some of my users are attempting a pattern to deduplicate messages based
> on
> > a time window instead of a fixed amount of space (a duplicate id cache)
> >
> > so far the concept has been working very well. So they send their AMQP
> > messages (qpid-jms-client) into a Last Value Queue with an appropriate
> > identifier in the _AMQ_LVQ_NAME. They also set a TimeToLive on the
> message
> > that is essentially the lag they will allow as they want to wait for
> > possible duplicates. If any duplicates come in the Last Value Queue
> > behavior is replacing the older message with the newer message until the
> > expiration. expired messages are delivered to the preconfigured expiry
> > queue where their application is listening. This is not perfect but its
> not
> > intended to be. Its just intended to reduce additional unnecessary
> > processing and they understand this is not a guarantee. It really helps
> > with a system that produces messages in a way that has flurries of
> > "notifications" about the same assetID over and over again.
> >
> > BUT where we are seeing is a problem is when we are consuming from the
> > queue used to hold expired messages and we toss some exception and the
> > message needs to be redelivered. the first time or two the message is
> > redelivered it is delivered OK. But when the JMSXDeliveryCount is about 3
> > or 4 (we use redelivery delay and multipliers to spread these out) our
> > qpid-jms-client stops being able to read the messages.
> >
> > we were only able to reproduce this when an AMQP message expired onto the
> > queue. (expired from a LVQ in case that is relevant). if we place the
> > message directly on a queue and test different exception and redelivery
> > scenarios we cannot reproduce this behavior.
> >
> > i enable the qpid-jms-client frame logging (via env variable
> > PN_TRACE_FRM=true) and i saw that in the situation when the client code
> > cannot access the payload, even though the broker WAS still sending the
> > payload. so i thought it was some odd issue with the client. The Apache
> > Qpid team responded that the issue seems to be that the broker starts to
> > send some ill formed payloads in this scenario. i dont want to repeat the
> > stack traces and their response, you can read those here
> >
> >
> https://lists.apache.org/thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E
> >
> > would it be helpful if i tested that this happens if there is not a LVQ
> > involved? i could have a message in a non-LVQ expire to another queue and
> > see if redeliveries over their get messed up after a few attempts. For
> the
> > record this is AMQP for producing and consuming. i do notice the messages
> > waiting in the expiry queue have much more headers messages sent directly
> > to a queue from client code. they seem to be headers full of information
> > about the message as it left the previous queue. I tried to send a
> message
> > directly to the expiry queue with all these headers to determine if it
> was
> > the existence of one of these specifically that trigger the malformed
> frame
> > but was not able to fully set all those headers. the JMSDeliverCount
> (type
> > Long) was the one that the client would not let me set and as a result i
> > could not test. for clarity thought i dont know that the issue exists due
> > to a header that is just what i saw as a difference between messages be
> > delivered to the queue by client code versus messages expiring from one
> > queue to another.
> >
> > please look over the linked thread on the qpid list and let me know if
> you
> > know why a message transfer fram would become malformed after a few
> failed
> > deliveries only if the message expired onto the current queue.
> >
> > thanks so much
> >
>
> A great place to start is to create a unit test that reproduces the
> issue.  You can look at the Artemis unit tests for AMQP to get some
> inspiration on how to set one up.  Then try and create the smallest
> possible test that can reproduce the issue to make it easier to narrow
> in on where the issue might be.
>
> The AMQP tests in Artemis are located here:
>
>
> https://github.com/apache/activemq-artemis/tree/master/tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
>
>
>
> --
> Tim Bish
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [Artemis 2.6.2] AMQP messages going to expiry queue, redelivery off that queue is failing

Robbie Gemmell
Tim and I had an initial look at this, and can see generally where the
broker is internally corrupting things on send, though not yet the
full picture how it gets there or what to do about it. The expiration
is likely to be key, one difference with the non-transacted case is
actually going to be because its using recover() which the client
performs locally.

Robbie

On Wed, 29 Aug 2018 at 06:23, Dan Langford <[hidden email]> wrote:

>
> ok i wrote 3 test files. I don't know the best way to get them to you
> easily. hopefully a Gist is ok.
> https://gist.github.com/danlangford/071e738225ec0c68dd470816b977499b
>
> you can copy those 3 files straight to
> ./tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
>
> The test JMSTransactedRedeliveryTest::testAMQPProducerAMQPConsumer proves
> that a transacted client can .rollback() a handful of times and still be
> able to consume the redelivered message later on.
>
> The test JMSTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer shows
> that if a message had been expired and now a transacted client is
> attempting to consume it the client only has 2 chances before the broker
> starts sending the message in a way that will not parse correctly
>
> The test JMSNonTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer shows
> that if a message had been expired a non-transacted client has no troubles
> reliably accessing the redelivered message from broker
>
>
> as you can tell i am mostly concerned about AMQP->AMQP for my use case.
> some of those other combos are failing some of these tests in other ways.
> naturally you can address those as you see fit but for my client the
> AMQP->AMQP is a roadblocker.
>
>
> let me know if you can determine why the broker is sending an extra null
> character in the payload on the third time the messages attempts delivery.
> maybe we are doing something incorrectly.
>
>
> This has been more of an issue than i thought due to the fact that Spring
> default to enabling transactions. in all of my initial tests i couldn't
> reproduce it because i prefer the straight simplified jms api from 2.0 and
> that defaults to sessions not being transacted. that being said nearly all
> of my clients prefer using Spring Boot autoconfigurer and other spring
> pieces which happen to default to transacted sessions.  i can now have some
> of them workaround but others of them are requiring the transaction.
>
>
> also as a reminder and for context here is a link to the initial
> conversation i had with the Qpid Jms Client devs who pointed out to me the
> erroneous null character in the message transfer from the broker:
> https://lists.apache.org/
> thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E
>
>
> thank you so much for your time
>
> On Tue, Aug 14, 2018 at 1:19 PM Timothy Bish <[hidden email]> wrote:
>
> > On 08/13/2018 07:12 PM, Dan Langford wrote:
> > > some of my users are attempting a pattern to deduplicate messages based
> > on
> > > a time window instead of a fixed amount of space (a duplicate id cache)
> > >
> > > so far the concept has been working very well. So they send their AMQP
> > > messages (qpid-jms-client) into a Last Value Queue with an appropriate
> > > identifier in the _AMQ_LVQ_NAME. They also set a TimeToLive on the
> > message
> > > that is essentially the lag they will allow as they want to wait for
> > > possible duplicates. If any duplicates come in the Last Value Queue
> > > behavior is replacing the older message with the newer message until the
> > > expiration. expired messages are delivered to the preconfigured expiry
> > > queue where their application is listening. This is not perfect but its
> > not
> > > intended to be. Its just intended to reduce additional unnecessary
> > > processing and they understand this is not a guarantee. It really helps
> > > with a system that produces messages in a way that has flurries of
> > > "notifications" about the same assetID over and over again.
> > >
> > > BUT where we are seeing is a problem is when we are consuming from the
> > > queue used to hold expired messages and we toss some exception and the
> > > message needs to be redelivered. the first time or two the message is
> > > redelivered it is delivered OK. But when the JMSXDeliveryCount is about 3
> > > or 4 (we use redelivery delay and multipliers to spread these out) our
> > > qpid-jms-client stops being able to read the messages.
> > >
> > > we were only able to reproduce this when an AMQP message expired onto the
> > > queue. (expired from a LVQ in case that is relevant). if we place the
> > > message directly on a queue and test different exception and redelivery
> > > scenarios we cannot reproduce this behavior.
> > >
> > > i enable the qpid-jms-client frame logging (via env variable
> > > PN_TRACE_FRM=true) and i saw that in the situation when the client code
> > > cannot access the payload, even though the broker WAS still sending the
> > > payload. so i thought it was some odd issue with the client. The Apache
> > > Qpid team responded that the issue seems to be that the broker starts to
> > > send some ill formed payloads in this scenario. i dont want to repeat the
> > > stack traces and their response, you can read those here
> > >
> > >
> > https://lists.apache.org/thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E
> > >
> > > would it be helpful if i tested that this happens if there is not a LVQ
> > > involved? i could have a message in a non-LVQ expire to another queue and
> > > see if redeliveries over their get messed up after a few attempts. For
> > the
> > > record this is AMQP for producing and consuming. i do notice the messages
> > > waiting in the expiry queue have much more headers messages sent directly
> > > to a queue from client code. they seem to be headers full of information
> > > about the message as it left the previous queue. I tried to send a
> > message
> > > directly to the expiry queue with all these headers to determine if it
> > was
> > > the existence of one of these specifically that trigger the malformed
> > frame
> > > but was not able to fully set all those headers. the JMSDeliverCount
> > (type
> > > Long) was the one that the client would not let me set and as a result i
> > > could not test. for clarity thought i dont know that the issue exists due
> > > to a header that is just what i saw as a difference between messages be
> > > delivered to the queue by client code versus messages expiring from one
> > > queue to another.
> > >
> > > please look over the linked thread on the qpid list and let me know if
> > you
> > > know why a message transfer fram would become malformed after a few
> > failed
> > > deliveries only if the message expired onto the current queue.
> > >
> > > thanks so much
> > >
> >
> > A great place to start is to create a unit test that reproduces the
> > issue.  You can look at the Artemis unit tests for AMQP to get some
> > inspiration on how to set one up.  Then try and create the smallest
> > possible test that can reproduce the issue to make it easier to narrow
> > in on where the issue might be.
> >
> > The AMQP tests in Artemis are located here:
> >
> >
> > https://github.com/apache/activemq-artemis/tree/master/tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
> >
> >
> >
> > --
> > Tim Bish
> >
> >
Reply | Threaded
Open this post in threaded view
|

Re: [Artemis 2.6.2] AMQP messages going to expiry queue, redelivery off that queue is failing

Dan Langford
thanks for looking into this. what is the proper way to force for testing a
redelivery that goes back to the broker without transactions? its probably
like killing the session or connection. that would be if we wanted to test
if non-transacted redeliveries were getting corrupted

we have a lot of people using spring-jms and as i look in the spring code...
https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractMessageListenerContainer.java
looking at doExecuteListener and rollbackOnExceptionIfNecessary
it looks like if its transacted they try to rollback() and if its
client_ack they try recover(). they probably dont handle auto_ack because
that was acked immediately?

speaking of this what is an appropriate way to NACK a message when using
Client Acknowledgemnt and a JMS messageHandler? is it simply to reach the
end of the method execution without having called message.acknowledge() or
would it be appropriate to throw a RuntimeException (since i cannot throw a
checked Exception out of an implementation of javax.jms.MessageListener) ?

On Thu, Aug 30, 2018 at 8:38 AM Robbie Gemmell <[hidden email]>
wrote:

> Tim and I had an initial look at this, and can see generally where the
> broker is internally corrupting things on send, though not yet the
> full picture how it gets there or what to do about it. The expiration
> is likely to be key, one difference with the non-transacted case is
> actually going to be because its using recover() which the client
> performs locally.
>
> Robbie
>
> On Wed, 29 Aug 2018 at 06:23, Dan Langford <[hidden email]> wrote:
> >
> > ok i wrote 3 test files. I don't know the best way to get them to you
> > easily. hopefully a Gist is ok.
> > https://gist.github.com/danlangford/071e738225ec0c68dd470816b977499b
> >
> > you can copy those 3 files straight to
> >
> ./tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
> >
> > The test JMSTransactedRedeliveryTest::testAMQPProducerAMQPConsumer proves
> > that a transacted client can .rollback() a handful of times and still be
> > able to consume the redelivered message later on.
> >
> > The test JMSTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer
> shows
> > that if a message had been expired and now a transacted client is
> > attempting to consume it the client only has 2 chances before the broker
> > starts sending the message in a way that will not parse correctly
> >
> > The test JMSNonTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer
> shows
> > that if a message had been expired a non-transacted client has no
> troubles
> > reliably accessing the redelivered message from broker
> >
> >
> > as you can tell i am mostly concerned about AMQP->AMQP for my use case.
> > some of those other combos are failing some of these tests in other ways.
> > naturally you can address those as you see fit but for my client the
> > AMQP->AMQP is a roadblocker.
> >
> >
> > let me know if you can determine why the broker is sending an extra null
> > character in the payload on the third time the messages attempts
> delivery.
> > maybe we are doing something incorrectly.
> >
> >
> > This has been more of an issue than i thought due to the fact that Spring
> > default to enabling transactions. in all of my initial tests i couldn't
> > reproduce it because i prefer the straight simplified jms api from 2.0
> and
> > that defaults to sessions not being transacted. that being said nearly
> all
> > of my clients prefer using Spring Boot autoconfigurer and other spring
> > pieces which happen to default to transacted sessions.  i can now have
> some
> > of them workaround but others of them are requiring the transaction.
> >
> >
> > also as a reminder and for context here is a link to the initial
> > conversation i had with the Qpid Jms Client devs who pointed out to me
> the
> > erroneous null character in the message transfer from the broker:
> > https://lists.apache.org/
> > thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%
> 3Cusers.qpid.apache.org%3E
> >
> >
> > thank you so much for your time
> >
> > On Tue, Aug 14, 2018 at 1:19 PM Timothy Bish <[hidden email]>
> wrote:
> >
> > > On 08/13/2018 07:12 PM, Dan Langford wrote:
> > > > some of my users are attempting a pattern to deduplicate messages
> based
> > > on
> > > > a time window instead of a fixed amount of space (a duplicate id
> cache)
> > > >
> > > > so far the concept has been working very well. So they send their
> AMQP
> > > > messages (qpid-jms-client) into a Last Value Queue with an
> appropriate
> > > > identifier in the _AMQ_LVQ_NAME. They also set a TimeToLive on the
> > > message
> > > > that is essentially the lag they will allow as they want to wait for
> > > > possible duplicates. If any duplicates come in the Last Value Queue
> > > > behavior is replacing the older message with the newer message until
> the
> > > > expiration. expired messages are delivered to the preconfigured
> expiry
> > > > queue where their application is listening. This is not perfect but
> its
> > > not
> > > > intended to be. Its just intended to reduce additional unnecessary
> > > > processing and they understand this is not a guarantee. It really
> helps
> > > > with a system that produces messages in a way that has flurries of
> > > > "notifications" about the same assetID over and over again.
> > > >
> > > > BUT where we are seeing is a problem is when we are consuming from
> the
> > > > queue used to hold expired messages and we toss some exception and
> the
> > > > message needs to be redelivered. the first time or two the message is
> > > > redelivered it is delivered OK. But when the JMSXDeliveryCount is
> about 3
> > > > or 4 (we use redelivery delay and multipliers to spread these out)
> our
> > > > qpid-jms-client stops being able to read the messages.
> > > >
> > > > we were only able to reproduce this when an AMQP message expired
> onto the
> > > > queue. (expired from a LVQ in case that is relevant). if we place the
> > > > message directly on a queue and test different exception and
> redelivery
> > > > scenarios we cannot reproduce this behavior.
> > > >
> > > > i enable the qpid-jms-client frame logging (via env variable
> > > > PN_TRACE_FRM=true) and i saw that in the situation when the client
> code
> > > > cannot access the payload, even though the broker WAS still sending
> the
> > > > payload. so i thought it was some odd issue with the client. The
> Apache
> > > > Qpid team responded that the issue seems to be that the broker
> starts to
> > > > send some ill formed payloads in this scenario. i dont want to
> repeat the
> > > > stack traces and their response, you can read those here
> > > >
> > > >
> > >
> https://lists.apache.org/thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E
> > > >
> > > > would it be helpful if i tested that this happens if there is not a
> LVQ
> > > > involved? i could have a message in a non-LVQ expire to another
> queue and
> > > > see if redeliveries over their get messed up after a few attempts.
> For
> > > the
> > > > record this is AMQP for producing and consuming. i do notice the
> messages
> > > > waiting in the expiry queue have much more headers messages sent
> directly
> > > > to a queue from client code. they seem to be headers full of
> information
> > > > about the message as it left the previous queue. I tried to send a
> > > message
> > > > directly to the expiry queue with all these headers to determine if
> it
> > > was
> > > > the existence of one of these specifically that trigger the malformed
> > > frame
> > > > but was not able to fully set all those headers. the JMSDeliverCount
> > > (type
> > > > Long) was the one that the client would not let me set and as a
> result i
> > > > could not test. for clarity thought i dont know that the issue
> exists due
> > > > to a header that is just what i saw as a difference between messages
> be
> > > > delivered to the queue by client code versus messages expiring from
> one
> > > > queue to another.
> > > >
> > > > please look over the linked thread on the qpid list and let me know
> if
> > > you
> > > > know why a message transfer fram would become malformed after a few
> > > failed
> > > > deliveries only if the message expired onto the current queue.
> > > >
> > > > thanks so much
> > > >
> > >
> > > A great place to start is to create a unit test that reproduces the
> > > issue.  You can look at the Artemis unit tests for AMQP to get some
> > > inspiration on how to set one up.  Then try and create the smallest
> > > possible test that can reproduce the issue to make it easier to narrow
> > > in on where the issue might be.
> > >
> > > The AMQP tests in Artemis are located here:
> > >
> > >
> > >
> https://github.com/apache/activemq-artemis/tree/master/tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
> > >
> > >
> > >
> > > --
> > > Tim Bish
> > >
> > >
>
Reply | Threaded
Open this post in threaded view
|

Re: [Artemis 2.6.2] AMQP messages going to expiry queue, redelivery off that queue is failing

tabish121@gmail.com
On 08/30/2018 07:10 PM, Dan Langford wrote:
> thanks for looking into this. what is the proper way to force for testing a
> redelivery that goes back to the broker without transactions? its probably
> like killing the session or connection. that would be if we wanted to test
> if non-transacted redeliveries were getting corrupted

The issue has been identified and fixed, should appear in the next release.
https://issues.apache.org/jira/browse/ARTEMIS-2082

>
> we have a lot of people using spring-jms and as i look in the spring code...
> https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractMessageListenerContainer.java
> looking at doExecuteListener and rollbackOnExceptionIfNecessary
> it looks like if its transacted they try to rollback() and if its
> client_ack they try recover(). they probably dont handle auto_ack because
> that was acked immediately?
>
> speaking of this what is an appropriate way to NACK a message when using
> Client Acknowledgemnt and a JMS messageHandler? is it simply to reach the
> end of the method execution without having called message.acknowledge() or
> would it be appropriate to throw a RuntimeException (since i cannot throw a
> checked Exception out of an implementation of javax.jms.MessageListener) ?
>
> On Thu, Aug 30, 2018 at 8:38 AM Robbie Gemmell <[hidden email]>
> wrote:
>
>> Tim and I had an initial look at this, and can see generally where the
>> broker is internally corrupting things on send, though not yet the
>> full picture how it gets there or what to do about it. The expiration
>> is likely to be key, one difference with the non-transacted case is
>> actually going to be because its using recover() which the client
>> performs locally.
>>
>> Robbie
>>
>> On Wed, 29 Aug 2018 at 06:23, Dan Langford <[hidden email]> wrote:
>>> ok i wrote 3 test files. I don't know the best way to get them to you
>>> easily. hopefully a Gist is ok.
>>> https://gist.github.com/danlangford/071e738225ec0c68dd470816b977499b
>>>
>>> you can copy those 3 files straight to
>>>
>> ./tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
>>> The test JMSTransactedRedeliveryTest::testAMQPProducerAMQPConsumer proves
>>> that a transacted client can .rollback() a handful of times and still be
>>> able to consume the redelivered message later on.
>>>
>>> The test JMSTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer
>> shows
>>> that if a message had been expired and now a transacted client is
>>> attempting to consume it the client only has 2 chances before the broker
>>> starts sending the message in a way that will not parse correctly
>>>
>>> The test JMSNonTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer
>> shows
>>> that if a message had been expired a non-transacted client has no
>> troubles
>>> reliably accessing the redelivered message from broker
>>>
>>>
>>> as you can tell i am mostly concerned about AMQP->AMQP for my use case.
>>> some of those other combos are failing some of these tests in other ways.
>>> naturally you can address those as you see fit but for my client the
>>> AMQP->AMQP is a roadblocker.
>>>
>>>
>>> let me know if you can determine why the broker is sending an extra null
>>> character in the payload on the third time the messages attempts
>> delivery.
>>> maybe we are doing something incorrectly.
>>>
>>>
>>> This has been more of an issue than i thought due to the fact that Spring
>>> default to enabling transactions. in all of my initial tests i couldn't
>>> reproduce it because i prefer the straight simplified jms api from 2.0
>> and
>>> that defaults to sessions not being transacted. that being said nearly
>> all
>>> of my clients prefer using Spring Boot autoconfigurer and other spring
>>> pieces which happen to default to transacted sessions.  i can now have
>> some
>>> of them workaround but others of them are requiring the transaction.
>>>
>>>
>>> also as a reminder and for context here is a link to the initial
>>> conversation i had with the Qpid Jms Client devs who pointed out to me
>> the
>>> erroneous null character in the message transfer from the broker:
>>> https://lists.apache.org/
>>> thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%
>> 3Cusers.qpid.apache.org%3E
>>>
>>> thank you so much for your time
>>>
>>> On Tue, Aug 14, 2018 at 1:19 PM Timothy Bish <[hidden email]>
>> wrote:
>>>> On 08/13/2018 07:12 PM, Dan Langford wrote:
>>>>> some of my users are attempting a pattern to deduplicate messages
>> based
>>>> on
>>>>> a time window instead of a fixed amount of space (a duplicate id
>> cache)
>>>>> so far the concept has been working very well. So they send their
>> AMQP
>>>>> messages (qpid-jms-client) into a Last Value Queue with an
>> appropriate
>>>>> identifier in the _AMQ_LVQ_NAME. They also set a TimeToLive on the
>>>> message
>>>>> that is essentially the lag they will allow as they want to wait for
>>>>> possible duplicates. If any duplicates come in the Last Value Queue
>>>>> behavior is replacing the older message with the newer message until
>> the
>>>>> expiration. expired messages are delivered to the preconfigured
>> expiry
>>>>> queue where their application is listening. This is not perfect but
>> its
>>>> not
>>>>> intended to be. Its just intended to reduce additional unnecessary
>>>>> processing and they understand this is not a guarantee. It really
>> helps
>>>>> with a system that produces messages in a way that has flurries of
>>>>> "notifications" about the same assetID over and over again.
>>>>>
>>>>> BUT where we are seeing is a problem is when we are consuming from
>> the
>>>>> queue used to hold expired messages and we toss some exception and
>> the
>>>>> message needs to be redelivered. the first time or two the message is
>>>>> redelivered it is delivered OK. But when the JMSXDeliveryCount is
>> about 3
>>>>> or 4 (we use redelivery delay and multipliers to spread these out)
>> our
>>>>> qpid-jms-client stops being able to read the messages.
>>>>>
>>>>> we were only able to reproduce this when an AMQP message expired
>> onto the
>>>>> queue. (expired from a LVQ in case that is relevant). if we place the
>>>>> message directly on a queue and test different exception and
>> redelivery
>>>>> scenarios we cannot reproduce this behavior.
>>>>>
>>>>> i enable the qpid-jms-client frame logging (via env variable
>>>>> PN_TRACE_FRM=true) and i saw that in the situation when the client
>> code
>>>>> cannot access the payload, even though the broker WAS still sending
>> the
>>>>> payload. so i thought it was some odd issue with the client. The
>> Apache
>>>>> Qpid team responded that the issue seems to be that the broker
>> starts to
>>>>> send some ill formed payloads in this scenario. i dont want to
>> repeat the
>>>>> stack traces and their response, you can read those here
>>>>>
>>>>>
>> https://lists.apache.org/thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E
>>>>> would it be helpful if i tested that this happens if there is not a
>> LVQ
>>>>> involved? i could have a message in a non-LVQ expire to another
>> queue and
>>>>> see if redeliveries over their get messed up after a few attempts.
>> For
>>>> the
>>>>> record this is AMQP for producing and consuming. i do notice the
>> messages
>>>>> waiting in the expiry queue have much more headers messages sent
>> directly
>>>>> to a queue from client code. they seem to be headers full of
>> information
>>>>> about the message as it left the previous queue. I tried to send a
>>>> message
>>>>> directly to the expiry queue with all these headers to determine if
>> it
>>>> was
>>>>> the existence of one of these specifically that trigger the malformed
>>>> frame
>>>>> but was not able to fully set all those headers. the JMSDeliverCount
>>>> (type
>>>>> Long) was the one that the client would not let me set and as a
>> result i
>>>>> could not test. for clarity thought i dont know that the issue
>> exists due
>>>>> to a header that is just what i saw as a difference between messages
>> be
>>>>> delivered to the queue by client code versus messages expiring from
>> one
>>>>> queue to another.
>>>>>
>>>>> please look over the linked thread on the qpid list and let me know
>> if
>>>> you
>>>>> know why a message transfer fram would become malformed after a few
>>>> failed
>>>>> deliveries only if the message expired onto the current queue.
>>>>>
>>>>> thanks so much
>>>>>
>>>> A great place to start is to create a unit test that reproduces the
>>>> issue.  You can look at the Artemis unit tests for AMQP to get some
>>>> inspiration on how to set one up.  Then try and create the smallest
>>>> possible test that can reproduce the issue to make it easier to narrow
>>>> in on where the issue might be.
>>>>
>>>> The AMQP tests in Artemis are located here:
>>>>
>>>>
>>>>
>> https://github.com/apache/activemq-artemis/tree/master/tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
>>>>
>>>>
>>>> --
>>>> Tim Bish
>>>>
>>>>

--
Tim Bish
twitter: @tabish121
blog: http://timbish.blogspot.com/

Reply | Threaded
Open this post in threaded view
|

Re: [Artemis 2.6.2] AMQP messages going to expiry queue, redelivery off that queue is failing

Dan Langford
Thank you guys so much!
On Tue, Sep 11, 2018 at 2:50 PM Timothy Bish <[hidden email]> wrote:

> On 08/30/2018 07:10 PM, Dan Langford wrote:
> > thanks for looking into this. what is the proper way to force for
> testing a
> > redelivery that goes back to the broker without transactions? its
> probably
> > like killing the session or connection. that would be if we wanted to
> test
> > if non-transacted redeliveries were getting corrupted
>
> The issue has been identified and fixed, should appear in the next release.
> https://issues.apache.org/jira/browse/ARTEMIS-2082
>
> >
> > we have a lot of people using spring-jms and as i look in the spring
> code...
> >
> https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractMessageListenerContainer.java
> > looking at doExecuteListener and rollbackOnExceptionIfNecessary
> > it looks like if its transacted they try to rollback() and if its
> > client_ack they try recover(). they probably dont handle auto_ack because
> > that was acked immediately?
> >
> > speaking of this what is an appropriate way to NACK a message when using
> > Client Acknowledgemnt and a JMS messageHandler? is it simply to reach the
> > end of the method execution without having called message.acknowledge()
> or
> > would it be appropriate to throw a RuntimeException (since i cannot
> throw a
> > checked Exception out of an implementation of javax.jms.MessageListener)
> ?
> >
> > On Thu, Aug 30, 2018 at 8:38 AM Robbie Gemmell <[hidden email]
> >
> > wrote:
> >
> >> Tim and I had an initial look at this, and can see generally where the
> >> broker is internally corrupting things on send, though not yet the
> >> full picture how it gets there or what to do about it. The expiration
> >> is likely to be key, one difference with the non-transacted case is
> >> actually going to be because its using recover() which the client
> >> performs locally.
> >>
> >> Robbie
> >>
> >> On Wed, 29 Aug 2018 at 06:23, Dan Langford <[hidden email]>
> wrote:
> >>> ok i wrote 3 test files. I don't know the best way to get them to you
> >>> easily. hopefully a Gist is ok.
> >>> https://gist.github.com/danlangford/071e738225ec0c68dd470816b977499b
> >>>
> >>> you can copy those 3 files straight to
> >>>
> >>
> ./tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
> >>> The test JMSTransactedRedeliveryTest::testAMQPProducerAMQPConsumer
> proves
> >>> that a transacted client can .rollback() a handful of times and still
> be
> >>> able to consume the redelivered message later on.
> >>>
> >>> The test JMSTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer
> >> shows
> >>> that if a message had been expired and now a transacted client is
> >>> attempting to consume it the client only has 2 chances before the
> broker
> >>> starts sending the message in a way that will not parse correctly
> >>>
> >>> The test
> JMSNonTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer
> >> shows
> >>> that if a message had been expired a non-transacted client has no
> >> troubles
> >>> reliably accessing the redelivered message from broker
> >>>
> >>>
> >>> as you can tell i am mostly concerned about AMQP->AMQP for my use case.
> >>> some of those other combos are failing some of these tests in other
> ways.
> >>> naturally you can address those as you see fit but for my client the
> >>> AMQP->AMQP is a roadblocker.
> >>>
> >>>
> >>> let me know if you can determine why the broker is sending an extra
> null
> >>> character in the payload on the third time the messages attempts
> >> delivery.
> >>> maybe we are doing something incorrectly.
> >>>
> >>>
> >>> This has been more of an issue than i thought due to the fact that
> Spring
> >>> default to enabling transactions. in all of my initial tests i couldn't
> >>> reproduce it because i prefer the straight simplified jms api from 2.0
> >> and
> >>> that defaults to sessions not being transacted. that being said nearly
> >> all
> >>> of my clients prefer using Spring Boot autoconfigurer and other spring
> >>> pieces which happen to default to transacted sessions.  i can now have
> >> some
> >>> of them workaround but others of them are requiring the transaction.
> >>>
> >>>
> >>> also as a reminder and for context here is a link to the initial
> >>> conversation i had with the Qpid Jms Client devs who pointed out to me
> >> the
> >>> erroneous null character in the message transfer from the broker:
> >>> https://lists.apache.org/
> >>> thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%
> >> 3Cusers.qpid.apache.org%3E
> >>>
> >>> thank you so much for your time
> >>>
> >>> On Tue, Aug 14, 2018 at 1:19 PM Timothy Bish <[hidden email]>
> >> wrote:
> >>>> On 08/13/2018 07:12 PM, Dan Langford wrote:
> >>>>> some of my users are attempting a pattern to deduplicate messages
> >> based
> >>>> on
> >>>>> a time window instead of a fixed amount of space (a duplicate id
> >> cache)
> >>>>> so far the concept has been working very well. So they send their
> >> AMQP
> >>>>> messages (qpid-jms-client) into a Last Value Queue with an
> >> appropriate
> >>>>> identifier in the _AMQ_LVQ_NAME. They also set a TimeToLive on the
> >>>> message
> >>>>> that is essentially the lag they will allow as they want to wait for
> >>>>> possible duplicates. If any duplicates come in the Last Value Queue
> >>>>> behavior is replacing the older message with the newer message until
> >> the
> >>>>> expiration. expired messages are delivered to the preconfigured
> >> expiry
> >>>>> queue where their application is listening. This is not perfect but
> >> its
> >>>> not
> >>>>> intended to be. Its just intended to reduce additional unnecessary
> >>>>> processing and they understand this is not a guarantee. It really
> >> helps
> >>>>> with a system that produces messages in a way that has flurries of
> >>>>> "notifications" about the same assetID over and over again.
> >>>>>
> >>>>> BUT where we are seeing is a problem is when we are consuming from
> >> the
> >>>>> queue used to hold expired messages and we toss some exception and
> >> the
> >>>>> message needs to be redelivered. the first time or two the message is
> >>>>> redelivered it is delivered OK. But when the JMSXDeliveryCount is
> >> about 3
> >>>>> or 4 (we use redelivery delay and multipliers to spread these out)
> >> our
> >>>>> qpid-jms-client stops being able to read the messages.
> >>>>>
> >>>>> we were only able to reproduce this when an AMQP message expired
> >> onto the
> >>>>> queue. (expired from a LVQ in case that is relevant). if we place the
> >>>>> message directly on a queue and test different exception and
> >> redelivery
> >>>>> scenarios we cannot reproduce this behavior.
> >>>>>
> >>>>> i enable the qpid-jms-client frame logging (via env variable
> >>>>> PN_TRACE_FRM=true) and i saw that in the situation when the client
> >> code
> >>>>> cannot access the payload, even though the broker WAS still sending
> >> the
> >>>>> payload. so i thought it was some odd issue with the client. The
> >> Apache
> >>>>> Qpid team responded that the issue seems to be that the broker
> >> starts to
> >>>>> send some ill formed payloads in this scenario. i dont want to
> >> repeat the
> >>>>> stack traces and their response, you can read those here
> >>>>>
> >>>>>
> >>
> https://lists.apache.org/thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E
> >>>>> would it be helpful if i tested that this happens if there is not a
> >> LVQ
> >>>>> involved? i could have a message in a non-LVQ expire to another
> >> queue and
> >>>>> see if redeliveries over their get messed up after a few attempts.
> >> For
> >>>> the
> >>>>> record this is AMQP for producing and consuming. i do notice the
> >> messages
> >>>>> waiting in the expiry queue have much more headers messages sent
> >> directly
> >>>>> to a queue from client code. they seem to be headers full of
> >> information
> >>>>> about the message as it left the previous queue. I tried to send a
> >>>> message
> >>>>> directly to the expiry queue with all these headers to determine if
> >> it
> >>>> was
> >>>>> the existence of one of these specifically that trigger the malformed
> >>>> frame
> >>>>> but was not able to fully set all those headers. the JMSDeliverCount
> >>>> (type
> >>>>> Long) was the one that the client would not let me set and as a
> >> result i
> >>>>> could not test. for clarity thought i dont know that the issue
> >> exists due
> >>>>> to a header that is just what i saw as a difference between messages
> >> be
> >>>>> delivered to the queue by client code versus messages expiring from
> >> one
> >>>>> queue to another.
> >>>>>
> >>>>> please look over the linked thread on the qpid list and let me know
> >> if
> >>>> you
> >>>>> know why a message transfer fram would become malformed after a few
> >>>> failed
> >>>>> deliveries only if the message expired onto the current queue.
> >>>>>
> >>>>> thanks so much
> >>>>>
> >>>> A great place to start is to create a unit test that reproduces the
> >>>> issue.  You can look at the Artemis unit tests for AMQP to get some
> >>>> inspiration on how to set one up.  Then try and create the smallest
> >>>> possible test that can reproduce the issue to make it easier to narrow
> >>>> in on where the issue might be.
> >>>>
> >>>> The AMQP tests in Artemis are located here:
> >>>>
> >>>>
> >>>>
> >>
> https://github.com/apache/activemq-artemis/tree/master/tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
> >>>>
> >>>>
> >>>> --
> >>>> Tim Bish
> >>>>
> >>>>
>
> --
> Tim Bish
> twitter: @tabish121
> blog: http://timbish.blogspot.com/
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [Artemis 2.6.2] AMQP messages going to expiry queue, redelivery off that queue is failing

Dan Langford
Just for the record: even though we could easily recreate this problem
using transactions, and our initial “workaround” involved not using
transactions, we have recently seen this issue when no transactions where
involved.

Reading through the Jira description it looks like this is not specifically
about transactions so I am hopeful that the fix will address even our
current issues.

Since our users are now focusing on this again I will probably build from
master to test the fix



On Tue, Sep 11, 2018 at 4:46 PM Dan Langford <[hidden email]> wrote:

> Thank you guys so much!
> On Tue, Sep 11, 2018 at 2:50 PM Timothy Bish <[hidden email]> wrote:
>
>> On 08/30/2018 07:10 PM, Dan Langford wrote:
>> > thanks for looking into this. what is the proper way to force for
>> testing a
>> > redelivery that goes back to the broker without transactions? its
>> probably
>> > like killing the session or connection. that would be if we wanted to
>> test
>> > if non-transacted redeliveries were getting corrupted
>>
>> The issue has been identified and fixed, should appear in the next
>> release.
>> https://issues.apache.org/jira/browse/ARTEMIS-2082
>>
>> >
>> > we have a lot of people using spring-jms and as i look in the spring
>> code...
>> >
>> https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractMessageListenerContainer.java
>> > looking at doExecuteListener and rollbackOnExceptionIfNecessary
>> > it looks like if its transacted they try to rollback() and if its
>> > client_ack they try recover(). they probably dont handle auto_ack
>> because
>> > that was acked immediately?
>> >
>> > speaking of this what is an appropriate way to NACK a message when using
>> > Client Acknowledgemnt and a JMS messageHandler? is it simply to reach
>> the
>> > end of the method execution without having called message.acknowledge()
>> or
>> > would it be appropriate to throw a RuntimeException (since i cannot
>> throw a
>> > checked Exception out of an implementation of
>> javax.jms.MessageListener) ?
>> >
>> > On Thu, Aug 30, 2018 at 8:38 AM Robbie Gemmell <
>> [hidden email]>
>> > wrote:
>> >
>> >> Tim and I had an initial look at this, and can see generally where the
>> >> broker is internally corrupting things on send, though not yet the
>> >> full picture how it gets there or what to do about it. The expiration
>> >> is likely to be key, one difference with the non-transacted case is
>> >> actually going to be because its using recover() which the client
>> >> performs locally.
>> >>
>> >> Robbie
>> >>
>> >> On Wed, 29 Aug 2018 at 06:23, Dan Langford <[hidden email]>
>> wrote:
>> >>> ok i wrote 3 test files. I don't know the best way to get them to you
>> >>> easily. hopefully a Gist is ok.
>> >>> https://gist.github.com/danlangford/071e738225ec0c68dd470816b977499b
>> >>>
>> >>> you can copy those 3 files straight to
>> >>>
>> >>
>> ./tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
>> >>> The test JMSTransactedRedeliveryTest::testAMQPProducerAMQPConsumer
>> proves
>> >>> that a transacted client can .rollback() a handful of times and still
>> be
>> >>> able to consume the redelivered message later on.
>> >>>
>> >>> The test JMSTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer
>> >> shows
>> >>> that if a message had been expired and now a transacted client is
>> >>> attempting to consume it the client only has 2 chances before the
>> broker
>> >>> starts sending the message in a way that will not parse correctly
>> >>>
>> >>> The test
>> JMSNonTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer
>> >> shows
>> >>> that if a message had been expired a non-transacted client has no
>> >> troubles
>> >>> reliably accessing the redelivered message from broker
>> >>>
>> >>>
>> >>> as you can tell i am mostly concerned about AMQP->AMQP for my use
>> case.
>> >>> some of those other combos are failing some of these tests in other
>> ways.
>> >>> naturally you can address those as you see fit but for my client the
>> >>> AMQP->AMQP is a roadblocker.
>> >>>
>> >>>
>> >>> let me know if you can determine why the broker is sending an extra
>> null
>> >>> character in the payload on the third time the messages attempts
>> >> delivery.
>> >>> maybe we are doing something incorrectly.
>> >>>
>> >>>
>> >>> This has been more of an issue than i thought due to the fact that
>> Spring
>> >>> default to enabling transactions. in all of my initial tests i
>> couldn't
>> >>> reproduce it because i prefer the straight simplified jms api from 2.0
>> >> and
>> >>> that defaults to sessions not being transacted. that being said nearly
>> >> all
>> >>> of my clients prefer using Spring Boot autoconfigurer and other spring
>> >>> pieces which happen to default to transacted sessions.  i can now have
>> >> some
>> >>> of them workaround but others of them are requiring the transaction.
>> >>>
>> >>>
>> >>> also as a reminder and for context here is a link to the initial
>> >>> conversation i had with the Qpid Jms Client devs who pointed out to me
>> >> the
>> >>> erroneous null character in the message transfer from the broker:
>> >>> https://lists.apache.org/
>> >>> thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@
>> %
>> >> 3Cusers.qpid.apache.org%3E
>> >>>
>> >>> thank you so much for your time
>> >>>
>> >>> On Tue, Aug 14, 2018 at 1:19 PM Timothy Bish <[hidden email]>
>> >> wrote:
>> >>>> On 08/13/2018 07:12 PM, Dan Langford wrote:
>> >>>>> some of my users are attempting a pattern to deduplicate messages
>> >> based
>> >>>> on
>> >>>>> a time window instead of a fixed amount of space (a duplicate id
>> >> cache)
>> >>>>> so far the concept has been working very well. So they send their
>> >> AMQP
>> >>>>> messages (qpid-jms-client) into a Last Value Queue with an
>> >> appropriate
>> >>>>> identifier in the _AMQ_LVQ_NAME. They also set a TimeToLive on the
>> >>>> message
>> >>>>> that is essentially the lag they will allow as they want to wait for
>> >>>>> possible duplicates. If any duplicates come in the Last Value Queue
>> >>>>> behavior is replacing the older message with the newer message until
>> >> the
>> >>>>> expiration. expired messages are delivered to the preconfigured
>> >> expiry
>> >>>>> queue where their application is listening. This is not perfect but
>> >> its
>> >>>> not
>> >>>>> intended to be. Its just intended to reduce additional unnecessary
>> >>>>> processing and they understand this is not a guarantee. It really
>> >> helps
>> >>>>> with a system that produces messages in a way that has flurries of
>> >>>>> "notifications" about the same assetID over and over again.
>> >>>>>
>> >>>>> BUT where we are seeing is a problem is when we are consuming from
>> >> the
>> >>>>> queue used to hold expired messages and we toss some exception and
>> >> the
>> >>>>> message needs to be redelivered. the first time or two the message
>> is
>> >>>>> redelivered it is delivered OK. But when the JMSXDeliveryCount is
>> >> about 3
>> >>>>> or 4 (we use redelivery delay and multipliers to spread these out)
>> >> our
>> >>>>> qpid-jms-client stops being able to read the messages.
>> >>>>>
>> >>>>> we were only able to reproduce this when an AMQP message expired
>> >> onto the
>> >>>>> queue. (expired from a LVQ in case that is relevant). if we place
>> the
>> >>>>> message directly on a queue and test different exception and
>> >> redelivery
>> >>>>> scenarios we cannot reproduce this behavior.
>> >>>>>
>> >>>>> i enable the qpid-jms-client frame logging (via env variable
>> >>>>> PN_TRACE_FRM=true) and i saw that in the situation when the client
>> >> code
>> >>>>> cannot access the payload, even though the broker WAS still sending
>> >> the
>> >>>>> payload. so i thought it was some odd issue with the client. The
>> >> Apache
>> >>>>> Qpid team responded that the issue seems to be that the broker
>> >> starts to
>> >>>>> send some ill formed payloads in this scenario. i dont want to
>> >> repeat the
>> >>>>> stack traces and their response, you can read those here
>> >>>>>
>> >>>>>
>> >>
>> https://lists.apache.org/thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E
>> >>>>> would it be helpful if i tested that this happens if there is not a
>> >> LVQ
>> >>>>> involved? i could have a message in a non-LVQ expire to another
>> >> queue and
>> >>>>> see if redeliveries over their get messed up after a few attempts.
>> >> For
>> >>>> the
>> >>>>> record this is AMQP for producing and consuming. i do notice the
>> >> messages
>> >>>>> waiting in the expiry queue have much more headers messages sent
>> >> directly
>> >>>>> to a queue from client code. they seem to be headers full of
>> >> information
>> >>>>> about the message as it left the previous queue. I tried to send a
>> >>>> message
>> >>>>> directly to the expiry queue with all these headers to determine if
>> >> it
>> >>>> was
>> >>>>> the existence of one of these specifically that trigger the
>> malformed
>> >>>> frame
>> >>>>> but was not able to fully set all those headers. the JMSDeliverCount
>> >>>> (type
>> >>>>> Long) was the one that the client would not let me set and as a
>> >> result i
>> >>>>> could not test. for clarity thought i dont know that the issue
>> >> exists due
>> >>>>> to a header that is just what i saw as a difference between messages
>> >> be
>> >>>>> delivered to the queue by client code versus messages expiring from
>> >> one
>> >>>>> queue to another.
>> >>>>>
>> >>>>> please look over the linked thread on the qpid list and let me know
>> >> if
>> >>>> you
>> >>>>> know why a message transfer fram would become malformed after a few
>> >>>> failed
>> >>>>> deliveries only if the message expired onto the current queue.
>> >>>>>
>> >>>>> thanks so much
>> >>>>>
>> >>>> A great place to start is to create a unit test that reproduces the
>> >>>> issue.  You can look at the Artemis unit tests for AMQP to get some
>> >>>> inspiration on how to set one up.  Then try and create the smallest
>> >>>> possible test that can reproduce the issue to make it easier to
>> narrow
>> >>>> in on where the issue might be.
>> >>>>
>> >>>> The AMQP tests in Artemis are located here:
>> >>>>
>> >>>>
>> >>>>
>> >>
>> https://github.com/apache/activemq-artemis/tree/master/tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Tim Bish
>> >>>>
>> >>>>
>>
>> --
>> Tim Bish
>> twitter: @tabish121
>> blog: http://timbish.blogspot.com/
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: [Artemis 2.6.2] AMQP messages going to expiry queue, redelivery off that queue is failing

Robbie Gemmell
To confirm, the broker issue had nothing to do with use of
transactions specifically. You only saw any variation there due to
difference in client behaviour altering the overall situation in the
limited test.

Master and 2.6.x now have some fairly different message handling in
this area, so you should try whichever is more appropriate to you.

On Sun, 14 Oct 2018 at 17:41, Dan Langford <[hidden email]> wrote:

>
> Just for the record: even though we could easily recreate this problem
> using transactions, and our initial “workaround” involved not using
> transactions, we have recently seen this issue when no transactions where
> involved.
>
> Reading through the Jira description it looks like this is not specifically
> about transactions so I am hopeful that the fix will address even our
> current issues.
>
> Since our users are now focusing on this again I will probably build from
> master to test the fix
>
>
>
> On Tue, Sep 11, 2018 at 4:46 PM Dan Langford <[hidden email]> wrote:
>
> > Thank you guys so much!
> > On Tue, Sep 11, 2018 at 2:50 PM Timothy Bish <[hidden email]> wrote:
> >
> >> On 08/30/2018 07:10 PM, Dan Langford wrote:
> >> > thanks for looking into this. what is the proper way to force for
> >> testing a
> >> > redelivery that goes back to the broker without transactions? its
> >> probably
> >> > like killing the session or connection. that would be if we wanted to
> >> test
> >> > if non-transacted redeliveries were getting corrupted
> >>
> >> The issue has been identified and fixed, should appear in the next
> >> release.
> >> https://issues.apache.org/jira/browse/ARTEMIS-2082
> >>
> >> >
> >> > we have a lot of people using spring-jms and as i look in the spring
> >> code...
> >> >
> >> https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractMessageListenerContainer.java
> >> > looking at doExecuteListener and rollbackOnExceptionIfNecessary
> >> > it looks like if its transacted they try to rollback() and if its
> >> > client_ack they try recover(). they probably dont handle auto_ack
> >> because
> >> > that was acked immediately?
> >> >
> >> > speaking of this what is an appropriate way to NACK a message when using
> >> > Client Acknowledgemnt and a JMS messageHandler? is it simply to reach
> >> the
> >> > end of the method execution without having called message.acknowledge()
> >> or
> >> > would it be appropriate to throw a RuntimeException (since i cannot
> >> throw a
> >> > checked Exception out of an implementation of
> >> javax.jms.MessageListener) ?
> >> >
> >> > On Thu, Aug 30, 2018 at 8:38 AM Robbie Gemmell <
> >> [hidden email]>
> >> > wrote:
> >> >
> >> >> Tim and I had an initial look at this, and can see generally where the
> >> >> broker is internally corrupting things on send, though not yet the
> >> >> full picture how it gets there or what to do about it. The expiration
> >> >> is likely to be key, one difference with the non-transacted case is
> >> >> actually going to be because its using recover() which the client
> >> >> performs locally.
> >> >>
> >> >> Robbie
> >> >>
> >> >> On Wed, 29 Aug 2018 at 06:23, Dan Langford <[hidden email]>
> >> wrote:
> >> >>> ok i wrote 3 test files. I don't know the best way to get them to you
> >> >>> easily. hopefully a Gist is ok.
> >> >>> https://gist.github.com/danlangford/071e738225ec0c68dd470816b977499b
> >> >>>
> >> >>> you can copy those 3 files straight to
> >> >>>
> >> >>
> >> ./tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
> >> >>> The test JMSTransactedRedeliveryTest::testAMQPProducerAMQPConsumer
> >> proves
> >> >>> that a transacted client can .rollback() a handful of times and still
> >> be
> >> >>> able to consume the redelivered message later on.
> >> >>>
> >> >>> The test JMSTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer
> >> >> shows
> >> >>> that if a message had been expired and now a transacted client is
> >> >>> attempting to consume it the client only has 2 chances before the
> >> broker
> >> >>> starts sending the message in a way that will not parse correctly
> >> >>>
> >> >>> The test
> >> JMSNonTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer
> >> >> shows
> >> >>> that if a message had been expired a non-transacted client has no
> >> >> troubles
> >> >>> reliably accessing the redelivered message from broker
> >> >>>
> >> >>>
> >> >>> as you can tell i am mostly concerned about AMQP->AMQP for my use
> >> case.
> >> >>> some of those other combos are failing some of these tests in other
> >> ways.
> >> >>> naturally you can address those as you see fit but for my client the
> >> >>> AMQP->AMQP is a roadblocker.
> >> >>>
> >> >>>
> >> >>> let me know if you can determine why the broker is sending an extra
> >> null
> >> >>> character in the payload on the third time the messages attempts
> >> >> delivery.
> >> >>> maybe we are doing something incorrectly.
> >> >>>
> >> >>>
> >> >>> This has been more of an issue than i thought due to the fact that
> >> Spring
> >> >>> default to enabling transactions. in all of my initial tests i
> >> couldn't
> >> >>> reproduce it because i prefer the straight simplified jms api from 2.0
> >> >> and
> >> >>> that defaults to sessions not being transacted. that being said nearly
> >> >> all
> >> >>> of my clients prefer using Spring Boot autoconfigurer and other spring
> >> >>> pieces which happen to default to transacted sessions.  i can now have
> >> >> some
> >> >>> of them workaround but others of them are requiring the transaction.
> >> >>>
> >> >>>
> >> >>> also as a reminder and for context here is a link to the initial
> >> >>> conversation i had with the Qpid Jms Client devs who pointed out to me
> >> >> the
> >> >>> erroneous null character in the message transfer from the broker:
> >> >>> https://lists.apache.org/
> >> >>> thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@
> >> %
> >> >> 3Cusers.qpid.apache.org%3E
> >> >>>
> >> >>> thank you so much for your time
> >> >>>
> >> >>> On Tue, Aug 14, 2018 at 1:19 PM Timothy Bish <[hidden email]>
> >> >> wrote:
> >> >>>> On 08/13/2018 07:12 PM, Dan Langford wrote:
> >> >>>>> some of my users are attempting a pattern to deduplicate messages
> >> >> based
> >> >>>> on
> >> >>>>> a time window instead of a fixed amount of space (a duplicate id
> >> >> cache)
> >> >>>>> so far the concept has been working very well. So they send their
> >> >> AMQP
> >> >>>>> messages (qpid-jms-client) into a Last Value Queue with an
> >> >> appropriate
> >> >>>>> identifier in the _AMQ_LVQ_NAME. They also set a TimeToLive on the
> >> >>>> message
> >> >>>>> that is essentially the lag they will allow as they want to wait for
> >> >>>>> possible duplicates. If any duplicates come in the Last Value Queue
> >> >>>>> behavior is replacing the older message with the newer message until
> >> >> the
> >> >>>>> expiration. expired messages are delivered to the preconfigured
> >> >> expiry
> >> >>>>> queue where their application is listening. This is not perfect but
> >> >> its
> >> >>>> not
> >> >>>>> intended to be. Its just intended to reduce additional unnecessary
> >> >>>>> processing and they understand this is not a guarantee. It really
> >> >> helps
> >> >>>>> with a system that produces messages in a way that has flurries of
> >> >>>>> "notifications" about the same assetID over and over again.
> >> >>>>>
> >> >>>>> BUT where we are seeing is a problem is when we are consuming from
> >> >> the
> >> >>>>> queue used to hold expired messages and we toss some exception and
> >> >> the
> >> >>>>> message needs to be redelivered. the first time or two the message
> >> is
> >> >>>>> redelivered it is delivered OK. But when the JMSXDeliveryCount is
> >> >> about 3
> >> >>>>> or 4 (we use redelivery delay and multipliers to spread these out)
> >> >> our
> >> >>>>> qpid-jms-client stops being able to read the messages.
> >> >>>>>
> >> >>>>> we were only able to reproduce this when an AMQP message expired
> >> >> onto the
> >> >>>>> queue. (expired from a LVQ in case that is relevant). if we place
> >> the
> >> >>>>> message directly on a queue and test different exception and
> >> >> redelivery
> >> >>>>> scenarios we cannot reproduce this behavior.
> >> >>>>>
> >> >>>>> i enable the qpid-jms-client frame logging (via env variable
> >> >>>>> PN_TRACE_FRM=true) and i saw that in the situation when the client
> >> >> code
> >> >>>>> cannot access the payload, even though the broker WAS still sending
> >> >> the
> >> >>>>> payload. so i thought it was some odd issue with the client. The
> >> >> Apache
> >> >>>>> Qpid team responded that the issue seems to be that the broker
> >> >> starts to
> >> >>>>> send some ill formed payloads in this scenario. i dont want to
> >> >> repeat the
> >> >>>>> stack traces and their response, you can read those here
> >> >>>>>
> >> >>>>>
> >> >>
> >> https://lists.apache.org/thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E
> >> >>>>> would it be helpful if i tested that this happens if there is not a
> >> >> LVQ
> >> >>>>> involved? i could have a message in a non-LVQ expire to another
> >> >> queue and
> >> >>>>> see if redeliveries over their get messed up after a few attempts.
> >> >> For
> >> >>>> the
> >> >>>>> record this is AMQP for producing and consuming. i do notice the
> >> >> messages
> >> >>>>> waiting in the expiry queue have much more headers messages sent
> >> >> directly
> >> >>>>> to a queue from client code. they seem to be headers full of
> >> >> information
> >> >>>>> about the message as it left the previous queue. I tried to send a
> >> >>>> message
> >> >>>>> directly to the expiry queue with all these headers to determine if
> >> >> it
> >> >>>> was
> >> >>>>> the existence of one of these specifically that trigger the
> >> malformed
> >> >>>> frame
> >> >>>>> but was not able to fully set all those headers. the JMSDeliverCount
> >> >>>> (type
> >> >>>>> Long) was the one that the client would not let me set and as a
> >> >> result i
> >> >>>>> could not test. for clarity thought i dont know that the issue
> >> >> exists due
> >> >>>>> to a header that is just what i saw as a difference between messages
> >> >> be
> >> >>>>> delivered to the queue by client code versus messages expiring from
> >> >> one
> >> >>>>> queue to another.
> >> >>>>>
> >> >>>>> please look over the linked thread on the qpid list and let me know
> >> >> if
> >> >>>> you
> >> >>>>> know why a message transfer fram would become malformed after a few
> >> >>>> failed
> >> >>>>> deliveries only if the message expired onto the current queue.
> >> >>>>>
> >> >>>>> thanks so much
> >> >>>>>
> >> >>>> A great place to start is to create a unit test that reproduces the
> >> >>>> issue.  You can look at the Artemis unit tests for AMQP to get some
> >> >>>> inspiration on how to set one up.  Then try and create the smallest
> >> >>>> possible test that can reproduce the issue to make it easier to
> >> narrow
> >> >>>> in on where the issue might be.
> >> >>>>
> >> >>>> The AMQP tests in Artemis are located here:
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>
> >> https://github.com/apache/activemq-artemis/tree/master/tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Tim Bish
> >> >>>>
> >> >>>>
> >>
> >> --
> >> Tim Bish
> >> twitter: @tabish121
> >> blog: http://timbish.blogspot.com/
> >>
> >>