Message loss in network of brokers - transactional send

oseymen
I seem to be losing some messages in a network of brokers topology in the following setup:

- One producer pushes messages to a persistent queue (let’s call it “IN.QUEUE.Broker1”) on one broker (broker1) inside a transaction. The producer commits every 100 messages.

- IN.QUEUE is a composite queue which forwards the messages to another queue on broker1 (let’s call this “Q1”). The network of brokers is configured to forward messages from Q1 to broker2. Q1 is listed in staticallyIncludedDestinations, so it is always forwarded.

- There are no consumers. I check the message counts in the AMQ web interface once all messages are in.

- The second broker (broker2) is connected to broker1 via a network of brokers (configuration below).

My test case:
1. Purge all messages in all queues
2. Start producer and send 10,000 messages to IN.QUEUE in broker1. At this point, I start to see messages forwarded to broker2.
3. Stop/kill broker2. I am running the brokers in a console, so to kill the broker I just close the console. At this point I see the messages accumulating in broker1.
4. Start broker2.
5. When all 10,000 have been sent, I look at the total number of messages in each broker, expecting them to be the same. I just ran the test again and I have 9981 messages in broker2 instead of 10,000. I can reproduce this whenever I run this test.

I also looked at https://issues.apache.org/jira/browse/AMQ-1845. My issue seems to be similar to it, so I converted my code from Spring jms template to Apache NMS, but the issue is still there.

Can you please advise whether this is a known issue or something is wrong in my configuration?

CONFIGURATION in BROKER1:

<destinationInterceptors>
  <virtualDestinationInterceptor>
    <virtualDestinations>
      <compositeQueue name="IN.Broker1">
        <forwardTo>
          <queue physicalName="Q1" />
        </forwardTo>
      </compositeQueue>
    </virtualDestinations>
  </virtualDestinationInterceptor>
</destinationInterceptors>

<networkConnectors>
  <networkConnector
      uri="static:(tcp://localhost:61617)"
      name="FromB1ToB2"
      conduitSubscriptions="false"
      decreaseNetworkConsumerPriority="false"
      prefetchSize="1">
    <staticallyIncludedDestinations>
      <queue physicalName="Q1" />
    </staticallyIncludedDestinations>
  </networkConnector>
</networkConnectors>

PRODUCER CODE:

// Assumes: using Apache.NMS; using Apache.NMS.ActiveMQ;
var textMessage = GetMessageContent();
var connectionFactory = new ConnectionFactory(ServerUri);

using (var connection = connectionFactory.CreateConnection())
{
    connection.Start();

    // Transactional session: sends take effect on the broker only at Commit()
    using (var session = connection.CreateSession(AcknowledgementMode.Transactional))
    {
        var queue = session.GetQueue(destination);
        using (var producer = session.CreateProducer(queue))
        {
            producer.DeliveryMode = MsgDeliveryMode.Persistent;

            for (int i = 1; i <= numberOfMessagesToSend; i++)
            {
                var message = producer.CreateTextMessage(textMessage);
                producer.Send(message);

                // Commit in batches of 100 messages
                if (i % 100 == 0)
                {
                    session.Commit();
                }
            }

            // Commit any remainder when the total is not a multiple of 100
            if (numberOfMessagesToSend % 100 != 0)
            {
                session.Commit();
            }
        }
    }
}

Cheers,
Ozan

Re: Message loss in network of brokers - transactional send

oseymen
Hi,

Based on the test above and my further tests where I kill the target broker during the store-and-forward operation, I am seeing messages lost/dropped.

Basically this problem occurs when messages are sent to a queue on one broker which is configured as "staticallyincluded" so that its messages are forwarded to another broker, and the target broker is killed while messages are being forwarded.

It is as if store-and-forward is removing the message from the queue before making sure that the message is persisted in the target broker.

Do you think this is because of the transactional send from the producer to the original queue on the first broker?

Can you advise which configuration options I should set in order to achieve reliable delivery in network of brokers please?

Thanks,
Ozan

Re: Message loss in network of brokers - transactional send

oseymen
Here is some more info:

I enabled debug logging and can see where the problem is (however I still don't know what the solution is).

I ran my test again and sent 5000 messages to one broker which is configured to store-and-forward the messages to broker2. The two brokers are connected via a network of brokers.

After killing brokers randomly, I ended up with 4997 messages in broker2. I wrote a consumer that consumes all messages from broker2 and creates a spreadsheet with all their properties. Using this spreadsheet I can pinpoint which messages are missing by their sequential ids. One of the missing messages is #3751.

Looking at the debug log I can see:

(line 976) bridging (broker1 -> broker2) messageId = ID:HAM-NB-073-56843-634438260819794239-1:0:1:1:3748
(line 979) bridging (broker1 -> broker2) messageId = ID:HAM-NB-073-56843-634438260819794239-1:0:1:1:3749
(line 983) bridging (broker1 -> broker2) messageId = ID:HAM-NB-073-56843-634438260819794239-1:0:1:1:3750

Then broker1 is killed.
When it comes back, it says:
(line 1311) last stored sequence id set: 3751
(line 1312) suppressing duplicate message send [ID:HAM-NB-073-56843-634438260819794239-1:0:1:1:3751] with producerSequenceId [3751] less than last stored: 3751

There is no "message sent" entry in the log for 3751!

I've attached the full log for your perusal. I'd appreciate any help to solve this problem.

This test was done with apache-activemq-5.4.2-fuse-02-00.

Attached log: http://activemq.2283324.n4.nabble.com/file/n3602343/activemq.log.4


Re: Message loss in network of brokers - transactional send

gtully
if you can reproduce this issue with the current 5.5 release, can you
raise a jira issue to track this?
thanks.

In reply to oseymen's post of 16 June 2011 13:45.

--
http://fusesource.com
http://blog.garytully.com

Re: Message loss in network of brokers - transactional send

oseymen
Thanks Gary.

I've tested this with Fuse 5.5 (apache-activemq-5.5.0-fuse-00-27) and unfortunately the issue is still there. However, what it boils down to is the "suppressing duplicate message send" message with composite destinations.

To prove this is the case, I started eliminating components one by one. I removed the network of brokers from my setup. I took the vanilla (default) activemq.xml from the distribution and set up a virtual topic with 3 queues. I started sending 3000 messages with my producer, which simply sends messages in auto-acknowledge mode using the failover transport. While the producer was running, I killed and restarted ActiveMQ multiple times (killed abruptly: I just closed the console window in which AMQ was running).

I am seeing 3000 messages in one queue and fewer messages (~2998) in the other queues. In the log file I have DEBUG statements saying "suppressing duplicate message send...". So after a restart, AMQ suppresses the message to the other queues, thinking that it is a duplicate based on the last stored sequence id. In this case the last stored sequence id is correct, but the suppression stops message propagation to the other queues in a composite destination setup.

I searched Google and spotted Gary's comment on https://issues.apache.org/jira/browse/AMQ-2800. Gary says "duplicate message sends can occur with the non transactional use of the failover: transport. It can happen if a send is in progress when a failover reconnection occurs back to the same broker (as if there was a temp network partition) and the send reply is not received. A non transactional client will resend the message which needs to be suppressed by the audit". He also recommends disabling audit. However, as far as I understand, this is for JDBCMessageStore and not for KahaDB.

I am sure that this is the reason for my problems as well. I couldn't work out how to disable audit for KahaDB, but looking at the configuration schema, I applied the following changes to the config. Neither of them fixed the problem:
1. enableAudit="false" to PolicyEntry for all topics and queues.
2. maxProducersToAudit="0" to PolicyEntry for all topics and queues.
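
For reference, the two policy settings listed above would look roughly like this in activemq.xml. This is only a sketch, not the poster's actual file: the `>` wildcard entries are an assumption standing in for "all topics and queues". As reported above, neither setting stopped the suppression.

```xml
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <!-- Assumed wildcard entries covering every queue and every topic -->
      <policyEntry queue=">" enableAudit="false" maxProducersToAudit="0" />
      <policyEntry topic=">" enableAudit="false" maxProducersToAudit="0" />
    </policyEntries>
  </policyMap>
</destinationPolicy>
```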

Does anyone have any ideas on how to fix this? If not, I will raise a JIRA.

Thanks,
Ozan

Re: Message loss in network of brokers - transactional send

gtully
That is a problem: if broker death occurs between the first send to a
composite destination and the last send (all of which occur on the
broker), the resend will be suppressed, but there is no guarantee that
each of the composite dests got the message.

The duplicate suppression is not aware of composite destinations.
It should be possible to disable the duplicate suppression on the
persistence adapter using
<kahaDB maxFailoverProducersToTrack="0" />

Also, I think a send transaction would help, it should encompass all
of the composite destinations.
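
A minimal sketch of where that attribute might sit in activemq.xml. Only `maxFailoverProducersToTrack="0"` comes from the suggestion above; the `directory` value is an assumption based on a typical default configuration and should match the existing store location.

```xml
<persistenceAdapter>
  <!-- directory is an assumed default path, not taken from this thread -->
  <!-- maxFailoverProducersToTrack="0" disables duplicate-send tracking in the store -->
  <kahaDB directory="${activemq.base}/data/kahadb"
          maxFailoverProducersToTrack="0" />
</persistenceAdapter>
```

With tracking disabled, a resend after a failover reconnect is stored again instead of being suppressed, so consumers may need to tolerate duplicates.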

In reply to oseymen's post of 13 July 2011 09:38.

Re: Message loss in network of brokers - transactional send

oseymen
Thanks Gary. I can confirm that sending with transactions works perfectly, but it is slower than AUTO for my scenario, where messages are generated one by one and need to be sent one by one.

What are the disadvantages of disabling audit in kahadb (apart from the obvious - duplicates won't be suppressed)?

Am I correct in assuming that composite destinations are not generally used in "zero tolerance for message loss" scenarios when audit is enabled?

The reason why I was implementing composite destinations (virtual topics in this case) was to make AMQ future proof, i.e. any other consumers implemented in the future that are interested in the same messages won't require reconfiguration & restart of the broker - they will just come in and start listening on a queue of their own backed by the Virtual Topic. An example of this might be a consumer that handles custom statistics or consumers for a separate system that are also interested in the same messages. How do you normally handle such situations?

I will repeat my tests with maxFailoverProducersToTrack as soon as I can.

Thanks again Gary
Ozan

Re: Message loss in network of brokers - transactional send

oseymen
Hi there,

Is it possible to let me know the disadvantages of disabling audit in kahadb (apart from the obvious - duplicates won't be suppressed) please?

Thanks,
Ozan

Re: Message loss in network of brokers - transactional send

gtully
the only disadvantage is the obvious one, and there will only be a
duplicate dispatch in this case if the message is already acked.

In reply to oseymen's post of 11 August 2011 11:18.

Re: Message loss in network of brokers - transactional send

oseymen
In reply to this post by gtully
>> "Also, I think a send transaction would help, it should encompass all of the composite destinations."

That is correct. When I send messages in a transaction, I get no message loss, but both JMX and the admin console display the wrong pending message count.

My setup is still the same: 2 brokers, connected with a network of brokers. (Just to remind you:) Messages flow from broker1 to broker2. On broker2, I have configured the replicated queue as a composite queue so that I can divert incoming messages into multiple queues on broker2.

When I send 3000 messages to broker1 (in a transaction) and start killing brokers one by one randomly while messages are flowing, I end up with a pending message count of 3001 or 3002. When I run a consumer on broker2, I can consume 3000 messages, which is perfect. But the admin console reports that there is 1 message pending. When I browse the messages in the queue using the admin console, it displays no messages. So even though there are no messages pending, JMX and the admin console somehow think that there is 1 message pending. When I restart broker2, the pending message count corrects itself and reports 0 in the admin console.

Have you ever encountered this problem before? The pending message counter is very important to me for monitoring purposes. If it says 1 message is pending even though it's not, I will get monitoring nightmares.

I'd really appreciate any help or information.

Cheers,
Ozan

Re: Message loss in network of brokers - transactional send

oseymen
So let me summarize this thread from the beginning:

Please see this arch diagram: ActiveMQ_-_Composite_Queues.png

I am sending 3000 messages from my producer into broker1 and observe message counts in Transit and Indexing queues. I start my producer and while messages are flowing, I start killing brokers one by one randomly.

My first problem was the message loss in the first composite operation: Q.Index.Transit.DC2 & Q.Index.Transit.DC3. This problem is solved by sending Transactional messages from the producer to Q.Index.Replication.
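
The first composite operation described above might be configured along these lines on broker1. This is a sketch inferred from the queue names in this post, not the attached activemq.xml; in particular, treating Q.Index.Replication as the name of the composite queue is an assumption.

```xml
<destinationInterceptors>
  <virtualDestinationInterceptor>
    <virtualDestinations>
      <!-- Assumed: the producer sends to Q.Index.Replication,
           and the broker copies each message to both transit queues -->
      <compositeQueue name="Q.Index.Replication">
        <forwardTo>
          <queue physicalName="Q.Index.Transit.DC2" />
          <queue physicalName="Q.Index.Transit.DC3" />
        </forwardTo>
      </compositeQueue>
    </virtualDestinations>
  </virtualDestinationInterceptor>
</destinationInterceptors>
```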

However, another problem (which I explained in my previous message) came up: the admin console and JConsole report more messages than there actually are in the Q.A.Indexing and Q.B.Indexing queues.

Take a look at this screenshot: ActiveMQ_-_Admin_Console.png

All queues had 3003 messages (instead of 3000). When I consumed all messages in Q.A.Indexing with my consumer, it successfully consumed 3000 messages (as expected), but the admin console still reports that there are an additional 3 messages pending in the queue. When I click on "Browse" for this queue in the admin console, it reports that there are no messages. When I restart this broker, the pending message count corrects itself and reports 0.

Receiving duplicate messages is not a problem for me - I can deal with them in my consumer. But JConsole and the admin console reporting that there are still messages to be consumed is a problem for the monitoring side. I have no way of knowing whether there are really 3 messages left and the consumer is experiencing problems, or the consumer is alive but there are no messages to consume.

I'd appreciate any help in this matter. Am I using composite destinations incorrectly? I can also do the same design with Camel which works fine but I'd like to use native AMQ features to do this.

Here are the configuration files for both brokers:
Broker1: activemq.xml
Broker2: activemq.xml

Thanks,
Ozan

Re: Message loss in network of brokers - transactional send

oseymen
Broker1 configuration: activemq_-_broker1.xml
Broker2 configuration: activemq_-_broker2.xml

Re: Message loss in network of brokers - transactional send

oseymen
Just created a JIRA for this: https://issues.apache.org/jira/browse/AMQ-3469. This might actually be a bug.