Not abortable slow consumers / stopped processing of messages in a queue

Marek Dominiak
Hi guys,

I am facing a strange problem related to slow consumers that can't be aborted, and I couldn't solve the issue by myself.
We have one queue where we process quite heavy operations (report generation). The number of messages per day isn't big: usually fewer than 100. We have two messageListeners, each on its own application node. If a message can't be processed (e.g. due to a db transaction timeout) we throw it into the DLQ (after one redelivery).

Questions: (copied from the bottom so it's easier to see them)
1. Why couldn't AbortSlowAckConsumerStrategy abort the consumer?
2. Is this configuration reasonable for our use case (few but heavy messages to process)?
3. Is there a better way of making sure that queue processing won't stop? (maybe prefetch set to 0, some DMLC features which are not well documented, etc.)
4. Could the redelivery policy (consumer-based or broker-based) somehow be related to this problem?
5. Is using "consumer caching" in the DMLC and PooledConnectionFactory a reasonable choice here? Or would it be better to just use ActiveMQConnectionFactory along with the DMLC?
6. Is there something about the whole configuration which tingles your spider-senses?


First issue:
Firstly, we had problems with processing stopping after some heavy operations were performed in the DMLC. Back then we had the following configuration:
- prefetch size: default 1000
- no abortSlow*ConsumerStrategy was defined
- redelivery policy defined on the JMS connection (not on the JMS factory) - one redelivery

Several times processing of messages on this queue stopped (usually after attempts to generate some gigantic report) and couldn't resume by itself without restarting the Tomcat nodes. That's when I started to look into AbortSlow*ConsumerStrategy.

I have ended up with the following changes:

Second issue:
- Prefetch size: 1 (a client-side way to set this is sketched just after this list)
- I have defined abortSlowAckConsumerStrategy as follows:
<slowConsumerStrategy>
<abortSlowAckConsumerStrategy ignoreIdleConsumers="false" checkPeriod="60000" maxTimeSinceLastAck="600000" abortConnection="false"/>
</slowConsumerStrategy>

- Redelivery policy was defined in the broker instead of on a connection:
<redeliveryPolicyEntries> 
<redeliveryPolicy queue="generateReportQueue" maximumRedeliveries="1" initialRedeliveryDelay="100000" redeliveryDelay="100000"/>   
</redeliveryPolicyEntries>
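
(For reference - not necessarily how we set it - a minimal sketch of one way to pin the queue prefetch to 1 on the client side; the broker host is a placeholder for our ${activemq.broker.url}. The same thing can also be done with the jms.prefetchPolicy.queuePrefetch=1 option on the broker URL.)

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.ActiveMQPrefetchPolicy;

public class PrefetchExample {
    public static void main(String[] args) {
        // placeholder URL; we really use ${activemq.broker.url}
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://amq-host:61616");

        // allow only 1 prefetched queue message per consumer, so a busy
        // consumer never sits on unprocessed messages in its local buffer
        ActiveMQPrefetchPolicy prefetch = new ActiveMQPrefetchPolicy();
        prefetch.setQueuePrefetch(1);
        factory.setPrefetchPolicy(prefetch);
    }
}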

With this configuration I hoped that a consumer would be aborted after 10 minutes (unless it had acked back in the meantime), so that messages wouldn't hang in the queue indefinitely (the first issue).
Unfortunately, it looked like the AbortSlowAckConsumerStrategy couldn't abort the slow consumer.


The logs from the AMQ server are here:
….
2014-10-25 00:00:11,455 [host] Scheduler] INFO  AbortSlowConsumerStrategy      - aborting slow consumer: ID:min-p-app02.osl.basefarm.net-36433-1414153506788-1:1:17:7 for destination:queue://generateReportQueue
...
2014-10-25 01:12:11,641 [host] Scheduler] INFO  AbortSlowConsumerStrategy      - aborting slow consumer: ID:min-p-app02.osl.basefarm.net-36433-1414153506788-1:1:17:7 for destination:queue://generateReportQueue
...
2014-10-25 18:14:14,557 [host] Scheduler] INFO  AbortSlowConsumerStrategy      - aborting slow consumer: ID:min-p-app02.osl.basefarm.net-36433-1414153506788-1:1:17:7 for destination:queue://generateReportQueue


After restarting the Tomcat nodes, the AbortSlowAckConsumerStrategy started successfully aborting slow consumers:
2014-10-25 21:18:15,111 [host] Scheduler] INFO  AbortSlowConsumerStrategy      - aborting slow consumer: ID:min-p-app01.osl.basefarm.net-50652-1414258093487-1:1:40:6 for destination:queue://generateReportQueue
2014-10-25 21:26:15,112 [host] Scheduler] INFO  AbortSlowConsumerStrategy      - aborting slow consumer: ID:min-p-app02.osl.basefarm.net-36743-1414257416757-1:1:50:1 for destination:queue://generateReportQueue
2014-10-25 21:29:15,113 [host] Scheduler] INFO  AbortSlowConsumerStrategy      - aborting slow consumer: ID:min-p-app01.osl.basefarm.net-50652-1414258093487-1:1:43:1 for destination:queue://generateReportQueue
...

Questions:
1. Why couldn't AbortSlowAckConsumerStrategy abort the consumer?
2. Is this configuration reasonable for our use case (few but heavy messages to process)?
3. Is there a better way of making sure that queue processing won't stop? (maybe prefetch set to 0, some DMLC features which are not well documented, etc.)
4. Could the redelivery policy (consumer-based or broker-based) somehow be related to this problem?
5. Is using "consumer caching" in the DMLC and PooledConnectionFactory a reasonable choice here? Or would it be better to just use ActiveMQConnectionFactory along with the DMLC?
6. Is there something about the whole configuration which tingles your spider-senses?


The context:


Environment:
- We have a clustered Spring-based application (running on 2 Tomcat nodes)
- We have ActiveMQ running standalone on two nodes (for failover)
- We run the application on Linux boxes (Red Hat Enterprise)
- Versions: Spring 4.1.0.RC1, ActiveMQ 5.9.1, Java 7
- Memory:
  - ActiveMQ on each node: -Xmx2048m -Xms512m -XX:MaxPermSize=128m
  - Tomcat on each node: 12 GB of memory

Configuration of JMS:
AMQ factory and jmsTemplate on client side:

  <bean id="connectionFactory" class="org.apache.activemq.pool.PooledConnectionFactory"
          destroy-method="stop">
        <property name="connectionFactory">
            <bean class="org.apache.activemq.ActiveMQConnectionFactory">
                <property name="brokerURL">
                    <value>${activemq.broker.url}</value>
                </property>
                <property name="nonBlockingRedelivery">
                    <value>false</value>
                </property>
            </bean>
        </property>
</bean>

 <bean id="jmsTemplate" class="org.springframework.jms.core.JmsTemplate">
        <property name="connectionFactory" ref="connectionFactory"/>
        <property name="receiveTimeout" value="2000"/>
        <property name="sessionTransacted" value="true"/>
</bean>

Spring DMLC

<bean id="defaultMessageListenerContainerParent" abstract="true"
    class="org.springframework.jms.listener.DefaultMessageListenerContainer">
        <property name="connectionFactory" ref="connectionFactory"/>
        <property name="sessionTransacted" value="true"/>
        <property name="cacheLevelName" value="CACHE_CONSUMER"/>
</bean>

 <bean parent="defaultMessageListenerContainerParent">
        <property name="destination" ref="generateReportQueue"/>
        <property name="messageListener" ref="sendReportToMailMessageListener"/>
  </bean>

ActiveMQ broker:

<broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost" dataDirectory="${activemq.data}" useJmx="true" schedulerSupport="true">     
        <destinationPolicy>
            <policyMap>
                <policyEntries>
                    <policyEntry topic=">" producerFlowControl="false" memoryLimit="64mb">                    
                        <pendingMessageLimitStrategy>
                            <constantPendingMessageLimitStrategy limit="1000"/>
                        </pendingMessageLimitStrategy>
                    </policyEntry>
                    <policyEntry queue=">" producerFlowControl="false" memoryLimit="64mb">                      
                        <deadLetterStrategy>                         
                            <individualDeadLetterStrategy queuePrefix="DLQ." useQueueForQueueMessages="true"/>
                        </deadLetterStrategy>
                    </policyEntry>                 
                    <policyEntry queue="generateReportQueue" producerFlowControl="false">
                        <deadLetterStrategy>
                            <individualDeadLetterStrategy queuePrefix="DLQ." useQueueForQueueMessages="true"/>
                        </deadLetterStrategy>
                        <slowConsumerStrategy>
                            <abortSlowAckConsumerStrategy ignoreIdleConsumers="false" checkPeriod="60000"
                                                          maxTimeSinceLastAck="600000" abortConnection="false"/>
                        </slowConsumerStrategy>
                    </policyEntry>
                </policyEntries>
            </policyMap>
        </destinationPolicy>

        <managementContext>
            <managementContext connectorPort="1099"/>
        </managementContext>

        <persistenceAdapter>
          <kahaDB directory="XXXX"/>
        </persistenceAdapter>

        <systemUsage>
            <systemUsage>
                <memoryUsage>
                    <memoryUsage percentOfJvmHeap="70"/>
                </memoryUsage>
                <storeUsage>
                    <storeUsage limit="5 gb"/>
                </storeUsage>
                <tempUsage>
                    <tempUsage limit="2 gb"/>
                </tempUsage>
            </systemUsage>
        </systemUsage>

        <transportConnectors>
            <transportConnector name="openwire"
                                uri="tcp://0.0.0.0:61616?maximumConnections=1000&amp;wireformat.maxFrameSize=104857600"/>
            <transportConnector name="amqp"
                                uri="amqp://0.0.0.0:5672?maximumConnections=1000&amp;wireformat.maxFrameSize=104857600"/>
            <transportConnector name="stomp"
                                uri="stomp://0.0.0.0:61613?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
            <transportConnector name="mqtt"
                                uri="mqtt://0.0.0.0:1883?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
            <transportConnector name="ws"
                                uri="ws://0.0.0.0:61614?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
        </transportConnectors>

        <shutdownHooks>
            <bean xmlns="http://www.springframework.org/schema/beans"
                  class="org.apache.activemq.hooks.SpringContextHook"/>
        </shutdownHooks>

        <plugins>
            <redeliveryPlugin fallbackToDeadLetter="true" sendToDlqIfMaxRetriesExceeded="true">
                <redeliveryPolicyMap>
                    <redeliveryPolicyMap>
                        <redeliveryPolicyEntries>
                            <redeliveryPolicy queue="generateReportQueue" maximumRedeliveries="1"
                                              initialRedeliveryDelay="100000" redeliveryDelay="100000"/>
                        </redeliveryPolicyEntries>
                    </redeliveryPolicyMap>
                </redeliveryPolicyMap>
            </redeliveryPlugin>
        </plugins>

    </broker>


Any ideas / help is appreciated.
Thank you in advance.

Regards
Marek Dominiak

Re: Not abortable slow consumers / stopped processing of messages in a queue

Tim Bain
1.  In my limited experience with slow consumer abort strategies (using the
SlowConsumerAbortStrategy, not the SlowAckConsumerAbortStrategy), I've
observed that a client will continue processing the current message even
when aborted; the abort seems to allow the broker to get on with its life
but doesn't seem to stop the client from finishing what it's doing.  If
that's what you mean by "AbortSlowAckConsumerStrategy couldn't abort the
consumer", then that's in line with what I've observed.  Maybe someone who
knows the ActiveMQ client code more intimately will know of a way to
interrupt the processing that the client is doing, but if not, you might
need to build a max processing time into your client's message-handling
logic, to allow your client to stop if it takes too long.
2.  Your config seems reasonable for your use-case, though slow consumer
abort strategies are generally intended for when a consumer unexpectedly
takes a long time, whereas your use case seems like your consumers
expectedly but unpredictably take a long time.  But certainly you're using
the more appropriate of the two strategies if you're going to use one.
3.  How does queue processing "stop"?  Do you just mean that once both
consumers start working on large messages, they're not available to work on
small messages?
4.  I'm concerned that by allowing one redelivery of each message, you're
setting up a situation where you could tie up both of your consumers (one
processing the first delivery, one processing the second for the same
message); is message re-delivery something you have to have?
5.  I'm the wrong person to answer this question, but hopefully someone
else will.
6.  One thing you might consider is having your client spin off the work of
processing a message into a separate thread, and then returning
(successfully) after either the thread finishes or some timeout elapses,
whichever happens first.  Then when a large message comes in, it will run
in the background till it finishes, but it won't prevent the consumer from
continuing on without it and it won't cause the broker to redeliver the
message to the other consumer and tie up processing.  Obviously your
processing algorithm will need to be thread-safe for this to work, but it
might give you options without even needing to worry about the
SlowConsumerAbortStrategy...  Also, if you've got an algorithm that usually
takes under 10 minutes and sometimes takes 18 hours (based on your logs
from before you restarted Tomcat), you might want to improve your
algorithm, to either speed up the work you're currently doing or find a way
to get your answer with less processing (e.g. by only sampling some of your
data).  This is obviously very specific to whatever domain you're working
in and might not be easy to do, but 18 hours to process a message
definitely makes my Spidey senses tingle...
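
To make the idea in #6 concrete, here's a rough, untested sketch; ReportGenerator is a made-up stand-in for whatever does your heavy work, and the 10-minute timeout is just an example:

import javax.jms.Message;
import javax.jms.MessageListener;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeBoxedReportListener implements MessageListener {

    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final ReportGenerator generator = new ReportGenerator(); // made-up class

    @Override
    public void onMessage(final Message message) {
        Future<Void> job = worker.submit(new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                generator.generate(message); // the heavy report work
                return null;
            }
        });
        try {
            // return after at most 10 minutes; if the job is still running it
            // keeps going in the background, so the consumer is free for the
            // next message and the broker never redelivers this one
            job.get(10, TimeUnit.MINUTES);
        } catch (TimeoutException stillRunning) {
            // deliberately ignored - the background thread will finish later
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            // rethrow if you still want a rollback/redelivery on outright failure
            throw new RuntimeException("report generation failed", e);
        }
    }
}

// made-up placeholder for the real report-generation code
class ReportGenerator {
    void generate(Message message) { /* heavy work here */ }
}

The obvious trade-off is that a message whose background job later fails has already been committed, so you'd need your own retry or persistence if you can't afford to lose it.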

Tim


Re: Not abortable slow consumers / stopped processing of messages in a queue

Marek Dominiak
Hi Tim,

Thank you for your input and sharing your experience and knowledge.

tbain98 wrote
1.  In my limited experience with slow consumer abort strategies (using the
SlowConsumerAbortStrategy, not the SlowAckConsumerAbortStrategy), I've
observed that a client will continue processing the current message even
when aborted; the abort seems to allow the broker to get on with its life
but doesn't seem to stop the client from finishing what it's doing.  If
that's what you mean by "AbortSlowAckConsumerStrategy couldn't abort the
consumer", then that's in line with what I've observed.  Maybe someone who
knows the ActiveMQ client code more intimately will know of a way to
interrupt the processing that the client is doing, but if not, you might
need to build a max processing time into your client's message-handling
logic, to allow your client to stop if it takes too long.

First I need to clarify some things:
- We have defined a transactionTimeout (30 minutes) on the database, so if a listener can't consume the message in less than 30 minutes an exception will be thrown and the listener will run the redelivery policy rules. (In this case the message will be sent back to the broker, the broker will schedule one more redelivery after 100 seconds, and if the message can't be processed again the broker will send it to the DLQ.) The whole processing can take about 30-35 minutes max.

- About the logs: I have posted the logs somewhat selectively - I wanted to show that the same idle consumer can't be aborted over a span of 18 hours. Lines like this one:


2014-10-25 00:00:11,455 [host] Scheduler] INFO  AbortSlowConsumerStrategy      - aborting slow consumer: ID:min-p-app02.osl.basefarm.net-36433-1414153506788-1:1:17:7 for destination:queue://generateReportQueue

we get every 6 minutes (or so), with the same consumer id all the time. I have cut most of the logs to keep it short. There were no messages to consume in the queue at that point. The real issue is that AbortSlowAckConsumerStrategy couldn't abort the consumer even though it was idle. My experience with the abort strategy (when it's working correctly) is the same as yours: it doesn't forcibly abort the consumer but politely asks it to abort once it has finished processing the current message. But in this case the consumer didn't have anything to process (maybe it only had some acks to send back - I was using JMX to move messages to the DLQ myself).



tbain98 wrote
2.  Your config seems reasonable for your use-case, though slow consumer
abort strategies are generally intended for when a consumer unexpectedly
takes a long time, whereas your use case seems like your consumers
expectedly but unpredictably take a long time.  But certainly you're using
the more appropriate of the two strategies if you're going to use one.
As I mentioned earlier, the max processing time of one message is about 30 minutes. I agree with you that with so few messages in the queue it's probably better to check acks and not rely on the prefetch buffer.

tbain98 wrote
3.  How does queue processing "stop"?  Do you just mean that once both
consumers start working on large messages, they're not available to work on
small messages?
I mean that when a consumer "stops processing", none of the messages in the queue are being consumed at all (both small and bigger ones) - they stay in the queue indefinitely (until the whole application is restarted). It happens to the consumers on both nodes (1 consumer per node).

tbain98 wrote
4.  I'm concerned that by allowing one redelivery of each message, you're
setting up a situation where you could tie up both of your consumers (one
processing the first delivery, one processing the second for the same
message); is message re-delivery something you have to have?
That could be the case. I can try to verify whether I really need redelivery here, but from what I remember, with two attempts the reports are generated in most cases; with only one attempt the success rate is smaller, which requires more manual attention...


tbain98 wrote
6.  One thing you might consider is having your client spin off the work of
processing a message into a separate thread, and then returning
(successfully) after either the thread finishes or some timeout elapses,
whichever happens first.  Then when a large message comes in, it will run
in the background till it finishes, but it won't prevent the consumer from
continuing on without it and it won't cause the broker to redeliver the
message to the other consumer and tie up processing.  Obviously your
processing algorithm will need to be thread-safe for this to work, but it
might give you options without even needing to worry about the
SlowConsumerAbortStrategy...  Also, if you've got an algorithm that usually
takes under 10 minutes and sometimes takes 18 hours (based on your logs
from before you restarted Tomcat), you might want to improve your
algorithm, to either speed up the work you're currently doing or find a way
to get your answer with less processing (e.g. by only sampling some of your
data).  This is obviously very specific to whatever domain you're working
in and might not be easy to do, but 18 hours to process a message
definitely makes my Spidey senses tingle...
If I understand you correctly, I think we can't use this approach (did I understand correctly?). The whole point of employing JMS for us was to have async processing with guarantees. In our system we can have several bugfix releases throughout the day, and if that happened and the report wasn't generated before the restart of the application, we would lose the message. I am trying to find a config which will work automatically most of the time and require manual developer attention only for certain problems.

Once again, thank you for your input.

Regards
Marek

Re: Not abortable slow consumers / stopped processing of messages in a queue

Tim Bain
I'm not clear on what behavior you're seeing, because the descriptions you
give (as I understand them) seem contradictory.  You say that the consumer
won't abort, but that you've got a 30-minute client-side abort timeout.
You say that after the intended abort, you know it didn't work because the
consumer didn't resume processing messages, but then you say that there
weren't any messages to process.  Maybe you're describing multiple
independent scenarios with different behavior and I'm just not catching the
difference between them, but I'm not at all clear on what you're seeing.
Can you give us a from-the-top summary?  No need to give the overview or
any config files or log files, just tell us at each step what you expect to
happen and what's actually happening (and how you know).

Also, your first message was all about aborting slow consumers, while your
reply sounds like it's concerned entirely with aborting idle consumers.
Which one's the problem here?  Also, how do you know that a particular idle
consumer isn't being aborted?  The logs tell you the abort is happening;
what's telling you it's not?

1.  If I've understood correctly, you say your business logic will abort
after 30 minutes, independently of any ActiveMQ-initiated abort request.
Is that actually happening?  The logs you've posted don't give any
indication either way (and you say "the same idle consumer can’t be aborted
in a span of 18 hours"), and the behavior you're describing would be more
consistent with your clients not aborting than with them aborting but not
pulling the next message, though of course both are possible.  So make sure
your client's really doing what you think it is.

3.  Can you confirm (via JConsole in the MBeans tab or some other JMX
viewer) that your consumer is still connected to the broker after the
abort?  Also, when your client aborts, how is ActiveMQ being told about the
failure?  (And what ack mode are you using?)

5 & 6.  For you to use the approach I suggested, you'd either have to be OK
losing messages when failures occur or you'd have to persist the message to
a datastore to retry in the case of a failure.  It sounds like neither of
those is appealing, so this may not be an option.




Re: Not abortable slow consumers / stopped processing of messages in a queue

Marek Dominiak
tbain98 wrote
I'm not clear on what behavior you're seeing, because the descriptions you
give (as I understand them) seem contradictory.  You say that the consumer
won't abort, but that you've got a 30-minute client-side abort timeout.
You say that after the intended abort, you know it didn't work because the
consumer didn't resume processing messages, but then you say that there
weren't any messages to process.  Maybe you're describing multiple
independent scenarios with different behavior and I'm just not catching the
difference between them, but I'm not at all clear on what you're seeing.
Can you give us a from-the-top summary?  No need to give the overview or
any config files or log files, just tell us at each step what you expect to
happen and what's actually happening (and how you know).

Also, your first message was all about aborting slow consumers, while your
reply sounds like it's concerned entirely with aborting idle consumers.
Which one's the problem here?  Also, how do you know that a particular idle
consumer isn't being aborted?  The logs tell you the abort is happening;
what's telling you it's not?
Hi Tim,

I am sorry for my unclear description. I have mixed two (or more) issues in one post.

The first issue was visible with this config (30-minute timeout on the MessageListener due to the tx timeout; prefetch size: default 1000; no abortSlow*ConsumerStrategy defined; redelivery policy defined on the jms connection <not on the jms factory> - one redelivery). It looks like JMS processing stopped at one point: a consumer got a HEAVY message, it failed at least once to consume it, the message was still visible via the ActiveMQ web app (or JMX), and the consumer had 1 message to be acked (I saw this via JMX). After one day this very consumer hadn't done anything at all - it was idle and didn't get any more messages dispatched to it, while the other consumer got a lot of them. I was forced to move the message to the DLQ and restart the application node.

I got similar behaviour with the prefetch size set to 0. That is when I started to look at abort strategies; AbortSlowAckConsumerStrategy looked like the one to use. And after I configured it, I got the other issue with the new configuration.

Second issue: config (30-minute timeout on the MessageListener due to the tx timeout; AbortSlowAckConsumerStrategy set to abort a consumer after 6 minutes, abortConnection=false). I had 1 HEAVY message in the queue, and it failed due to the tx timeout. As I understand it, the AbortSlowAckConsumerStrategy should now abort the consumer, and a new one should be created which should try to consume the message after some time (100 seconds due to the redelivery policy). But what happened that day was that the consumer stayed alive with 1 message to be acked back, and it was kept alive indefinitely (the same consumer id + the logs I have posted). I was forced to move this message to the DLQ, but the consumer was still slow and idle, with number of messages to be acked = 1. After some time I was forced to restart the application node to get a new consumer in place.

As I understand aborting, after a consumer has been marked as slow and it has finished its job (even if it was unsuccessful: rollback), it should be removed and replaced by a different one. (This behavior is observable for "smaller" messages.)

So in the logs I should see a different consumer id after some time, but in this case that didn't happen (without restarting the server).

2014-10-25 00:00:11,455 [host] Scheduler] INFO  AbortSlowConsumerStrategy      - aborting slow consumer: ID:min-p-app02.osl.basefarm.net-36433-1414153506788-1:1:17:7 for destination:queue://generateReportQueue
… more logs every 6 minutes here
2014-10-25 01:12:11,641 [host] Scheduler] INFO  AbortSlowConsumerStrategy      - aborting slow consumer: ID:min-p-app02.osl.basefarm.net-36433-1414153506788-1:1:17:7 for destination:queue://generateReportQueue

Both issues look kind of the same to me now: the consumer hasn't acked back to the server after it failed to consume this HEAVY message (one or more times), and it stopped being responsive (no acks sent back, the abort strategy couldn't force it to stop, no new messages dispatched to the consumer).


tbain98 wrote
1.  If I've understood correctly, you say your business logic will abort
after 30 minutes, independently of any ActiveMQ-initiated abort request.
Is that actually happening?  The logs you've posted don't give any
indication either way (and you say "the same idle consumer can’t be aborted
in a span of 18 hours"), and the behavior you're describing would be more
consistent with your clients not aborting than with them aborting but not
pulling the next message, though of course both are possible.  So make sure
your client's really doing what you think it is.
Yes, that is really happening: in the application logs I can see that the transaction is stopped, and then code which listens for exceptions in all MessageListeners sends information about the exception to the devs (a custom error handler in the DefaultMessageListenerContainer). Usually just after getting this exception the consumer starts to consume the next message, or waits until the redelivery policy kicks in to try to consume the same message again.

As you mentioned, I will try to verify what is actually happening under the hood in the DMLC (we have made some small extensions to it in order to define a redelivery policy per connection and custom error handling). Maybe when processing is too heavy the rollback isn't called?
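
(For context, the error-handling hook is roughly Spring's ErrorHandler set on the listener container via its errorHandler property - this is a simplified sketch, not our exact code:)

import org.springframework.util.ErrorHandler;

// Simplified sketch, not our exact code. Spring's DMLC rolls back the
// transacted session when the listener throws and then passes the
// exception to the container's ErrorHandler.
public class ReportListenerErrorHandler implements ErrorHandler {
    @Override
    public void handleError(Throwable t) {
        // this is where the notification about the failed report goes to the devs
        System.err.println("Report listener failed: " + t);
    }
}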

tbain98 wrote
3.  Can you confirm (via JConsole in the MBeans tab or some other JMX
viewer) that your consumer is still connected to the broker after the
abort?  Also, when your client aborts, how is ActiveMQ being told about the
failure?  (And what ack mode are you using?)
Via JMX I saw that the consumer was still connected to the server.
We use transacted sessions (without a transaction manager, but with some Spring JMS magic to handle commit/rollback correctly).
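
(For reference, a minimal programmatic sketch of that JMX check - the host is a placeholder, the port and brokerName match the managementContext and broker config above:)

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CheckQueueConsumers {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://amq-host:1099/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // ActiveMQ 5.8+ JMX naming; brokerName must match the <broker> attribute
            ObjectName queue = new ObjectName(
                    "org.apache.activemq:type=Broker,brokerName=localhost,"
                    + "destinationType=Queue,destinationName=generateReportQueue");
            System.out.println("ConsumerCount = "
                    + mbs.getAttribute(queue, "ConsumerCount"));
        } finally {
            connector.close();
        }
    }
}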

“Also, when your client aborts, how is ActiveMQ being told about the
failure?”
I am not sure if I understood your question.

tbain98 wrote
5 & 6.  For you to use the approach I suggested, you'd either have to be OK
losing messages when failures occur or you'd have to persist the message to
a datastore to retry in the case of a failure.  It sounds like neither of
those is appealing, so this may not be an option.
We use a persistent store in ActiveMQ, and I hoped that would be enough. I am trying to track down the bug in the configuration, to avoid unnecessarily overcomplicated store code/configuration.

Thanks again for your help Tim.


I will try to see what's going on under the hood in the Spring DMLC when a heavy message has failed due to the transaction timeout, and whether upgrading to 5.10.0 solves our issues. I will write back when I am done with these two.


Regards
Marek Dominiak

Re: Not abortable slow consumers / stopped processing of messages in a queue

Tim Bain
Marek,

Thanks for the explanation; that helped me to separate the different things
you're seeing.

On Fri, Oct 31, 2014 at 5:21 AM, Marek Dominiak <[hidden email]> wrote:

> tbain98 wrote
> > I'm not clear on what behavior you're seeing, because the descriptions
> you
> > give (as I understand them) seem contradictory.  You say that the
> consumer
> > won't abort, but that you've got a 30-minute client-side abort timeout.
> > You say that after the intended abort, you know it didn't work because
> the
> > consumer didn't resume processing messages, but then you say that there
> > weren't any messages to process.  Maybe you're describing multiple
> > independent scenarios with different behavior and I'm just not catching
> > the
> > difference between them, but I'm not at all clear on what you're seeing.
> > Can you give us a from-the-top summary?  No need to give the overview or
> > any config files or log files, just tell us at each step what you expect
> > to
> > happen and what's actually happening (and how you know).
> >
> > Also, your first message was all about aborting slow consumers, while
> your
> > reply sounds like it's concerned entirely with aborting idle consumers.
> > Which one's the problem here?  Also, how do you know that a particular
> > idle
> > consumer isn't being aborted?  The logs tell you the abort is happening;
> > what's telling you it's not?
>
> Hi Tim,
>
> I am sorry for my unclear description. I have mixed two (or more) issues in
> one post.
> *
> First issue* was visible with this config (30 minutes timeout on
> MessageListener due to tx timeout, prefetch size: default 1000, none of
> “abortSlow*ConsumerStrategy” defined, redelivery policy defined on the jms
> connection <not on jms factory> - one redelivery). It looks like the JMS
> processing has stopped at one point: consumer got a HEAVY message, it
> failed
> at least once to consume it, message was still visible via ActiveMQ Web App
> (or JMX), consumer had 1 message to be acked (I saw this via JMX). After
> one
> day this very consumer hasn’t done anything at all - it was idle, it didn’t
> get any more messages dispatched to it while the other consumer got a lot
> of
> them. I was forced to move the message to DLQ and to restart the
> application
> node.
>

It sounds like something's going wrong with rolling back your transaction
here, because a client-aborted transaction isn't supposed to prevent the
client from processing the next message.  Since you say you've extended
DMLC, I'd start looking there for something that's not handled properly,
and if that doesn't turn up anything I'd attach a debugger and step through
the code to see if you can see what's going wrong.  I've never used
transactions so I can't offer any specific advice here, but since you've
got a reproducible test case it should be fairly straightforward to step
through the code and see if something's going wrong.


> Similar behaviour I have got with prefetch.size set to 0. That is when I
> started to look at abort strategies, AbortSlowAckedConsumerStrategy looked
> like the one to use. And after I have configured it I got the other issue
> with a new configuration.
>

I thought the minimum prefetch size was 1, based on what's in the "Using
the Prefetch Policy to configure the limit" section of
http://activemq.apache.org/slow-consumer-handling.html...


> *Second issue:* conf (30 minutes timeout on MessageListener due to tx
> timeout, AbortSlowAckConsumerStrategy set to abort every consumer after
> every 6 minutes abortConnection=false), I had 1 HEAVY message in the queue,
> and it failed due to tx timeout. Now as I understand the
> AbortSlowAckConsumerStrategy should abort the consumer and create a new one
> which should try to consume the message after some time (100 seconds due to
> redelivery policy), but what happened that day was that consumer stayed
> alive with 1 message to be acked back. And the consumer was being kept
> alive
> indefinitely (the same consumer id + logs I have posted). I was forced to
> move this message to the DLQ, but the consumer was still slow, idle with
> no.
> of messages to be acked = 1. After some time I was forced to restart
> application node to get new consumer in place.
>
> As I understand aborting correctly, after consumer has been marked as slow
> and it did finish it’s job (even if it was unsuccessful: rollback) it
> should
> be removed and replaced by a different one. *(This behavior is observable
> for “smaller” messages).*
>

Your description of how an abort happens is not quite right.  When a
consumer is determined to be slow and need aborting, the broker sends a
message to the consumer requesting that it abort itself.  Assuming the
consumer processes this message, this will cause the TCP transport to
disconnect, which will be detected by the failover transport, which will
establish a new TCP connection.  But the reconnection won't happen until
after the client has finished processing the current message, in your case
by rolling back the transaction.  The broker doesn't create connections,
and it can't create a new connection for the consumer; the consumer has to
do that, and it will only do it if it processes the disconnect request from
the broker in the first place.  Your client's logs should show a failover
reconnect when this happens, so if you're not seeing that log line in the
client's logs, your client is probably not processing that disconnect
request; I suspect that's what's going on based on your description of the
client staying connected and having the same connection ID.

In 5.9.0, Tim Bish added the ability to abort a connection from the broker
side by closing the socket by using the abortConnection=true URI option;
his blog post (
http://timbish.blogspot.com/2013/07/coming-in-activemq-59-new-way-to-abort.html)
is the best documentation I'm aware of.  If you found that the voluntary
abort wasn't happening, abortConnection=true *might* let your client detect
that the connection is dead and abort sooner, though I haven't played with
it enough to know for sure.  If you're going to use it, you should probably
upgrade to 5.10.0 to get Gary's fix for
https://issues.apache.org/jira/browse/AMQ-5114.

If that doesn't work, you might be able to manually disconnect and
reconnect from within your consumer code once you've finished rolling back
the transaction, but I've never looked into how you'd actually do that so I
don't have suggestions for how to do it or if it would solve the problem...


> So in the logs I should see a different consumer id after some time, but in
> this case it didn’t happen (without restarting server).
>
> 2014-10-25 00:00:11,455 [host] Scheduler] INFO  AbortSlowConsumerStrategy
> - aborting slow consumer:
> ID:min-p-app02.osl.basefarm.net-36433-1414153506788-1:1:17:7 for
> destination:queue://generateReportQueue
> … more logs every 6 minutes here
> 2014-10-25 01:12:11,641 [host] Scheduler] INFO  AbortSlowConsumerStrategy
> - aborting slow consumer:
> ID:min-p-app02.osl.basefarm.net-36433-1414153506788-1:1:17:7 for
> destination:queue://generateReportQueue
>
> *Both issues looks kind of the same for me now: consumer hasn’t acked back
> to the server after it failed to consume this HEAVY message (one or more
> times) and it stopped to be responsive (no acked sent back, abort strategy
> couldn’t force it to stop, no new messages dispatched to the consumer). *
>

I agree that both look very similar.  One possible explanation for that is
that ActiveMQ doesn't consider your rollback to be a close-out of the
message, so it doesn't think it can go on to the next message (first case)
or re-establish the connection (second case).  That would be the first
place I'd focus on.  Another possibility is that there's a bug in the
handling of some of these when transactions are manually rolled back, so
that's the next place I'd look.


>
> tbain98 wrote
> > 1.  If I've understood correctly, you say your business logic will abort
> > after 30 minutes, independently of any ActiveMQ-initiated abort request.
> > Is that actually happening?  The logs you've posted don't give any
> > indication either way (and you say "the same idle consumer can’t be
> > aborted
> > in a span of 18 hours"), and the behavior you're describing would be more
> > consistent with your clients not aborting than with them aborting but not
> > pulling the next message, though of course both are possible.  So make
> > sure
> > your client's really doing what you think it is.
>
> Yes, that is really happening, in the application logs I can see that the
> transaction is stopped, and then code which listens to the exceptions in
> all
> MessageListeners sends information about the exception to the devs (custom
> error handler in DefaultMessageListenerContainer). Usually just after
> getting this exception consumer starts to consume the next message or waits
> until redelivery policy kicks in to try to consume the message again.
>
> *As you mentioned it, I will try to verify what is actually happening under
> the hood in the DMLC (we have done some small extensions to it in order to
> define redelivery policy per connection, and custom error handling). Maybe
> when processing is too heavy the rollback isn’t called?*
>

Those sound like the right places to look.  But I'd suggest you dig (with a
debugger, ideally) into what happens in the ActiveMQ client code when you
roll back your transaction.  Having logs from your own code saying you're
doing the rollback isn't proof that it's actually being done, so I'd
confirm that it's really happening, all the way through sending a rollback
notification back to the broker.


> tbain98 wrote
> > 3.  Can you confirm (via JConsole in the MBeans tab or some other JMX
> > viewer) that your consumer is still connected to the broker after the
> > abort?  Also, when your client aborts, how is ActiveMQ being told about
> > the
> > failure?  (And what ack mode are you using?)
>
> Via JMX I saw that consumer was still connected to the server
> We use session transacted (without transaction manager but with some Spring
> JMS magic to handle commit/rollback correctly).
>

> “Also, when your client aborts, how is ActiveMQ being told about the
> failure?”
> I am not sure if I understood your question.
>

I think you just answered it: "with some Spring JMS magic to handle
commit/rollback correctly".  I assume you've confirmed that you're invoking
this magic correctly, but if not it should become clear as you step through
the code with a debugger.


> tbain98 wrote
> > 5 & 6.  For you to use the approach I suggested, you'd either have to be
> > OK
> > losing messages when failures occur or you'd have to persist the message
> > to
> > a datastore to retry in the case of a failure.  It sounds like neither of
> > those is appealing, so this may not be an option.
>
> We use persistent store in ActiveMQ, and I hoped this should be enough. I
> am
> trying to track down the bug in the configuration, to avoid unnecessary
> over
> complicated store code/configuration.
>

Message persistence and how you handle message failures are orthogonal
concerns; the former relates to how you ensure the message gets to your
consumers for successful processing, but doesn't make any guarantees about
what you do once it fails.  Your approach to the latter sounds reasonable;
it just sounds like it's not working properly, but it's well thought-out
and should be a valid approach once you figure out what's failing along the
way.  My proposal was that you store the message on the consumer and have
the consumer invoke retries on failure instead of having the broker do it;
either approach will work, but it sounds like you prefer the latter, which
is fine as long as you can track down what's not working right about it.

