[jira] [Commented] (AMQ-5712) Broker can deadlock when using queues while producers wait on disk space

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/AMQ-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492612#comment-14492612 ]

Christopher L. Shannon commented on AMQ-5712:
---------------------------------------------

Timothy,

I just pushed up my updated [pull request|https://github.com/apache/activemq/pull/84/] with a unit test. If you run the test without my patch, it times out because of a deadlock; with my patch applied, it completes successfully. The key to making this test deadlock is that the issue only occurs when the temporary storage limit is configured to be smaller than the memory storage limit (for example, 4 GB of memory storage and 1 GB of temp storage). While this configuration is probably uncommon, it is still possible. When memory fills up and messages are dumped to disk, the deadlock occurs and the queue cannot be purged. My unit test demonstrates this.
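For anyone who wants to check their own limits against the reproduction condition described above, here is a minimal sketch of a hypothetical pre-flight check. It does not use the ActiveMQ API: `UsageLimitCheck`, `tempSmallerThanMemory`, and passing the limits as plain byte counts are assumptions for illustration, and the 4 GB / 1 GB values are simply the example from this comment. Since other conditions may also trigger the deadlock, a configuration that passes this check is not necessarily safe.

```java
// Hypothetical pre-flight check, not part of ActiveMQ. It only flags the limit
// combination used by the AMQ-5712 unit test (temp limit smaller than memory limit).
public class UsageLimitCheck {
    public static final long GB = 1024L * 1024 * 1024;

    // Returns true when the temp store limit is smaller than the memory limit.
    public static boolean tempSmallerThanMemory(long memoryLimitBytes, long tempLimitBytes) {
        if (memoryLimitBytes <= 0 || tempLimitBytes <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        return tempLimitBytes < memoryLimitBytes;
    }

    public static void main(String[] args) {
        long memory = 4 * GB; // example memory limit from the comment
        long temp = 1 * GB;   // example temp store limit from the comment
        if (tempSmallerThanMemory(memory, temp)) {
            System.out.println("warning: temp limit (" + temp / GB + " GB) is below memory limit ("
                    + memory / GB + " GB), the combination used to reproduce AMQ-5712");
        }
    }
}
```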

Also, I still think there are other conditions that cause the broker to deadlock in the same spot (this has happened on one of our servers where the temporary storage was larger than the memory size), but this unit test was the easiest way I could reproduce the problem reliably.

Let me know what you think,
Chris



> Broker can deadlock when using queues while producers wait on disk space
> ------------------------------------------------------------------------
>
>                 Key: AMQ-5712
>                 URL: https://issues.apache.org/jira/browse/AMQ-5712
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.11.1
>            Reporter: Christopher L. Shannon
>
> I am experiencing a deadlock when using a Queue with non-persistent messages. The queue has a cursor memory high water mark set (currently 70%). When a producer is sending messages quickly to the queue and that limit is hit, the broker can deadlock. I have tried setting producerWindowSize and alwaysSyncSend, which did not seem to help. When the broker hits that limit, I am unable to do things like purge the queue. Consumers can deadlock as well.
> Note that this appears to be the same issue as described in AMQ-2475. The difference is that I am using a Queue, not a Topic, and the fix for that ticket appears to have been made only for Topics.
> The problem appears to be in the Queue class on line 1852, inside the {{cursorAdd}} method. The call there is {{return messages.addMessageLast(msg);}}, which blocks indefinitely if there is no space available and, while blocked, keeps any other thread from acquiring {{messagesLock}}. We have seen a deadlock where consumers can't consume because they are waiting on this lock. It looks like part of the fix in AMQ-2475 was to replace {{messages.addMessageLast(msg)}} with {{messages.tryAddMessageLast(msg, 10)}}. I also noticed that not all of the message cursors support {{tryAddMessageLast}}, which could be a problem: {{FilePendingMessageCursor}} implements it, but the rest of the cursors (notably {{StoreQueueCursor}}) simply delegate back to {{addMessageLast}} in the parent class. So part of this fix may require implementing {{tryAddMessageLast}} in more cursors.
> Here is part of the thread dump showing the stuck producer:
> {code}
> "ActiveMQ Transport: ssl:///192.168.3.142:38589" daemon prio=10 tid=0x00007fb46c006000 nid=0x3b1a runnable [0x00007fb4b8a0d000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000cfb13cd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2176)
>         at org.apache.activemq.usage.Usage.waitForSpace(Usage.java:103)
>         at org.apache.activemq.usage.Usage.waitForSpace(Usage.java:90)
>         at org.apache.activemq.usage.Usage.waitForSpace(Usage.java:80)
>         at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor.tryAddMessageLast(FilePendingMessageCursor.java:235)
>         - locked <0x00000000d2015ee0> (a org.apache.activemq.broker.region.cursors.FilePendingMessageCursor)
>         at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor.addMessageLast(FilePendingMessageCursor.java:207)
>         - locked <0x00000000d2015ee0> (a org.apache.activemq.broker.region.cursors.FilePendingMessageCursor)
>         at org.apache.activemq.broker.region.cursors.StoreQueueCursor.addMessageLast(StoreQueueCursor.java:97)
>         - locked <0x00000000d1f20908> (a org.apache.activemq.broker.region.cursors.StoreQueueCursor)
> {code}
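The lock interaction described in the report can be illustrated with a small, self-contained model. This is a hypothetical sketch, not ActiveMQ code: `CursorLockSketch`, its one-message capacity, `waitForSpace`, and `purge` are assumptions, while `messagesLock` and `cursorAdd` only mirror names from the report. The producer waits for space on a separate usage condition (standing in for `Usage.waitForSpace`) while still holding `messagesLock`, so a purge that needs the same lock can never free the space. In the timed mode, modeled on `tryAddMessageLast(msg, 10)`, the producer gives up after 10 ms, releases `messagesLock`, and retries, which lets the purge run.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the locking pattern in AMQ-5712; not ActiveMQ code.
// messagesLock mirrors Queue.messagesLock; usageLock/spaceAvailable stand in
// for the separate condition that Usage.waitForSpace() waits on.
public class CursorLockSketch {
    final ReentrantLock messagesLock = new ReentrantLock();
    private final ReentrantLock usageLock = new ReentrantLock();
    private final Condition spaceAvailable = usageLock.newCondition();
    private final Deque<String> pending = new ArrayDeque<>(); // guarded by messagesLock
    private final int capacity;                               // "disk space", counted in messages
    private int used;                                         // guarded by usageLock

    CursorLockSketch(int capacity) { this.capacity = capacity; }

    // Reserves one slot. timeoutMs < 0 waits forever; otherwise gives up and returns false.
    private boolean waitForSpace(long timeoutMs) throws InterruptedException {
        usageLock.lock();
        try {
            long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (used >= capacity) {
                if (timeoutMs < 0) spaceAvailable.await();
                else if (nanos <= 0) return false;
                else nanos = spaceAvailable.awaitNanos(nanos);
            }
            used++;
            return true;
        } finally {
            usageLock.unlock();
        }
    }

    // Producer path modeled on Queue.cursorAdd(): the cursor call runs under messagesLock.
    // timed=false models addMessageLast(msg); timed=true models tryAddMessageLast(msg, 10).
    boolean cursorAdd(String msg, boolean timed) throws InterruptedException {
        messagesLock.lock();
        try {
            if (!waitForSpace(timed ? 10 : -1)) return false; // gave up; lock released below
            pending.addLast(msg);
            return true;
        } finally {
            messagesLock.unlock();
        }
    }

    // Purge path: needs messagesLock before it can free space; false if not acquired in time.
    boolean purge(long timeoutMs) throws InterruptedException {
        if (!messagesLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) return false;
        try {
            pending.clear();
            usageLock.lock();
            try {
                used = 0;
                spaceAvailable.signalAll();
            } finally {
                usageLock.unlock();
            }
            return true;
        } finally {
            messagesLock.unlock();
        }
    }

    // Fills the cursor, starts a producer that needs more space, then tries to purge.
    public static boolean purgeSucceeds(boolean timed) throws InterruptedException {
        CursorLockSketch q = new CursorLockSketch(1);
        q.cursorAdd("m0", timed); // takes the only slot
        Thread producer = new Thread(() -> {
            try {
                while (!q.cursorAdd("m1", timed)) {
                    Thread.sleep(1); // timed add failed: retry later, outside messagesLock
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
        while (!q.messagesLock.isLocked()) {
            Thread.sleep(1); // wait until the producer is inside cursorAdd
        }
        boolean purged = q.purge(500);
        producer.interrupt(); // unblocks the producer in the blocking case
        producer.join(1000);
        return purged;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("blocking add, purge succeeded: " + purgeSucceeds(false));
        System.out.println("timed add, purge succeeded: " + purgeSucceeds(true));
    }
}
```

In this model the blocking run reports that the purge failed (it waits 500 ms for `messagesLock` and gives up), while the timed run reports that it succeeded. The retry loop is just one way a caller might handle a `false` return. The model also shows why a cursor whose `tryAddMessageLast` simply delegates to the blocking `addMessageLast` does not help: the timeout is never honored, so it behaves like the blocking run.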



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)