AMQ 5.11.1, NPE in KahaAddMessageCommand.getMessage() (JIT suspected)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

AMQ 5.11.1, NPE in KahaAddMessageCommand.getMessage() (JIT suspected)

vdmitriev
Hello!

We are running AMQ 5.11.1 on AIX 7.1 with IBM J9 7.0 [1] and have intermittent problem with messages delivery. Everything works fine, consumers are connected to AMQ, but suddenly server stops delivering messages and starts to log this stacktrace.

2017-08-01 01:53:12,913 | ERROR | org.apache.activemq.broker.region.cursors.QueueStorePrefetch@442b1805:xxx,batchResetNeeded=false,size=3,cacheEnabled=false,maxBatchSize:3,hasSpace:true,pendingCachedIds.size:0,lastSyncCachedId:null,lastSyncCachedId-seq:null,lastAsyncCachedId:null,lastAsyncCachedId-seq:null,store=permits:10000,sd=nextSeq:84,lastRet:MessageOrderCursor:[def:81, low:0, high:0],pending:0 - Failed to fill batch | org.apache.activemq.broker.region.cursors.AbstractStoreCursor | ActiveMQ Broker[xxx] Scheduler
java.lang.NullPointerException
        at org.apache.activemq.store.kahadb.data.KahaAddMessageCommandBase.getMessage(KahaAddMessageCommand.java:397)[activemq-kahadb-store-5.11.1.jar:5.11.1]
        at org.apache.activemq.store.kahadb.data.KahaAddMessageCommand.getMessage(KahaAddMessageCommand.java:7)[activemq-kahadb-store-5.11.1.jar:5.11.1]
        at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1131)[activemq-kahadb-store-5.11.1.jar:5.11.1]
        at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$5.execute(KahaDBStore.java:595)[activemq-kahadb-store-5.11.1.jar:5.11.1]
        at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)[activemq-kahadb-store-5.11.1.jar:5.11.1]
        at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:584)[activemq-kahadb-store-5.11.1.jar:5.11.1]
        at org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(ProxyMessageStore.java:106)[activemq-broker-5.11.1.jar:5.11.1]
        at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:109)[activemq-broker-5.11.1.jar:5.11.1]
        at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:381)[activemq-broker-5.11.1.jar:5.11.1]
        at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:142)[activemq-broker-5.11.1.jar:5.11.1]
        at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:159)[activemq-broker-5.11.1.jar:5.11.1]
        at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1897)[activemq-broker-5.11.1.jar:5.11.1]
        at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2119)[activemq-broker-5.11.1.jar:5.11.1]
        at org.apache.activemq.broker.region.Queue.doBrowse(Queue.java:1116)[activemq-broker-5.11.1.jar:5.11.1]
        at org.apache.activemq.broker.region.Queue.expireMessages(Queue.java:909)[activemq-broker-5.11.1.jar:5.11.1]
        at org.apache.activemq.broker.region.Queue.access$100(Queue.java:100)[activemq-broker-5.11.1.jar:5.11.1]
        at org.apache.activemq.broker.region.Queue$2.run(Queue.java:144)[activemq-broker-5.11.1.jar:5.11.1]
        at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)[activemq-client-5.11.1.jar:5.11.1]
        at java.util.TimerThread.mainLoop(Timer.java:566)[:1.7.0]
        at java.util.TimerThread.run(Timer.java:516)[:1.7.0]

I checked the sources (and decompiled binaries from the server just to be sure) and see no valid reason for JVM to throw NPE at KahaAddMessageCommand.java:397: "this" mustn't be null and unboxing is not happening there. I suspect that JIT optimizations are messing the code, but before following this bumpy trail, I wanted to check with AMQ community. Maybe I'm missing something and JVM is behaving correctly? Maybe AMQ is modifying itself somehow at runtime and KahaAddMessageCommand.java:397 starts to point to different sections of code?

[1]
java version "1.7.0"
Java(TM) SE Runtime Environment (build pap6470sr6fp1-20140108_01(SR6 FP1))
IBM J9 VM (build 2.6, JRE 1.7.0 AIX ppc64-64 Compressed References 20140106_181350 (JIT enabled, AOT enabled)
J9VM - R26_Java726_SR6_20140106_1601_B181350
JIT  - r11.b05_20131003_47443.02
GC   - R26_Java726_SR6_20140106_1601_B181350_CMPRSS
J9CL - 20140106_181350)
JCL - 20140103_01 based on Oracle 7u51-b11


~Regards
Vadim
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AMQ 5.11.1, NPE in KahaAddMessageCommand.getMessage() (JIT suspected)

Tim Bain
I'm not aware of any runtime modification of the compiled sources, and I've
never encountered anything that looked like it when attaching with a
debugger, but please post your question to the dev list to be sure.

BTW, I've experienced a similar "impossible" exception under Oracle Hotspot
8u21 that was caused by a JIT bug, so those things do occasionally happen.
Have you tried swapping out your JRE to see if that eliminates the problem?
I'd switch to a non-IBM JRE just to have the best odds of not hitting the
same bug in the new JRE.

Tim

On Aug 1, 2017 4:44 AM, "vdmitriev" <[hidden email]> wrote:

> Hello!
>
> We are running AMQ 5.11.1 on AIX 7.1 with IBM J9 7.0 [1] and have
> intermittent problem with messages delivery. Everything works fine,
> consumers are connected to AMQ, but suddenly server stops delivering
> messages and starts to log this stacktrace.
>
> 2017-08-01 01:53:12,913 | ERROR |
> org.apache.activemq.broker.region.cursors.QueueStorePrefetch@442b1805:
> xxx,batchResetNeeded=false,size=3,cacheEnabled=false,
> maxBatchSize:3,hasSpace:true,pendingCachedIds.size:0,
> lastSyncCachedId:null,lastSyncCachedId-seq:null,lastAsyncCachedId:null,
> lastAsyncCachedId-seq:null,store=permits:10000,sd=nextSeq:84,lastRet:
> MessageOrderCursor:[def:81,
> low:0, high:0],pending:0 - Failed to fill batch |
> org.apache.activemq.broker.region.cursors.AbstractStoreCursor | ActiveMQ
> Broker[xxx] Scheduler
> java.lang.NullPointerException
>         at
> org.apache.activemq.store.kahadb.data.KahaAddMessageCommandBase.
> getMessage(KahaAddMessageCommand.java:397)[activemq-kahadb-store-5.
> 11.1.jar:5.11.1]
>         at
> org.apache.activemq.store.kahadb.data.KahaAddMessageCommand.getMessage(
> KahaAddMessageCommand.java:7)[activemq-kahadb-store-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:
> 1131)[activemq-kahadb-store-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$5.execute(
> KahaDBStore.java:595)[activemq-kahadb-store-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.store.kahadb.disk.page.Transaction.
> execute(Transaction.java:779)[activemq-kahadb-store-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.
> recoverNextMessages(KahaDBStore.java:584)[activemq-kahadb-store-5.11.1.
> jar:5.11.1]
>         at
> org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(
> ProxyMessageStore.java:106)[activemq-broker-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(
> QueueStorePrefetch.java:109)[activemq-broker-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(
> AbstractStoreCursor.java:381)[activemq-broker-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(
> AbstractStoreCursor.java:142)[activemq-broker-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(
> StoreQueueCursor.java:159)[activemq-broker-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.
> java:1897)[activemq-broker-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.broker.region.Queue.pageInMessages(
> Queue.java:2119)[activemq-broker-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.broker.region.Queue.doBrowse(Queue.
> java:1116)[activemq-broker-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.broker.region.Queue.expireMessages(
> Queue.java:909)[activemq-broker-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.broker.region.Queue.access$100(Queue.
> java:100)[activemq-broker-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.broker.region.Queue$2.run(Queue.java:
> 144)[activemq-broker-5.11.1.jar:5.11.1]
>         at
> org.apache.activemq.thread.SchedulerTimerTask.run(
> SchedulerTimerTask.java:33)[activemq-client-5.11.1.jar:5.11.1]
>         at java.util.TimerThread.mainLoop(Timer.java:566)[:1.7.0]
>         at java.util.TimerThread.run(Timer.java:516)[:1.7.0]
>
> I checked the sources (and decompiled binaries from the server just to be
> sure) and see no valid reason for JVM to throw NPE at
> KahaAddMessageCommand.java:397: "this" mustn't be null and unboxing is not
> happening there. I suspect that JIT optimizations are messing the code, but
> before following this bumpy trail, I wanted to check with AMQ community.
> Maybe I'm missing something and JVM is behaving correctly? Maybe AMQ is
> modifying itself somehow at runtime and KahaAddMessageCommand.java:397
> starts to point to different sections of code?
>
> [1]
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build pap6470sr6fp1-20140108_01(SR6 FP1))
> IBM J9 VM (build 2.6, JRE 1.7.0 AIX ppc64-64 Compressed References
> 20140106_181350 (JIT enabled, AOT enabled)
> J9VM - R26_Java726_SR6_20140106_1601_B181350
> JIT  - r11.b05_20131003_47443.02
> GC   - R26_Java726_SR6_20140106_1601_B181350_CMPRSS
> J9CL - 20140106_181350)
> JCL - 20140103_01 based on Oracle 7u51-b11
>
>
> ~Regards
> Vadim
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.
> nabble.com/AMQ-5-11-1-NPE-in-KahaAddMessageCommand-
> getMessage-JIT-suspected-tp4729109.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AMQ 5.11.1, NPE in KahaAddMessageCommand.getMessage() (JIT suspected)

vdmitriev
Hi, Tim!

> I'm not aware of any runtime modification of the compiled sources, and I've
> never encountered anything that looked like it when attaching with a
> debugger, but please post your question to the dev list to be sure.

Ok, done.

> BTW, I've experienced a similar "impossible" exception under Oracle Hotspot
> 8u21 that was caused by a JIT bug, so those things do occasionally happen.
> Have you tried swapping out your JRE to see if that eliminates the problem?
> I'd switch to a non-IBM JRE just to have the best odds of not hitting the
> same bug in the new JRE.

I'm now aware of any JVM which is compatible with AIX OS other than IBM. Anyway, customer has nightmarish bureaucratic procedure to update anything on production environment, so I'm stuck with AMQ 5.11.1 and 3 years old JVM for the foreseeable future. Unless I provide significant reason to make changes. And I really count on encountering JIT compiler issue now to force JVM upgrade.

Thanks for sharing your ideas!

~Regards
Vadim
Loading...