Reliability issues with Apollo?

I am testing Apollo 1.7.1 on Java 1.8.0_45 under Windows 7. I am using Apache.NMS with a .NET producer app and a .NET consumer app. The producer publishes to topic FOO.BAR, and the consumer subscribes through the durable subscription rizzuto:test. Here is the virtual host config from apollo.xml:

  <virtual_host id="RaysBroker" purge_on_startup="true">
    <access_rule allow="users" action="connect create destroy send receive consume"/>
    <leveldb_store directory="${apollo.base}/data"/>
    <topic id="FOO.BAR" auto_delete_after="0"/>
    <dsub id="*:test" topic="FOO.BAR" quota="2gb" full_policy="drop head"/>
  </virtual_host>


When I run the producer and consumer together, I have no issues. I have run them that way for about an hour, sending ~20K msg/sec on a PC with a first-generation i7.

If I kill and restart the consumer, it starts back up from where it left off. After a while, however, the consumer stops receiving messages, and I start getting various warnings on the broker console. I have seen all of the following at different times:

WARN  | Could not snapshot the index: c:\Users\rizzuto\Downloads\RaysBroker\data\dirty.index\000097.log (The system cannot find the file specified)
WARN  | Could not snapshot the index: link failed
WARN  | DB operation failed. (entering recovery mode): File 'c:\Users\rizzuto\Downloads\RaysBroker\data\0000000000000000.log' offset: 1865684455
WARN  | java.lang.Error: File 'c:\Users\rizzuto\Downloads\RaysBroker\data\0000000000000000.log' offset: 1871425261
WARN  | java.lang.AssertionError: Node is not linked
WARN  | Queue 'rizzuto:test' detected store dropped message at seq: 734760

My durable subscription is set with a 2gb quota and a policy of dropping messages from the head once the quota is reached.

Any suggestions on why this is happening or how to debug it?
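
As a first debugging step, it may help to capture the broker console output and count how often each kind of warning appears, to see which one shows up first after the consumer restart. The script below is only a sketch: the category names are made up for illustration, and the patterns are taken from the warnings quoted above rather than from any Apollo documentation.

```python
import re
from collections import Counter

# Hypothetical category names; the patterns come from the warnings quoted in this post.
PATTERNS = [
    ("index_snapshot", re.compile(r"Could not snapshot the index")),
    ("db_recovery", re.compile(r"DB operation failed")),
    ("store_dropped", re.compile(r"detected store dropped message")),
    ("java_error", re.compile(r"java\.lang\.\w+")),
]


def classify(line):
    """Return the first matching category for a WARN line, or None for other lines."""
    if not line.startswith("WARN"):
        return None
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return "other"


def tally(lines):
    """Count WARN lines per category."""
    counts = Counter()
    for line in lines:
        kind = classify(line.strip())
        if kind is not None:
            counts[kind] += 1
    return counts


sample = [
    "WARN  | Could not snapshot the index: link failed",
    "WARN  | java.lang.AssertionError: Node is not linked",
    "WARN  | Queue 'rizzuto:test' detected store dropped message at seq: 734760",
]
print(tally(sample))
```

Feeding it the full console capture instead of `sample` would show whether the index snapshot failures come before the store errors or after them.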

Re: Reliability issues with Apollo?

Also I get:

WARN  | java.lang.NullPointerException

I think all of the issues happen when the producer is producing faster than the consumer can consume. Initially the consumer keeps up, but after I kill and restart it, it slows down, likely because it is now consuming from the persistent files, which is slower.
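
To get a rough feel for how quickly a slower consumer would hit the quota, here is a back-of-the-envelope calculation. Only the ~20K msg/sec producer rate and the 2gb quota come from my setup. The consumer rate and message size are placeholder assumptions, and I am assuming "2gb" means 2 * 1024^3 bytes.

```python
def seconds_until_quota(produce_rate, consume_rate, msg_bytes, quota_bytes):
    """Seconds until the backlog reaches the quota, or None if the consumer keeps up."""
    surplus = produce_rate - consume_rate  # messages per second piling up
    if surplus <= 0:
        return None
    return quota_bytes / (surplus * msg_bytes)


QUOTA_BYTES = 2 * 1024 ** 3  # assumed meaning of quota="2gb"

# 20,000 msg/sec is from the post; 15,000 msg/sec and 200 bytes are assumed values.
t = seconds_until_quota(20_000, 15_000, 200, QUOTA_BYTES)
print(f"quota reached after about {t / 60:.1f} minutes")
```

With these assumed numbers the backlog grows by about 1 MB per second, so the quota would be reached in a bit over half an hour, which would fit with "after a while".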

For my application, I am fine with losing older messages, which is why I have the "drop head" policy.  Having the broker become unreliable is a big issue, however.
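
To be clear about the behavior I expect from "drop head", here is a small sketch of the semantics as I understand them: when a new message would push the backlog over the quota, the oldest messages are discarded until it fits. This is not Apollo's implementation, and the class and method names are made up for illustration.

```python
from collections import deque


class DropHeadBuffer:
    """Byte-bounded FIFO that discards the oldest messages when it is full."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.used_bytes = 0
        self.messages = deque()
        self.dropped = 0

    def enqueue(self, payload):
        # A message larger than the whole quota can never fit, so count it as dropped.
        if len(payload) > self.quota_bytes:
            self.dropped += 1
            return
        # Drop from the head (oldest first) until the new message fits.
        while self.used_bytes + len(payload) > self.quota_bytes:
            oldest = self.messages.popleft()
            self.used_bytes -= len(oldest)
            self.dropped += 1
        self.messages.append(payload)
        self.used_bytes += len(payload)

    def dequeue(self):
        """Return the oldest retained message, or None if the buffer is empty."""
        if not self.messages:
            return None
        payload = self.messages.popleft()
        self.used_bytes -= len(payload)
        return payload


buf = DropHeadBuffer(quota_bytes=10)
for msg in (b"aaaa", b"bbbb", b"cccc"):
    buf.enqueue(msg)
print(buf.dropped, buf.dequeue())
```

In this model a slow consumer only ever loses the oldest messages and the buffer itself stays consistent, which is the behavior I was hoping to get from the broker.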