Using ActiveMQ For Distributed Replicated Cache

Using ActiveMQ For Distributed Replicated Cache

pragmaticjdev
We plan to use activemq to build a replicated cache in our distributed system
consisting of multiple java applications. In this deployment, an update to
any of the objects cached in the jvm memory of an app server acts as a
producer that pushes the pojo to a jms topic. All other app servers subscribe
to the topic and update their cached copy of the object. Here's a quick
deployment screenshot with 2 app servers. In production we would have a
maximum of 5-7 such app servers.

<http://activemq.2283324.n4.nabble.com/file/t378744/ActiveMQ_Oracle.png>

I would like to get suggestions on the queries below:
        1. Given this is a real-time use case (caching), we want any update to
be replicated to all other app servers as soon as possible. What broker
configurations should we consider to make sure that messages are delivered in
real time in all possible flows?
        2. Would it be good to maintain a continuous connection with the
activemq server on each app server, so that the cache can be invalidated when
the app server fails to connect to activemq? If so, how could it be
implemented?
        3. How do we handle failure scenarios where the producer fails to push
an object to the topic (in which case we might have to roll back the database
transaction) or a consumer fails to receive the java object?

Looking forward to some insights.

Thanks.




--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Using ActiveMQ For Distributed Replicated Cache

Tim Bain
On Wed, Apr 11, 2018, 6:09 AM pragmaticjdev <[hidden email]> wrote:

> We plan to use activemq to build a replicated cache in our distributed
> system
> consisting of multiple java applications. In this deployment, an update to
> any of the objects being cached in the jvm memory of the app server acts as
> a producer which pushes the pojo to the jms topic. All other app servers
> subscribe to it and update the object that is cached by them. Here's a
> quick
> deployment screenshot with 2 app servers. In production we would have a
> maximum of 5-7 such app servers.
>
> <http://activemq.2283324.n4.nabble.com/file/t378744/ActiveMQ_Oracle.png>
>


Have you considered using an actual standalone caching product such as
Redis or MemCache as your cache rather than trying to create your own
synchronized distributed in-memory cache? It seems like that would simplify
your task significantly and reduce the risk of bugs and the need for
troubleshooting. Reusing an existing product is typically better than
re-inventing the wheel...


> I would like to get suggestions on below queries
>         1. Given this is a real time use case (caching) we would want any
> updates to be replicated to all other app servers as soon as possible. What
> broker configurations should we consider to make sure that the messages are
> delivered in real time in all possible flows?
>


The default settings should be fine.


>         2. Would it be good to maintain a continuous connection with the
> activemq server on each app server in order to invalidate the cache in case
> the app server fails to connect to activemq? If so, how could it be implemented?
>


If you use the TCP transport, it will stay connected continuously, and then
disconnect if the broker goes down. If that happens, I believe an exception
will be thrown the next time your code polls for new messages.
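A minimal sketch of the invalidate-on-disconnect pattern in plain Java. The `onBrokerConnectionLost()` callback here is a hypothetical stand-in; with a real JMS connection you would register a `javax.jms.ExceptionListener`, whose `onException()` fires asynchronously when the broker connection drops:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: once we can no longer receive invalidation messages from the
// broker, every cached entry may be stale, so the safest move is to clear.
public class InvalidatingCache {
    private final Map<String, Object> entries = new ConcurrentHashMap<>();

    public void put(String key, Object value) { entries.put(key, value); }
    public Object get(String key) { return entries.get(key); }
    public int size() { return entries.size(); }

    // Wire this to the connection's exception listener in a real deployment.
    public void onBrokerConnectionLost() { entries.clear(); }
}
```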


>         3. How do we handle failure scenarios when the producer fails to
> push an object to the topic (in which case we might have to rollback the
> database transaction) or the consumer fails to receive the java object?
>


Is the work that leads to the message being published done in response to
the consumption of a JMS message? If so, you can use an XA transaction to
roll back your database commit if the send fails. If you're not consuming a
message at the time, I'm not sure the built-in XA transactions are available
to you; I'm not an expert on transactions in a JMS context, so I could be
mistaken about that. Either way, you'd have to add special logic to remove
the cache entry on transaction rollback.
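Without XA, the application-level sequencing being discussed (write the DB, then publish, roll back the write if the publish fails) looks roughly like this sketch. The `Database`-like list and the `Publisher` interface are hypothetical stand-ins for a JDBC transaction and a JMS producer, not a real transaction manager:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: commit-then-publish with a compensating "rollback" when the
// publish throws. This is NOT an XA transaction, just manual sequencing.
public class WriteThenPublish {
    interface Publisher { void publish(Object update) throws Exception; }

    static final List<Object> committed = new ArrayList<>();  // stand-in for the DB

    static boolean update(Object newValue, Publisher publisher) {
        committed.add(newValue);          // stand-in for the DB write
        try {
            publisher.publish(newValue);  // push the update to the topic
            return true;                  // both succeeded
        } catch (Exception e) {
            committed.remove(newValue);   // publish failed: undo the DB write
            return false;
        }
    }
}
```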

Again, I'd use a standalone cache for this rather than creating your own,
if at all possible.

Tim

>

Re: Using ActiveMQ For Distributed Replicated Cache

jbertram
I agree with Tim about using an existing cache implementation. It seems
like a lot of work to re-invent the wheel here. If you're looking for an
open-source, Java-based cache implementation which uses ASL 2.0 (i.e. the
same license as Apache ActiveMQ), check out Infinispan [1].


Justin

[1] http://infinispan.org/


Re: Using ActiveMQ For Distributed Replicated Cache

MichaelAndrePearce
Apache Ignite is the ASF product in this space.


That said, if you do want to build this yourself and not have consistency
issues, design it so that you publish the change event to the topic first,
and your app instances (including the one that produced the event) update
their local caches only when they consume the message.

This way the update is atomic across all your nodes including the producer;
if the publish fails, no cache sees the change, including the local one.
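The publish-first flow can be sketched in plain Java. The `Topic` and `Node` classes below are hypothetical stand-ins for an ActiveMQ topic and an app server's local cache; the point is that no node, not even the producer, mutates its cache directly:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of "publish first, apply on consume": every node updates its cache
// only from the message stream, so all caches see the same sequence of changes.
public class PublishFirst {
    static class Node {
        final Map<String, String> cache = new HashMap<>();
        void onMessage(String key, String value) { cache.put(key, value); }
    }

    static class Topic {
        final List<Node> subscribers = new ArrayList<>();
        // If this call fails before delivery, no subscriber (including the
        // producing node's own subscription) ever sees the update.
        void publish(String key, String value) {
            for (Node n : subscribers) n.onMessage(key, value);
        }
    }
}
```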





Re: Using ActiveMQ For Distributed Replicated Cache

Tim Bain
I wouldn't say it's atomic, it just shifts around some of the error cases
(you risk reading uncommitted writes that might be rolled back, instead of
risking having a cache miss when a value should exist).

In general, the terms "atomic" and "distributed" are contradictory in most
situations. If the OP is using this as a cache (i.e. a place to hold the
results of operations that you would otherwise be able to compute on
demand, as a performance optimization), he will be fine, and atomicity
doesn't matter. If on the other hand the OP requires that the "cache" stay
fully in sync and it's not possible to compute the value in the case of a
cache miss, then that's a database not a cache.

On reflection, I believe that the OP is looking for a database here, and
that using a SQL-based database that supports XA transactions might allow
him to support committing or rolling back the real-work transaction if the
write to the "cache" database fails. So I retract my earlier suggestion to
use a standalone cache, and instead I recommend using a standalone RDBMS
that supports XA transactions.

Tim


Re: Using ActiveMQ For Distributed Replicated Cache

pragmaticjdev
Thanks for the replies.

Below are my comments on the discussed topics

> Have you considered using an actual standalone caching product such as
> Redis or MemCache as your cache rather than trying to create your own
> synchronized distributed in-memory cache?

That was the first thought, but we are trying to build the cache with
activemq mainly because:
1. We need this caching capability for only a couple of the roughly 30
features our application has. Adding another component to the architecture
just for this doesn't seem worthwhile from an operations perspective.
2. We already have activemq in our architecture, so we wanted to reuse it if
we can.


> If you use the TCP transport, it will stay connected continuously, and
> then
> disconnect if the broker goes down. If that happens, I believe an
> exception
> will be thrown the next time your code polls for new messages.

If the exception gets thrown only when the subscriber polls, the cache could
be stale in the meantime. Is there something that notifies us (through an
exception?) as soon as the broker goes down?


> In general, the terms "atomic" and "distributed" are contradictory in most
> situations. If the OP is using this as a cache (i.e. a place to hold the
> results of operations that you would otherwise be able to compute on
> demand, as a performance optimization), he will be fine, and atomicity
> doesn't matter. If on the other hand the OP requires that the "cache" stay
> fully in sync and it's not possible to compute the value in the case of a
> cache miss, then that's a database not a cache.

Our application has an OLTP use case. The objects that we want to cache are
metadata objects of the application. They don't change frequently after the
initial configuration, hence the thought of having a replicated cache. I
came across this discussion
<http://activemq.2283324.n4.nabble.com/How-to-ensure-reliability-of-publish-subscribe-against-occassional-network-hiccup-td2354084.html>
from the old threads, which looks exactly like what I am trying to do.
I can sequence the writes such that we first write to the database (RDBMS)
and then publish on JMS. If the publish fails we can roll back the database
transaction, so we look good on the publisher side.
We can even guarantee that the subscribers receive the copy with
CLIENT_ACKNOWLEDGE (activemq would retry till everyone reads the copy), but
there are two situations that I cannot find an answer to:

1. How should our application handle a failure flow in which the publisher
pushes a message on the topic successfully but activemq goes down before
sending that message to any of the subscribers (assuming non-durable
subscribers)? Is there a way the subscribers could know that activemq is
down? If so, we could invalidate the whole cache.

2. Given these cache updates need to be published & subscribed in real time,
what special handling should we do in cases the subscriber is busy
(overloaded) and unable to read the message from the topic?

Thanks!





Re: Using ActiveMQ For Distributed Replicated Cache

Tim Bain-2
I think you need to figure out whether you're trying to build a distributed
cache or a distributed database, because right now you seem to be trying to
build something halfway between the two and it's leading to an inconsistent
set of requirements.

If you're trying to build a distributed cache, then you can populate the
cache from a truth store (e.g. a RDBMS) whenever there is a cache miss.
This implies that it's perfectly safe to clear the cache when you can't
talk to the ActiveMQ broker, and that it's fine to bring up a new node
(either to scale up or to replace one that fails) because it has a way to
restore its cache and get in sync with the rest of the nodes. In this case,
you'd not want to roll back the transaction in the case of failure to
publish to the ActiveMQ broker, because the cache in each node would simply
be filled by a query on a cache miss. And you would only need to clear the
cache when you encounter broker connection problems if your data contains
updates or deletions to existing records; if it's a stream of new records,
then what's in the cache is fine to keep, and you can fill it up with the
records you don't get via ActiveMQ by querying the truth store when you
encounter cache misses.
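The "cache with a truth store" model above can be sketched in a few lines. The loader function here is a hypothetical stand-in for an RDBMS query; the key property is that clearing the cache is always safe, because the next miss reloads from the record copy:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a read-through cache: on a miss, fall back to the truth store
// (e.g. a SQL query) and remember the result for next time.
public class ReadThroughCache {
    private final Map<String, String> entries = new HashMap<>();
    private final Function<String, String> truthStore;

    public ReadThroughCache(Function<String, String> truthStore) {
        this.truthStore = truthStore;
    }

    public String get(String key) {
        // Miss -> query the truth store and fill the cache.
        return entries.computeIfAbsent(key, truthStore);
    }

    // Safe at any time, e.g. on broker disconnect: next get() reloads.
    public void invalidateAll() { entries.clear(); }
}
```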

If on the other hand you're trying to build a distributed database, then you
do need to perform a rollback when you fail to publish, but you also need a
way to populate a node when it starts (empty). And if you're going to clear
the local copy of the database, then you're going to lose all your data
whenever all nodes simultaneously lose their connection to the broker, so
you probably don't want to do that. Instead you'd want to keep all records,
and then have a mechanism for nodes synchronizing between themselves when
connectivity is restored; this would also handle the new-node problem. And
if this is what you're looking for, please stop calling it a cache.

There's a lot more complexity in the second case than in the first, and my
strong recommendation is to do the first (only a cache, with a backing
database). I know that you think that fewer deployed components makes a
simpler solution, but I think that simpler/less code, with less development
effort and less risk of bugs, is actually the simpler solution even if it
requires an additional component (an RDBMS). And if you already have an
RDBMS in your architecture (otherwise, how are you rolling back your
transaction when you fail to publish to ActiveMQ?), then the distributed
cache really is simpler than the distributed database.

Additional responses are inline.

Tim


On Apr 19, 2018 8:00 AM, "pragmaticjdev" <[hidden email]> wrote:

> > If you use the TCP transport, it will stay connected continuously, and then
> > disconnect if the broker goes down. If that happens, I believe an exception
> > will be thrown the next time your code polls for new messages.
>
> If the exception gets thrown only when the subscriber polls then the cache
> could be stale. Is there something that notifies (through an exception?)
> when the broker goes down?



Typically you would call it in a tight loop, so you're only as stale as the
amount of time it takes you to publish the messages received the last time.


> > In general, the terms "atomic" and "distributed" are contradictory in most
> > situations. If the OP is using this as a cache (i.e. a place to hold the
> > results of operations that you would otherwise be able to compute on
> > demand, as a performance optimization), he will be fine, and atomicity
> > doesn't matter. If on the other hand the OP requires that the "cache" stay
> > fully in sync and it's not possible to compute the value in the case of a
> > cache miss, then that's a database not a cache.
>
> Our application has a OLTP use case. The objects that we want to cache are
> metadata objects of the application. They don't change frequently after the
> initial configuration. Hence the thought of having a replicated cache. I
> came across this discussion
> <http://activemq.2283324.n4.nabble.com/How-to-ensure-reliability-of-publish-subscribe-against-occassional-network-hiccup-td2354084.html>
> from the old threads which looked exactly similar to what I am trying to do.
> I can sequence the writes such that we first write to the database (RDBMS)
> and then publish it on JMS. If that fails we can rollback the database
> transaction. We look good on the publisher side.
> We can even guarantee that the subscribers receive the copy with
> CLIENT_ACKNOWLEDGE (activemq would retry till everyone reads the copy) but
> the two situations are something that I cannot find an answer to

> 1. How should our application handle a failure flow in which the publisher
> pushes a message on the topic successfully but activemq goes down before
> sending that message to any of the subscribers (assuming non-durable
> subscribers)? Is there a way the subscribers could know that activemq is
> down? If so, we could invalidate the whole cache.



See my answer above.


> 2. Given these cache updates need to be published & subscribed in real time,
> what special handling should we do in cases the subscriber is busy
> (overloaded) and unable to read the message from the topic?



If you're going with a distributed database, there's only one answer here:
ensure that this doesn't happen. Make the insertion of a record dead simple
to minimize the work done on that thread, make sure the process is hosted
on a machine with more than enough processing power, even raise the
priority of the ActiveMQ consumer thread to ensure that a glut of "real"
work doesn't starve the consumer.

If you're going with a distributed cache, then don't worry about this,
because you'll handle it with queries to the truth store when you have
cache misses (at the cost of slower performance).

Tim

Re: Using ActiveMQ For Distributed Replicated Cache

pragmaticjdev
Highly appreciate the detailed replies and the clarifications on distributed
cache vs. database.
We are trying to build a distributed cache. I agree with all the inputs you
shared for such a cache implementation. In summary it would mean:
        1. The subscriber should clear the cache when it cannot connect to the broker
        2. The publisher should not roll back the database transaction on failures,
as step #1 would be sufficient and the cache is loaded as and when queried

A couple of follow up questions
1.

> Typically you would call it in a tight loop, so you're only as stale as
> the amount of time it takes you to publish the messages received the last
> time. 

       
How can one control the polling interval of the consumer? My jms consumer
code from our spring application looks like this:

@Component
public class Consumer {

    @JmsListener(destination = "java-topic", containerFactory = "topicListenerFactory")
    public void receiveTopicMessage(@Payload Person person) throws JMSException {
        // update the local cache entry
    }
}

How do I change the above code to call it in a tight loop? Also, would that
mean one or more threads would be constantly busy, leading to constant CPU
usage?

2.
For my question on the overloaded subscriber, I didn't completely follow your
suggestion not to worry about this scenario. You mentioned


>  If you're going with a distributed cache, then don't worry about
> this, because you'll handle it with queries to the truth store when you
> have cache misses (at the cost of slower performance). 

Assume there are two app servers with an object loaded in the local cache.
An update to this object occurs on app server 1, which publishes that object
on the jms topic. If app server 2 is overloaded (busy CPU), its jms consumer
thread might not get a chance to execute at that instant. What happens in
such cases, does activemq retry after some time? Can the number of such
retries be configured? App server 2 could remain in an overloaded state for
a longer duration (maybe 30 mins).




Re: Using ActiveMQ For Distributed Replicated Cache

pragmaticjdev
Any thoughts on how to control the polling interval and other queries
mentioned below?

Thanks!




Re: Using ActiveMQ For Distributed Replicated Cache

Tim Bain
In reply to this post by pragmaticjdev
On Mon, Apr 23, 2018 at 11:48 PM, pragmaticjdev <[hidden email]> wrote:

> Highly appreciate the detailed replies and the clarifications on
> distributed
> cache vs database.
> We are trying to build a distributed cache. I agree to all the inputs you
> shared for such a cache implementation. In summary it would mean
>         1. Subscriber should clear the cache when it cannot connect to the
> broker
>         2. Publisher should not rollback the database transaction on
> failures as
> step #1 would be sufficient & the cache is loaded as and when queried
>


OK, if you're building a cache rather than a database, what serves as the
record copy of the data, from which the cache can be queried on demand?
Would you just query your OLTP database whenever you have a cache miss?

Also, my earlier responses weren't considering the possibility that you'd
be updating objects within the cache. In that case, I agree that clearing
the cache on disconnection/error is the right way to go. Sorry if that
caused any confusion.



> A couple of follow up questions
> 1.
>
> > Typically you would call it in a tight loop, so you're only as stale as
> > the amount of time it takes you to publish the messages received the last
> > time.
>
>
> How can one control the polling time of the consumer? My jms consumer code
> from our spring application looks like this
>
> @Component
> public class Consumer {
>
>     @JmsListener(destination = "java-topic", containerFactory = "topicListenerFactory")
>     public void receiveTopicMessage(@Payload Person person) throws JMSException {
>         // update the local cache entry
>     }
> }
>
> How do I change the above code to call it in a tight loop?



Ah, so you're using message-driven code rather than managing the connection
yourself. In that case you'd want to do the following to handle the error
(by clearing the cache):
https://stackoverflow.com/questions/40654586/spring-jms-set-errorhandler-for-jmslistener-annotated-method

You certainly could switch to explicitly managing the connection (see
http://activemq.apache.org/hello-world.html as an example of what that code
would look like), but that's not necessary if you'd rather use the
message-driven paradigm.



> Also would that
> mean one or more threads would be constantly busy leading to constant usage
> of CPU cycles?
>


If you were to switch to accessing the connection directly, you'd typically
include a small Thread.sleep() to prevent spin-waiting. I apologize if the
choice of the words "tight loop" implied spin-waiting; I just meant that
you would keep the sleeps relatively short, not that there wouldn't be any
at all.
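The loop shape being described can be sketched like this. The `BlockingQueue` is a hypothetical stand-in for a JMS `MessageConsumer`; with the real API you would call `consumer.receive(timeoutMillis)`, which blocks the same way `poll()` does here, so the wait is built into the receive call rather than a separate spin:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of a polling consumer loop: block briefly for the next message
// instead of spin-waiting, so the thread uses almost no CPU while idle.
public class PollingConsumer {
    static int drain(BlockingQueue<String> queue, long timeoutMillis) throws InterruptedException {
        int handled = 0;
        while (true) {
            // Waits up to timeoutMillis; returns null if nothing arrived.
            String msg = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (msg == null) break;  // a real consumer would keep looping
            handled++;               // here: update the local cache entry
        }
        return handled;
    }

    // Test helper: build a pre-filled stand-in "topic".
    static BlockingQueue<String> queueOf(String... msgs) {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        for (String m : msgs) q.add(m);
        return q;
    }
}
```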



> 2.
> For my question on overloaded subscriber I didn't completely follow your
> suggestion for not being worried about this scenario. You mentioned
>
>
> >  If you're going with a distributed cache, then don't worry about
> > this, because you'll handle it with queries to the truth store when you
> > have cache misses (at the cost of slower performance).
>
> Assume there are two app servers with an object loaded in the local cache.
> An update to this object occurs on app server 1 which publishes that object
> on the jms queue. Here if app server 2 is overloaded (busy CPU), the jms
> consumer thread might not get a chance to execute at that instance in time.
> What happens in such cases, does activemq retry after some time?



In that scenario, your fate is in the hands of the JRE's thread scheduler.
There's no retrying at the application level; the thread simply sits there
with its execution pointer set to the next operation to be done, but it
might take time (milliseconds, not minutes) until the JRE decides that this
particular thread should be allowed to run its operations.

With that said, if the correct operation of your system depends on the
cache being updated before the subsequent operation is evaluated (i.e.
multi-process synchronization), then an asynchronous cache based on
ActiveMQ is not what you want, and you need to be hitting a
(non-distributed) database such as a SQL-based RDBMS for all of your
operations. Distributed systems have a certain amount of unpredictability
for operations that are run in parallel, so if your use case can't support
this, then you need an external data store such as an RDBMS to enforce
ordering/synchronization. I haven't been able to tell from what you've
written if this is your situation or not, but make sure you're clear on the
tradeoffs of a distributed/parallel architecture like the one you're
proposing, and make sure you can accept those tradeoffs.



> Can the
> number of such retries be configured? It could so happen that the app
> server
> 2 could remain in an overloaded state for a longer duration (may be 30
> mins).
>


Being "overloaded" doesn't mean your threads won't get to run, unless
you've done something dumb like setting that thread's priority lower than
the other threads in the application. In the default configuration, all
threads will be scheduled more or less evenly, so they'll make progress,
just not as fast as they would if the box was idle. There's nothing to
worry about here, unless you can't accept the inherent unpredictability of
a distributed system (see previous paragraphs).

Tim

Re: Using ActiveMQ For Distributed Replicated Cache

Tim Bain
Oh, one other thing. @JmsListener is typically used with a
DefaultMessageListenerContainer, which interacts poorly with topics if the
subscription is non-durable because the DMLC typically disconnects and
reconnects for every message, allowing messages to be "lost" for
non-durable subscriptions. See
https://stackoverflow.com/questions/42300953/jmslistener-issue-with-durable-subscription
for some information about how to make your subscription durable. But make
sure you have a plan for what happens when consumers shut down and never
return (e.g. because you permanently remove a host), since durable
subscriptions without a consumer will accumulate messages forever.
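
The linked answer boils down to giving the container factory a durable subscription plus a unique client id; a hedged sketch, where the client id and subscription name are placeholders:

```java
// In the container factory configuration:
factory.setPubSubDomain(true);
factory.setSubscriptionDurable(true);
factory.setClientId("app-server-1"); // must be unique per subscriber

// And on the listener, name the durable subscription:
@JmsListener(destination = "java-topic",
             containerFactory = "topicListenerFactory",
             subscription = "cache-updates")
public void receiveTopicMessage(@Payload Person person) { /* ... */ }
```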

Tim

On Thu, Apr 26, 2018 at 11:28 PM, Tim Bain <[hidden email]> wrote:

>
>
> On Mon, Apr 23, 2018 at 11:48 PM, pragmaticjdev <[hidden email]>
> wrote:
>
>> Highly appreciate the detailed replies and the clarifications on
>> distributed
>> cache vs database.
>> We are trying to build a distributed cache. I agree to all the inputs you
>> shared for such a cache implementation. In summary it would mean
>>         1. Subscriber should clear the cache when it cannot connect to
>> the broker
>>         2. Publisher should not rollback the database transaction on
>> failures as
>> step #1 would be sufficient & the cache is loaded as and when queried
>>
>
>
> OK, if you're building a cache rather than a database, what serves as the
> record copy of the data, from which the cache can be queried on demand?
> Would you just query your OLTP database whenever you have a cache miss?
>
> Also, my earlier responses weren't considering the possibility that you'd
> be updating objects within the cache. In that case, I agree that clearing
> the cache on disconnection/error is the right way to go. Sorry if that
> caused any confusion.
>
>
>
>> A couple of follow up questions
>> 1.
>>
>> > Typically you would call it in a tight loop, so you're only as stale as
>> > the amount of time it takes you to publish the messages received the
>> last
>> > time.
>>
>>
>> How can one control the polling time of the consumer? My jms consumer code
>> from our spring application looks like this
>>
>> @Component
>> public class Consumer {
>>
>>     @JmsListener(destination = "java-topic",
>>                  containerFactory = "topicListenerFactory")
>>     public void receiveTopicMessage(@Payload Person person) throws JMSException {
>>         // update the local cache entry
>>     }
>> }
>>
>> How do I change the above code to call it in a tight loop?
>
>
>
> Ah, so you're using message-driven code rather than managing the
> connection yourself. In that case you'd want to do the following to handle
> the error (by clearing the cache): https://stackoverflow.com/
> questions/40654586/spring-jms-set-errorhandler-for-
> jmslistener-annotated-method
>
> You certainly could switch to explicitly managing the connection (see
> http://activemq.apache.org/hello-world.html as an example of what that
> code would look like), but that's not necessary if you'd rather use the
> message-driven paradigm.
>
>
>
>> Also would that
>> mean one or more threads would be constantly busy leading to constant
>> usage
>> of CPU cycles?
>>
>
>
> If you were to switch to accessing the connection directly, you'd
> typically include a small Thread.sleep() to prevent spin-waiting. I
> apologize if the choice of the words "tight loop" implied spin-waiting; I
> just meant that you would keep the sleeps relatively short, not that there
> wouldn't be any at all.
>
>
>
>> 2.
>> For my question on overloaded subscriber I didn't completely follow your
>> suggestion for not being worried about this scenario. You mentioned
>>
>>
>> >  If you're going with a distributed cache, then don't worry about
>> > this, because you'll handle it with queries to the truth store when you
>> > have cache misses (at the cost of slower performance).
>>
>> Assume there are two app servers with an object loaded in the local cache.
>> An update to this object occurs on app server 1 which publishes that
>> object
>> on the jms queue. Here if app server 2 is overloaded (busy CPU), the jms
>> consumer thread might not get a chance to execute at that instance in
>> time.
>> What happens in such cases, does activemq retry after some time?
>
>
>
> In that scenario, your fate is in the hands of the JRE's thread scheduler.
> There's no retrying at the application level; the thread simply sits there
> with its execution pointer set to the next operation to be done, but it
> might take time (milliseconds, not minutes) until the JRE decides that this
> particular thread should be allowed to run its operations.
>
> With that said, if the correct operation of your system depends on the
> cache being updated before the subsequent operation is evaluated (i.e.
> multi-process synchronization), then an asynchronous cache based on
> ActiveMQ is not what you want, and you need to be hitting a
> (non-distributed) database such as a SQL-based RDBMS for all of your
> operations. Distributed systems have a certain amount of unpredictability
> for operations that are run in parallel, so if your use case can't support
> this, then you need an external data store such as an RDBMS to enforce
> ordering/synchronization. I haven't been able to tell from what you've
> written if this is your situation or not, but make sure you're clear on the
> tradeoffs of a distributed/parallel architecture like the one you're
> proposing, and make sure you can accept those tradeoffs.
>
>
>
>> Can the
>> number of such retries be configured? It could so happen that the app
>> server
>> 2 could remain in an overloaded state for a longer duration (may be 30
>> mins).
>>
>
>
> Being "overloaded" doesn't mean your threads won't get to run, unless
> you've done something dumb like setting that thread's priority lower than
> the other threads in the application. In the default configuration, all
> threads will be scheduled more or less evenly, so they'll make progress,
> just not as fast as they would if the box was idle. There's nothing to
> worry about here, unless you can't accept the inherent unpredictability of
> a distributed system (see previous paragraphs).
>
> Tim
>

Re: Using ActiveMQ For Distributed Replicated Cache

pragmaticjdev
Thanks Tim for your valuable comments.

1.

> so you're using message-driven code rather than managing the connection
> yourself. In that case you'd want to do the following to handle the error
> (by clearing the cache):
> https://stackoverflow.com/questions/40654586/spring-jms-set-errorhandler-for-jmslistener-annotated-method

On your suggestion of registering an ErrorHandler, I wanted to confirm my
understanding: does the subscriber underneath continuously poll ActiveMQ,
and does a failure in doing so cause the ErrorHandler implementation to be
invoked?

2.


Tim Bain wrote
> @JmsListener is typically used with a
> DefaultMessageListenerContainer, which interacts poorly with topics if the
> subscription is non-durable

Are you suggesting we enable durability? Is there an alternative? I ask
because theoretically we don't need durability of the messages. If the
subscriber is down when the message is published to the queue, it is
perfectly ok as the cache is designed to build as requests are received. We
go to the RDBMS store on a cache miss and the subsequent requests then are
handled by the cache.
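
The read-through pattern described here (query the RDBMS on a miss, then serve subsequent requests from memory) can be sketched generically; the loader function is a hypothetical stand-in for the real database query:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stand-in for the RDBMS query

    public ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Cache miss -> load from the truth store; hit -> serve from memory.
    public V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Called from the JMS listener when an update arrives on the topic.
    public void update(K key, V value) {
        cache.put(key, value);
    }

    // Called when the broker connection is lost, so any stale entries
    // are rebuilt from the database on the next miss.
    public void clear() {
        cache.clear();
    }
}
```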

Thanks!





Re: Using ActiveMQ For Distributed Replicated Cache

Tim Bain
On Wed, May 2, 2018, 12:02 AM pragmaticjdev <[hidden email]> wrote:

> Thanks Tim for your valuable comments.
>
> 1.
>
> > so you're using message-driven code rather than managing the connection
> > yourself. In that case you'd want to do the following to handle the
> error
> > (by clearing the cache):
> >
> https://stackoverflow.com/questions/40654586/spring-jms-set-errorhandler-for-jmslistener-annotated-method
>
> On your suggestion of registering an ErrorHandler, I wanted to confirm my
> understanding: does the subscriber underneath continuously poll ActiveMQ,
> and does a failure in doing so cause the ErrorHandler implementation to be
> invoked?
>


That's my understanding, though I'm not an expert on the Spring code. If
you have more detailed questions about the Spring JMS library specifically,
you might want to either take a look at its source code or post a question
on the Spring forum or on StackOverflow.


2.

>
>
> Tim Bain wrote
> > @JmsListener is typically used with a
> > DefaultMessageListenerContainer, which interacts poorly with topics if
> the
> > subscription is non-durable
>
> Are you suggesting we enable durability? Is there an alternative? I ask
> because theoretically we don't need durability of the messages. If the
> subscriber is down when the message is published to the queue, it is
> perfectly ok as the cache is designed to build as requests are received. We
> go to the RDBMS store on a cache miss and the subsequent requests then are
> handled by the cache.
>


A non-durable subscription may increase the likelihood of those cache
misses, but as long as that's acceptable, it's probably better than using a
durable subscription where you would have to configure a mechanism for
dropping the subscription (and the built-up backlog of unconsumed messages)
after some timeout.

Out of curiosity, why use ActiveMQ to populate your cache at all, rather
than just populating it on each cache miss (which is simpler)? I'm not
saying there aren't valid reasons for doing what you're proposing, but I'm
curious what your reasons are for deciding that this is worth the extra
complication.

Tim

>

Re: Using ActiveMQ For Distributed Replicated Cache

pragmaticjdev
I think there is some misunderstanding that I created. We don't plan to use
ActiveMQ to populate the cache. We build the cache on cache misses. Here's
what the overall flow looks like. Note that there are multiple JVMs
involved; I have pictured just one for simplicity.

I didn't completely follow your response to the durability question.
Hopefully this diagram explains how the cache behaves on read and write
operations. Please let me know if you still see problems with
@JmsListener (DefaultMessageListenerContainer) and non-durability.

<http://activemq.2283324.n4.nabble.com/file/t378744/ActiveMQ_Oracle.png>

Thanks.




Re: Using ActiveMQ For Distributed Replicated Cache

Tim Bain
On Mon, May 7, 2018 at 8:10 AM, pragmaticjdev <[hidden email]> wrote:

> I think there is some misunderstanding that I created. We don't plan to use
> ActiveMQ to populate the cache. We build the cache on cache misses. Here's
> what the overall flow looks like. Note that there are multiple JVMs
> involved; I have pictured just one for simplicity.
>


Do you truly mean that you don't plan to use ActiveMQ to build the cache?
Or are you just saying that you attempt to build the cache via ActiveMQ
because it's more efficient but that guaranteed delivery isn't required,
because if ActiveMQ misses a message, the cache-miss logic will fill in
whatever gaps might occur? If you're truly not using ActiveMQ to build the
cache, what are you using it for?



> I didn't completely follow your response to the durability question.
> Hopefully this diagram explains how the cache behaves on read and write
> operations. Please let me know if you still see problems with
> @JmsListener (DefaultMessageListenerContainer) and non-durability.
>
> <http://activemq.2283324.n4.nabble.com/file/t378744/ActiveMQ_Oracle.png>
>
> Thanks.
>


All I meant was that for a non-durable topic subscription, the Spring
DefaultMessageListenerContainer will disconnect between messages, which
results in the subscription being removed and then re-created the next time
the code connects. If a message is sent while the subscription doesn't
exist, the message will not be delivered when the consumer next creates the
connection, because the subscription did not exist at the time the message
was sent. This is standard topic behavior, but users often don't expect the
listener container to remove the subscription between calls, so it trips
people up. If you can accept a non-trivial percentage of your updates not
making it into the cache and instead being filled in via cache misses that
query the database, this isn't a problem. But since you've described
updating objects (where the cache would return a stale object rather than
producing a cache miss), you might not be able to live with this behavior.
If you can't, you'll want a durable subscription, and you'll need to ensure
that if your consumer disappears for a significant amount of time, your
broker is configured to discard the messages or the subscription (or both).
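
On the broker side, ActiveMQ can drop durable subscriptions whose consumers never return; a sketch of the relevant activemq.xml attributes, where the timeout values (in milliseconds) are arbitrary examples:

```xml
<!-- Discard durable subscriptions that have been offline for an hour;
     the cleanup task checks every five minutes. -->
<broker xmlns="http://activemq.apache.org/schema/core"
        offlineDurableSubscriberTimeout="3600000"
        offlineDurableSubscriberTaskSchedule="300000">
  <!-- ... existing broker configuration ... -->
</broker>
```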

Tim