Simple algorithm for preventing duplicate processing of items?

Kevin Burton
I have a database alongside ActiveMQ, and I use AMQ to execute work on items
stored in the database.

The problem is how to keep AMQ and my DB in sync so that there’s only
one entry in AMQ for a given database entry.

So I think this is a simple solution to that problem and wanted feedback.

Each logical unit of work is given a work-id which is a nonce.

So if you have a record in a “foo” table in your DB, it would have a work-id.

When you enqueue it, you give the message the same work-id.

The consumer of the AMQ message reads the work-id from the message and
the work-id from the DB record, and if they aren’t identical, it just acks
the message and returns, doing nothing.

This way only the appropriate AMQ message, the one enqueued later whose
work-id still matches the DB, will process the item, and only once.
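
A minimal sketch of that check, assuming an in-memory dict in place of the “foo” table and a deque in place of the AMQ queue. The names `enqueue`, `consume_one`, `db`, `queue` and `processed` are made up for illustration and are not part of ActiveMQ:

```python
import uuid
from collections import deque

db = {}          # stand-in for the "foo" table: item_id -> current work-id
queue = deque()  # stand-in for the AMQ queue
processed = []   # records which items were actually processed

def enqueue(item_id):
    """Store a fresh work-id (nonce) on the DB record and enqueue a message carrying it."""
    work_id = uuid.uuid4().hex
    db[item_id] = work_id
    queue.append({"item_id": item_id, "work_id": work_id})

def consume_one():
    """Take one message; process it only if its work-id matches the DB record."""
    msg = queue.popleft()  # removing the message stands in for the ack
    if db.get(msg["item_id"]) != msg["work_id"]:
        return False       # stale message: ack it and do nothing
    processed.append(msg["item_id"])
    return True

enqueue("item-1")
enqueue("item-1")  # accidental second request overwrites the work-id in the DB
results = [consume_one(), consume_one()]
```

Here the first message carries a work-id that no longer matches the DB, so it is dropped, and only the second one is processed.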

This means:

- if your DB and AMQ become out of sync you can just repair the AMQ
instance from the DB by doing a table scan of your DB and re-enqueueing
everything.

- if you accidentally request something to be processed twice, the earlier
message will be discarded and only the later one will win.
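
The repair case could look roughly like the sketch below. It again uses a dict and a deque as stand-ins for the DB and AMQ, and `repair_queue` and `drain` are hypothetical helper names. Note one assumption that is not stated above: the repair assigns a fresh work-id to each record as it re-enqueues it, so that any message still sitting in the queue becomes stale rather than being processed in addition to the new one.

```python
import uuid
from collections import deque

def repair_queue(db, queue):
    """Table scan: re-enqueue every DB record under a fresh work-id."""
    for item_id in db:
        work_id = uuid.uuid4().hex
        db[item_id] = work_id  # assumption: older messages for this item become stale
        queue.append({"item_id": item_id, "work_id": work_id})

def drain(db, queue):
    """Consume all messages, returning the item ids that were actually processed."""
    done = []
    while queue:
        msg = queue.popleft()  # stands in for receiving and acking the message
        if db.get(msg["item_id"]) == msg["work_id"]:
            done.append(msg["item_id"])
    return done

db = {"a": "old-a", "b": "old-b"}
queue = deque([{"item_id": "a", "work_id": "old-a"}])  # the message for "b" was lost
repair_queue(db, queue)
done = drain(db, queue)
```

After the repair, the leftover message for "a" is discarded and each item is processed exactly once.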

I think this is a simple/elegant way of solving this problem.

Thoughts?

--

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>

Re: Simple algorithm for preventing duplicate processing of items?

Tim Bain
Seems reasonable.  If you want the earliest duplicate message to win
instead of the latest one, you can store a processed flag in the database
instead of storing a nonce, but otherwise it sounds fine.
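
A rough sketch of the processed-flag variant, with a dict standing in for the database column and a deque for the queue (`processed_flag`, `handled` and `consume_one` are illustrative names only). In a real database the check and the update would presumably need to be a single atomic step, for example a conditional update, which this single-threaded sketch does not show:

```python
from collections import deque

processed_flag = {"item-1": False}   # stand-in for a "processed" column in the DB
queue = deque(["item-1", "item-1"])  # the same item was enqueued twice
handled = []

def consume_one():
    """Process the item only if its DB record is not yet flagged as processed."""
    item_id = queue.popleft()  # stands in for receiving and acking the message
    if processed_flag.get(item_id, False):
        return False           # a duplicate that arrived later: do nothing
    handled.append(item_id)
    processed_flag[item_id] = True
    return True

results = [consume_one(), consume_one()]
```

With the flag, the earliest message does the work and every later duplicate is dropped, which is the reverse of the nonce behaviour.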

On Mon, Nov 3, 2014 at 10:46 AM, Kevin Burton <[hidden email]> wrote:
