[DISCUSS] Artemis pageIterator.hasNext spends too much time in the case of no messages matched

[DISCUSS] Artemis pageIterator.hasNext spends too much time in the case of no messages matched

wei yang
Hi, All

There is something about pageIterator scanning that has haunted us for a long
time since we started using Artemis.

The use case is common:

E.g. there is a topic with two queues, q1 and q2. For some reason, such as a
bug in the business logic, the clients stop receiving from q1, and subsequent
messages sent to the topic are no longer routed to q1 (the clients don't want
to receive messages until they get back online). After a few days, the clients
come back up and start consuming messages from the queue. At this point,
calling hasNext will scan page files until it finds matching messages (in
fact, no messages matched before this point, and dozens of GB of page files
were written during the downtime). This leads to some problems:
1. The critical analyzer will be triggered, i.e. CRITICAL_CHECK_DEPAGE. In our
setup, the process would be terminated.
2. hasNext might be called on the queue's executor. As we know, the executor
is shared by all the queues bound to the address, so this can starve the other
queues, resulting in no messages being delivered for a few minutes.

One alternative approach I can think of is to add a timeout to hasNext/next.
If the timeout fires, the scan is rescheduled for later, avoiding the problems
mentioned above. Does anybody have any opinion on this?

Thanks in advance.
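The scan cost described above is inherent to any filtering iterator: hasNext cannot return false until every remaining paged message has been examined. A minimal sketch of that pattern (illustrative names only, not the actual Artemis internals):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/**
 * Wraps a raw message iterator and skips non-matching entries.
 * When nothing matches, hasNext() only returns after scanning
 * every remaining element -- conceptually, every paged message.
 */
final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> raw;
    private final Predicate<T> filter;
    private T lookahead;

    FilteringIterator(Iterator<T> raw, Predicate<T> filter) {
        this.raw = raw;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        while (lookahead == null && raw.hasNext()) {
            T candidate = raw.next();   // one page read per element, conceptually
            if (filter.test(candidate)) {
                lookahead = candidate;  // found a match, stop scanning
            }
        }
        return lookahead != null;       // false only after the full scan
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T result = lookahead;
        lookahead = null;
        return result;
    }
}
```

With dozens of GB of pages and no matches, that single hasNext call does the entire scan inline, which is exactly what trips the critical analyzer and ties up the shared executor.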

Re: [DISCUSS] Artemis pageIterator.hasNext spends too much time in the case of no messages matched

michael.andre.pearce
So the critical check is there to avoid issues where operations take too long on the critical path. It was added off the back of some major production issues.

I would be hesitant to relax or remove it.

If you know that things take longer in your broker, you can always configure a larger critical timeout.
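For reference, if increasing the timeout is the route taken, the critical analyzer is tuned in broker.xml. The element names below are the standard Artemis settings; the values shown are illustrative only:

```xml
<core xmlns="urn:activemq:core">
  <!-- keep the analyzer on, but give slow depaging more headroom -->
  <critical-analyzer>true</critical-analyzer>
  <critical-analyzer-timeout>300000</critical-analyzer-timeout>       <!-- ms -->
  <critical-analyzer-check-period>150000</critical-analyzer-check-period>
  <!-- LOG instead of HALT/SHUTDOWN if terminating the process is unacceptable -->
  <critical-analyzer-policy>LOG</critical-analyzer-policy>
</core>
```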








Re: [DISCUSS] Artemis pageIterator.hasNext spends too much time in the case of no messages matched

wei yang
Thanks for the reply. I'm not asking to remove the critical check; as you said,
we can just increase the timeout or log the failure. Our concern is the second
problem, starvation of the other queues. That is unacceptable for the
near-real-time processing done by the topic's other subscribers.


Re: [DISCUSS] Artemis pageIterator.hasNext spends too much time in the case of no messages matched

andytaylor
If you are dealing with subscribers that are not connected for very long
periods of time, I would question the choice of topics in the first place.
Maybe use a different topology, for instance one address/queue per consumer.


Re: [DISCUSS] Artemis pageIterator.hasNext spends too much time in the case of no messages matched

michael.andre.pearce
In reply to this post by wei yang
How about making hasNext return three states rather than a plain true/false?

E.g. return an int where 0 and 1 correspond to the current false/true, but a return of, say, 2 means there are more pages to read and the caller needs to spin again. That way, a genuinely critical issue where hasNext hangs would still be detected, but you also get positive feedback that there is still more to scan, so we know there is no critical issue, just a lot of data to process through.
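A sketch of how that tri-state idea could look, using an enum rather than magic int values (all names here are hypothetical, not the Artemis API). The caller re-queues itself on the shared executor whenever MORE_PAGES comes back, so the other queues bound to the address get a turn between scan slices:

```java
import java.util.concurrent.Executor;

/** Tri-state result: distinguishes "no match yet, more pages remain" from a final answer. */
enum HasNextResult {
    NO_ELEMENTS,  // scanned everything, nothing matched (old "false")
    HAS_ELEMENTS, // a matching message is available (old "true")
    MORE_PAGES    // no match so far, but more page files remain: yield and retry
}

/** Hypothetical iterator contract: scan until a match, the end, or the deadline. */
interface PageIteratorSketch {
    HasNextResult tryHasNext(long deadlineNanos);
}

/** Depage task that yields between scan slices instead of scanning everything inline. */
final class DepageTask implements Runnable {
    private static final long SLICE_NANOS = 10_000_000L; // 10 ms scan budget (assumed value)

    private final PageIteratorSketch iterator;
    private final Executor executor;

    DepageTask(PageIteratorSketch iterator, Executor executor) {
        this.iterator = iterator;
        this.executor = executor;
    }

    @Override
    public void run() {
        HasNextResult r = iterator.tryHasNext(System.nanoTime() + SLICE_NANOS);
        if (r == HasNextResult.MORE_PAGES) {
            executor.execute(this); // re-queue: other queues run in between slices
        } else if (r == HasNextResult.HAS_ELEMENTS) {
            // deliver the matching message (elided)
        }
    }
}
```

Re-queuing instead of looping in place addresses both reported problems: the critical analyzer sees each short slice complete, and the shared executor is not monopolized by one queue's scan.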





Re: [DISCUSS] Artemis pageIterator.hasNext spends too much time in the case of no messages matched

michael.andre.pearce
In reply to this post by andytaylor
It would be the same issue if the consumers shared a queue and the filter were consumer-side: a consumer whose filter passes over lots of data would hit the same problem.





Re: [DISCUSS] Artemis pageIterator.hasNext spends too much time in the case of no messages matched

andytaylor
Yeah, I was thinking more of multiple addresses, and don't use selectors. Using
selectors on a topic where there may be lots of paged messages is always
going to be problematic.


Re: [DISCUSS] Artemis pageIterator.hasNext spends too much time in the case of no messages matched

wei yang
In reply to this post by andytaylor
Thanks; it is indeed a publish-subscribe pattern in our case. There is a
topic to which product-change information is sent, and several applications
are interested in receiving it. The most common case we've met is a bug in
one application's handling of the product info, which then has to be fixed
and redeployed.


Re: [DISCUSS] Artemis pageIterator.hasNext spends too much time in the case of no messages matched

michael.andre.pearce
In reply to this post by andytaylor
It might be problematic at the moment, but should it be? Paging shouldn't be a problem; it really should perform. I think one of the ideas for how we could support retroactive addresses in future is to have an address in always-page mode, which means, more than anything, we should make sure paging is rock solid. I'm not against a change; I just want to be sure we're able to differentiate, for the critical checker, between something that's not returning, e.g. locked up due to a bug (the historic reason the checker is there), versus something working as it should but simply working through a lot of data.

Re: [DISCUSS] Artemis pageIterator.hasNext spends too much time in the case of no messages matched

michael.andre.pearce
In reply to this post by wei yang
This sounds like a reasonable pattern to me, especially in an event-based architecture. As I noted in the other email, my main concern is keeping the ability to detect dead locks or live locks (historically, code issues), which is what the critical analyser is there for, versus it just processing data; hence my idea of having hasNext return multiple states rather than just a boolean.

Re: [DISCUSS] Artemis pageIterator.hasNext spends too much time in the case of no messages matched

wei yang
In reply to this post by wei yang
Thanks for the suggestions. I will provide a PR soon :)
