[DISCUSS] Critical Analysis feature on broker


clebertsuconic
I'm adding some logic to detect cases where the broker may become unresponsive.

I'm adding a component called CriticalAnalyzer, which will inspect
response times of certain operations and decide to take the broker
down when bad things are happening.


Around several critical operations on the broker, I'm adding this pattern:


enterCritical(pathID);
try {
   synchronized (lock) {
      // the critical operation runs here
   }
} finally {
   leaveCritical(pathID);
}

The CriticalAnalyzer will measure the time between enter and leave,
and if it exceeds a configured timeout, it will take the broker down.
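
A minimal sketch of how such a check could work, under stated assumptions: the class name, the `isExpired` method, and the overload that takes an explicit clock value are all hypothetical and only illustrate the idea of comparing enter timestamps against a timeout. It is not the code in the pull request, and it tracks a single entry per path ID, which a real implementation would need to extend for concurrent callers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: records when each critical path was entered and
// reports whether any path has stayed open longer than the timeout.
final class CriticalAnalyzerSketch {
    // pathID -> nanosecond timestamp taken at enterCritical
    private final Map<Integer, Long> entered = new ConcurrentHashMap<>();
    private final long timeoutNanos;

    CriticalAnalyzerSketch(long timeoutMillis) {
        this.timeoutNanos = timeoutMillis * 1_000_000L;
    }

    void enterCritical(int pathID) {
        enterCritical(pathID, System.nanoTime());
    }

    // Overload with an explicit clock value, so the check can be exercised
    // deterministically (an assumption made for this sketch).
    void enterCritical(int pathID, long nowNanos) {
        entered.put(pathID, nowNanos);
    }

    void leaveCritical(int pathID) {
        entered.remove(pathID);
    }

    // Called periodically by a checker thread: true if any path that was
    // entered and not yet left is older than the timeout.
    boolean isExpired(long nowNanos) {
        for (long start : entered.values()) {
            if (nowNanos - start > timeoutNanos) {
                return true;
            }
        }
        return false;
    }
}
```

A periodic task would call `isExpired(System.nanoTime())` and trigger the configured action when it returns true.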



Now, when it comes to the configuration, I'm not finding good
names for this, and I'm asking for help.

So far I have come up with these names:

- analyze-critical: default true
  Is the critical analyzer on?

- analyze-critical-timeout: default 120000 (milliseconds, 2 minutes)
  The timeout used to

- analyze-critical-check-period: default 1/2 of analyze-critical-timeout

- analyze-critical-halt-on-failure: default false
  In case of an issue, a Runtime.halt() would be issued if true,
  otherwise a shutdown.
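
As an illustration only, the defaults listed above could be resolved along these lines. The property names in the comments come from the list; the class, its fields, and the convention that null means "not configured" are assumptions made for this sketch.

```java
// Hypothetical holder for the four proposed settings.
// A null argument means the setting was not configured.
final class CriticalAnalyzerSettings {
    static final long DEFAULT_TIMEOUT_MS = 120_000L; // 2 minutes

    final boolean enabled;       // analyze-critical
    final long timeoutMs;        // analyze-critical-timeout
    final long checkPeriodMs;    // analyze-critical-check-period
    final boolean haltOnFailure; // analyze-critical-halt-on-failure

    CriticalAnalyzerSettings(Boolean enabled, Long timeoutMs,
                             Long checkPeriodMs, Boolean haltOnFailure) {
        // Default: the analyzer is on.
        this.enabled = enabled == null || enabled;
        // Default: 120000 ms.
        this.timeoutMs = timeoutMs == null ? DEFAULT_TIMEOUT_MS : timeoutMs;
        // Default: half of whatever timeout is in effect.
        this.checkPeriodMs = checkPeriodMs == null ? this.timeoutMs / 2 : checkPeriodMs;
        // Default: orderly shutdown rather than halt.
        this.haltOnFailure = haltOnFailure != null && haltOnFailure;
    }
}
```

Deriving the check period from the effective timeout, rather than from the default timeout, keeps the two values consistent when only the timeout is overridden.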

During deadlocks or IO issues, the most effective option would
actually be the halt. We could even change the start scripts to restart
the server when a specific exit value is returned.
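
A rough sketch of the halt-versus-shutdown choice, with assumptions stated plainly: the class, the enum, and the exit code 70 are hypothetical. The one real API used is `Runtime.getRuntime().halt(int)`, which terminates the JVM immediately without running shutdown hooks, which is why it can still work when a deadlock would block an orderly stop.

```java
// Hypothetical sketch of what could happen once a critical timeout is detected.
final class FailureActionSketch {
    // Assumed exit status that a start script could treat as "restart me".
    static final int RESTART_EXIT_CODE = 70;

    enum Action { HALT, SHUTDOWN }

    // Maps the analyze-critical-halt-on-failure flag to an action.
    static Action choose(boolean haltOnFailure) {
        return haltOnFailure ? Action.HALT : Action.SHUTDOWN;
    }

    static void apply(Action action, Runnable orderlyShutdown) {
        if (action == Action.HALT) {
            // Ends the JVM at once; shutdown hooks do not run.
            Runtime.getRuntime().halt(RESTART_EXIT_CODE);
        } else {
            // Normal stop path; may itself block if the broker is deadlocked.
            orderlyShutdown.run();
        }
    }
}
```

A start script could then loop while the process exits with the agreed status and stop on any other value.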




Any input?


I will send a Pull Request soon.


--
Clebert Suconic

Re: [DISCUSS] Critical Analysis feature on broker

clebertsuconic
PR sent. I would appreciate reviews.

thanks

In reply to Clebert Suconic's message of Fri, Aug 4, 2017 at 1:02 PM.



--
Clebert Suconic

Re: [DISCUSS] Critical Analysis feature on broker

MichaelAndrePearce
Thanks Clebert, I have left my feedback directly on the PR.

Cheers
Mike

In reply to Clebert Suconic's message of 5 Aug 2017, 06:03.

Re: [DISCUSS] Critical Analysis feature on broker

gtully
Nice. I think there is value in just logging this information and not
halting or stopping.
In this way the feature can be used to determine usage patterns, spikes,
etc., and it would be possible to determine what the critical levels are.
This would allow a separation between getting information and doing
something about it.
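
One way to picture that separation, purely as a sketch: a log-only policy sits alongside the shutdown and halt options, so detection always produces a log entry and only some policies go on to stop the broker. The `Policy` enum, the method name, and the logger name are hypothetical and are not taken from the proposal or the pull request.

```java
import java.util.logging.Logger;

// Hypothetical sketch: reporting a timeout is always done,
// acting on it depends on the chosen policy.
final class LogOnlyPolicySketch {
    enum Policy { LOG, SHUTDOWN, HALT }

    private static final Logger logger = Logger.getLogger("critical-analyzer-sketch");

    // Logs the event and returns true only when the policy asks for the
    // broker to be taken down; Policy.LOG just records the measurement.
    static boolean onTimeout(Policy policy, int pathID, long elapsedMs) {
        logger.warning("Critical path " + pathID + " still open after " + elapsedMs + " ms");
        return policy != Policy.LOG;
    }
}
```

With the log-only policy the recorded timings can be compared with other logs before any threshold is trusted enough to stop the broker.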

In reply to Michael André Pearce's message of Sun, 6 Aug 2017 at 05:58.

Re: [DISCUSS] Critical Analysis feature on broker

clebertsuconic
Funny coincidence that you mentioned that.


I'm looking at a production case right now, for a HornetQ customer
whose network is slow (some broken switch, I don't know exactly;
that's part of the investigation). They need to find out when that is
happening and correlate it with other logs.



In reply to Gary Tully's message of Thu, Aug 10, 2017 at 8:53 AM.



--
Clebert Suconic