Channel: THWACK: Document List - Network Performance Monitor

..Alert > Reset > [Custom Property Delay] > Alert > Reset..


Hi All,

 

Just wanted to share some alerting logic with you.

 

First, the problem: when an alert triggers, it will eventually reset. My issue is that it could reset straight away, and before you know it there is another trigger, ramping up the ticket count and leading to incorrectly logged tickets and missed alerts. SolarWinds ships with some cool features to help suppress alerts with more advanced logic, but this would require making a few dozen additional alerts.

 

For the purpose of this article I'll just use node status as the alert condition.

 

Just below is the scenario: noisy alerts. The first trigger is at 9, then a reset, then an alert at 11, then a reset, then an alert at 2, and so on. You could simply say in the alert's reset condition that the node must be UP for x minutes or hours before resetting, and then let it re-trigger some time later.
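The static "must be UP for x minutes" reset described above boils down to a fixed window. Here is a minimal Python sketch of that check, purely to illustrate why the value is static; `should_reset_static`, the (timestamp, status) sample format and the 30 minute default are assumptions for the example, not SolarWinds settings.

```python
from datetime import datetime, timedelta

UP, DOWN = 1, 2  # Orion-style node status codes (1 = Up, 2 = Down)


def should_reset_static(history, now, up_for=timedelta(minutes=30)):
    """Return True if the node has been continuously UP for at least `up_for`.

    `history` is a list of (timestamp, status) samples, oldest first; the last
    sample is taken as the current status. Hypothetical helper: changing the
    fixed `up_for` window is what would normally mean duplicating the alert.
    """
    up_since = None
    for ts, status in history:
        if status == UP:
            if up_since is None:
                up_since = ts  # start of the current UP run
        else:
            up_since = None  # any non-UP sample breaks the run
    return up_since is not None and now - up_since >= up_for
```

Every node shares the same `up_for` here, which is exactly the limitation: a different window means a different alert.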

 

My issue is that I don't really like this static value: to change it I would need to create another alert with different times. Opting for this type of alerting means duplicating interface alerts, volume alerts, node status alerts, basically everything, and at that point things start to get messy, especially when you implement your workflow elsewhere. If you're like me, you just want one single alert. On to the solution; I'll try to keep it as high level as possible.

 

Just to recap: I want an alert to trigger, then reset, then wait for the delay set in a custom property before the next trigger. Seems reasonable enough. The logic for the alert is simple really: check the events logged by previous alerts, and if the number of hours since the last one is greater than the custom property value, then alert.
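The trigger decision just described can be sketched for a single node in Python. This is an illustration, not the alert itself: `datediff_hours` and `should_trigger` are hypothetical names, and the 16 hour fallback for an empty custom property is taken from the SQL condition in this post. The helper mimics T-SQL `DATEDIFF(HOUR, ...)`, which counts hour boundaries crossed rather than whole elapsed hours.

```python
from datetime import datetime
from typing import Optional

DEFAULT_DELAY_HOURS = 16  # fallback when the custom property is empty (NULL)


def datediff_hours(start: datetime, end: datetime) -> int:
    """Emulate T-SQL DATEDIFF(HOUR, start, end): count hour boundaries crossed,
    so 10:59 -> 11:01 counts as 1 even though only two minutes passed."""
    s = start.replace(minute=0, second=0, microsecond=0)
    e = end.replace(minute=0, second=0, microsecond=0)
    return int((e - s).total_seconds() // 3600)


def should_trigger(node_is_down: bool,
                   last_event_time: Optional[datetime],
                   now: datetime,
                   delay_hours: Optional[int]) -> bool:
    """Decide whether the alert may fire again for one node."""
    if not node_is_down:
        return False  # base condition (node down) not met
    if last_event_time is None:
        return True  # nothing logged yet for this alert, so fire straight away
    delay = DEFAULT_DELAY_HOURS if delay_hours is None else delay_hours
    return datediff_hours(last_event_time, now) > delay
```

Because the comparison is a strict `>`, a delay of 16 only lets the alert fire once 17 hour boundaries have passed since the last logged event.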

 

You'll need to create a custom property.

 

Name: n_hrs_DelayNxtTrigrStatus
Format: integer
Description: number of hours to delay node status alert after 1st trigger

 

SQL condition:

 

--comment out the SELECT line when pasting into the alert.
SELECT Nodes.Caption, Nodes.NodeID FROM Nodes
WITH (NOLOCK)
--join on the Events table, looking for the RESET message this alert writes to the NPM event log
LEFT JOIN Events p WITH (NOLOCK)
    ON p.NetObjectID = Nodes.NodeID
   AND p.NetObjectType = 'N'
   AND p.Message LIKE '%Rule: "- Node Status" | Reset:%'
WHERE Nodes.Status = 2                  --2 = Down
  AND ISNULL(Nodes.n_mute, '0') <> '1'  --skip nodes flagged n_mute; NULL counts as not muted
GROUP BY Nodes.Caption, Nodes.NodeID
HAVING
    --hours since the last logged event vs. the custom property; if NULL, use a hardcoded 16 hour delay
    DATEDIFF(HOUR, MAX(p.EventTime), GETDATE()) > ISNULL(MAX(Nodes.n_hrs_DelayNxtTrigrStatus), 16)
    --or no record has been logged yet for the above message
    OR MAX(p.EventTime) IS NULL
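To experiment with the condition outside Orion, here is a minimal sketch that ports it to SQLite through Python's built-in sqlite3 module. It is not the Orion database: the two tables hold only the columns the condition references, and the node names, messages and timestamps are made-up sample data. SQLite has no DATEDIFF, ISNULL or NOLOCK, so the port uses IFNULL, drops the hints, replaces GETDATE() with a `:now` parameter, and emulates `DATEDIFF(HOUR, ...)` by truncating both timestamps to the hour.

```python
import sqlite3

# Toy tables with only the columns the condition uses (sample data is made up).
SCHEMA = """
CREATE TABLE Nodes (NodeID INTEGER PRIMARY KEY, Caption TEXT, Status INTEGER,
                    n_mute TEXT, n_hrs_DelayNxtTrigrStatus INTEGER);
CREATE TABLE Events (EventID INTEGER PRIMARY KEY, NetObjectID INTEGER,
                     NetObjectType TEXT, Message TEXT, EventTime TEXT);
"""

NODES = [  # (NodeID, Caption, Status, n_mute, n_hrs_DelayNxtTrigrStatus)
    (1, "core-sw", 2, None, None),  # down, no delay set -> 16 hour fallback
    (2, "edge-rtr", 2, "0", 4),     # down, 4 hour delay
    (3, "lab-fw", 2, None, None),   # down, never reset by this rule
    (4, "muted-sw", 2, "1", None),  # down but muted
    (5, "up-sw", 1, None, None),    # up
]

EVENTS = [  # (NetObjectID, NetObjectType, Message, EventTime)
    (1, "N", 'Rule: "- Node Status" | Reset: core-sw is up', "2024-01-01 00:30:00"),
    (2, "N", 'Rule: "- Node Status" | Reset: edge-rtr is up', "2024-01-01 06:15:00"),
    (3, "N", 'Rule: "- Interface Status" | Reset: lab-fw eth0 is up', "2024-01-01 11:30:00"),
]

# SQLite port: IFNULL for ISNULL, no NOLOCK hints, :now for GETDATE(), and
# DATEDIFF(HOUR, a, b) emulated by truncating both to the hour and dividing by 3600.
QUERY = """
SELECT Nodes.Caption, Nodes.NodeID FROM Nodes
LEFT JOIN Events p
    ON p.NetObjectID = Nodes.NodeID
   AND p.NetObjectType = 'N'
   AND p.Message LIKE '%Rule: "- Node Status" | Reset:%'
WHERE Nodes.Status = 2
  AND IFNULL(Nodes.n_mute, '0') <> '1'
GROUP BY Nodes.Caption, Nodes.NodeID
HAVING
    (CAST(strftime('%s', strftime('%Y-%m-%d %H:00:00', :now)) AS INTEGER)
     - CAST(strftime('%s', strftime('%Y-%m-%d %H:00:00', MAX(p.EventTime))) AS INTEGER)) / 3600
        > IFNULL(MAX(Nodes.n_hrs_DelayNxtTrigrStatus), 16)
    OR MAX(p.EventTime) IS NULL
ORDER BY Nodes.NodeID
"""


def alerting_nodes(now: str) -> list:
    """Return the (Caption, NodeID) rows that would trigger at time `now`."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO Nodes VALUES (?, ?, ?, ?, ?)", NODES)
        con.executemany(
            "INSERT INTO Events (NetObjectID, NetObjectType, Message, EventTime) "
            "VALUES (?, ?, ?, ?)", EVENTS)
        return con.execute(QUERY, {"now": now}).fetchall()
    finally:
        con.close()
```

With this sample data at 12:00, node 2 alerts because 6 hour boundaries have passed against its 4 hour delay, node 3 alerts because it has no matching reset event (its only event belongs to a different rule), and node 1 stays quiet until more than 16 hour boundaries have passed since its reset.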

 

Due to this type of trigger logic, if the device is down for longer than 16 hours, the alert will reset regardless of node status, because the 16 hours having passed resets the condition. So for node status alerts, my reset condition checks the actual node status, like this:
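To see the whole trigger, reset, delay, trigger cycle with a status-based reset, here is a small Python simulation. It is a sketch under assumptions: samples are (timestamp, status) pairs using Orion-style codes 1 = Up and 2 = Down, the recorded reset time stands in for the Reset message written to the NPM event log, elapsed time is compared directly instead of emulating DATEDIFF, and `simulate` is a hypothetical helper.

```python
from datetime import datetime, timedelta

UP, DOWN = 1, 2  # Orion-style node status codes


def simulate(samples, delay_hours=16):
    """Replay (timestamp, status) samples through the alert cycle.

    Trigger: node DOWN and either no reset logged yet or more than
    `delay_hours` since the last reset. Reset: node status is actually UP,
    not a timer. Returns a list of (timestamp, "trigger" | "reset") actions.
    """
    actions = []
    active = False
    last_reset = None
    delay = timedelta(hours=delay_hours)
    for ts, status in samples:
        if not active:
            if status == DOWN and (last_reset is None or ts - last_reset > delay):
                active = True
                actions.append((ts, "trigger"))
        elif status == UP:
            active = False
            last_reset = ts  # stands in for the Reset message in the event log
            actions.append((ts, "reset"))
    return actions
```

With the default 16 hour delay, flaps within 16 hours of a reset are suppressed, and an outage that lasts longer than 16 hours keeps the alert active, because only an UP sample resets it.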

How to generate an NPM event: this is what you are looking for in the Message field (the p.Message LIKE condition).

That's pretty much it. Once you set a value for the custom property 'n_hrs_DelayNxtTrigrStatus' on the node details, the alert will check the events before triggering in future, just like below.

Bear in mind that this article only shows node status. You might have 15 alerts, in which case you will need 15 custom properties, one for each alert, each using the above logic to check for the last NPM event generated by that alert.
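With one custom property per alert, the per-alert parts of the condition (the LIKE pattern and the property name) could be generated rather than hand-edited. A minimal sketch, assuming node-scoped alerts only (interface or volume alerts would join a different table and NetObjectType) and a naming convention of your choosing; `AlertDelay`, `like_escape` and `condition_parts` are hypothetical helpers, not SolarWinds APIs.

```python
from dataclasses import dataclass
from typing import Tuple

DEFAULT_DELAY_HOURS = 16  # same hardcoded fallback as the node status condition


@dataclass
class AlertDelay:
    rule_name: str       # alert name as it appears in the logged message, e.g. "- Node Status"
    delay_property: str  # hypothetical per-alert custom property holding the delay in hours


def like_escape(text: str) -> str:
    """Escape T-SQL LIKE wildcards by wrapping them in brackets, then double
    single quotes so the result can sit inside a string literal."""
    text = text.replace("[", "[[]").replace("%", "[%]").replace("_", "[_]")
    return text.replace("'", "''")


def condition_parts(alert: AlertDelay) -> Tuple[str, str]:
    """Return (join_predicate, having_clause) SQL fragments for one node alert."""
    pattern = '%Rule: "' + like_escape(alert.rule_name) + '" | Reset:%'
    join_predicate = f"p.Message LIKE '{pattern}'"
    having_clause = (
        "DATEDIFF(HOUR, MAX(p.EventTime), GETDATE()) > "
        f"ISNULL(MAX(Nodes.{alert.delay_property}), {DEFAULT_DELAY_HOURS})"
        " OR MAX(p.EventTime) IS NULL"
    )
    return join_predicate, having_clause
```

The property name is inserted as-is, so it must already be a valid column name; only the rule name, which comes from free-text alert names, is escaped.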

 

I would like to thank alexslv for his report Repetitive Email Alerts (Noise) - MUST HAVE REPORT!!!, because for me this was the next logical step after finding the noise, lol.

 

Please let me know if you have any issues or questions. I had to trim a lot of the alert I use, but it runs as expected.

 

thanks,

 

Dan

