seriousman
just joined
Topic Author
Posts: 4
Joined: Thu Jun 03, 2010 8:11 pm

Notifications -order of events

Mon Jun 07, 2010 7:25 pm

I'm new to The Dude and am hoping somebody has already set up what I'm trying to achieve.
I have a bunch of UPSs. I have created my own probe which pulls an OID value: when the OID value is 1, the UPS is on building power; when the OID value is 2, the UPS is on battery power. I seem to have the email notifications set and they are working fine (set to alert when the OID does not equal 2, which is a little backwards, but I couldn't get it working any other way). My problem is that the network card in the UPSs is unstable. Often the network card becomes unreachable and will not respond to ping or SNMP. When this happens, the OID does not return any value, so The Dude interprets that as a value which doesn't equal 2 and therefore sends an alert.
Ideally I could set up the notifications like this: if OID value = null, do nothing. If oid value = 1, send alert BUILDING_POWER. If OID value = 2, send alert BATTERY_POWER.
Alternatively it could be "if device does not respond to ping, do not send alerts. Only send alerts 1 or 2 if ping received."

Dude = v3.6
Probe:
Type: SNMP,
Agent: DEFAULT,
OID: 1.3.6.1.4.1.476.1.42.3.5.3.10.0,
OID Type: Integer,
Compare Method: != (not equal),
Integer Value: 2


Any Help would be most appreciated.
Thanks
 
matrot2
Frequent Visitor
Posts: 79
Joined: Thu Jun 25, 2009 12:34 pm

Re: Notifications -order of events

Wed Jun 09, 2010 12:06 am

For MikroTik there are special devices, UPS-MT-monitor: http://tandem.ck.ua/ups_mtm-eng.php
They work correctly with the UPS-MT (http://tandem.ck.ua/ups_mt-eng.php), which can be assembled by hand.
A very reliable and proven alternative.
 
seriousman
just joined
Topic Author
Posts: 4
Joined: Thu Jun 03, 2010 8:11 pm

Re: Notifications -order of events

Wed Jun 09, 2010 2:24 pm

Thank you for your response Matrot2.
Unfortunately I have over 200 UPSs I wish to monitor and cannot justify purchasing additional hardware. Also, all my UPSs are connected by Ethernet directly to a switch port and are being monitored remotely over the network.
What I need is a way to find a reliable means of ordering the probes/notifications or a way of prioritizing them. I read Lebowski's post on Probes and it was very insightful, but being new to the Dude, I was unable to follow all of it.

Any help from anyone is most appreciated.
Thank you
 
lebowski
Forum Guru
Posts: 1619
Joined: Wed Aug 27, 2008 5:17 pm

Re: Notifications -order of events

Wed Jun 09, 2010 5:49 pm

Yeah it does take some time to get the hang of it, I have APC UPSs and have made quite a few probes for them.

What you want to do with battery failed should be doable. I don't have time right now so I will have to get back to you, but you should try to determine why your UPS network card stops responding... I would not build a probe to try to work around a faulty device. I would fix the device.

Ok have to jet
 
gsandul
Member Candidate
Posts: 154
Joined: Mon Oct 19, 2009 1:42 pm

Re: Notifications -order of events

Thu Jun 10, 2010 2:04 pm

Hello seriousman
Just let me clarify. An alert can be sent when service state is changed.
[attachment: alert.JPG]
The service may change state from UP to DOWN and from DOWN to UP; there is no other condition.
It is not hard to make the probe, but is it really good to receive a notification that the service is in the UP state when the UPS changed from ON_BATTERY to down (no ping response)?
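
To illustrate the point about state changes, here is a minimal Python sketch (not Dude syntax, and not how The Dude is implemented internally). The function name `notifications` and the message format are made up for the example; it only shows that repeated polls in the same state produce no alert, while each UP/DOWN transition produces exactly one.

```python
from typing import Iterable, List


def notifications(states: Iterable[str]) -> List[str]:
    """Return one message per state transition; repeated states stay silent."""
    messages: List[str] = []
    previous = None  # no known state before the first poll
    for state in states:
        if previous is not None and state != previous:
            messages.append(f"service changed {previous} -> {state}")
        previous = state
    return messages


# Five polls, but only two transitions, so only two alerts.
polls = ["UP", "UP", "DOWN", "DOWN", "UP"]
print(notifications(polls))
```

With only two states available, "no answer" and "on battery" both have to land on DOWN unless the error text distinguishes them.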
 
lebowski
Forum Guru
Posts: 1619
Joined: Wed Aug 27, 2008 5:17 pm

Re: Notifications -order of events

Thu Jun 10, 2010 11:46 pm

Here is a NEW battery failed probe for APC UPSs. Kudos go to gsandul for the function and probe template. It has helped me a whole bunch.

Your request... "if OID value = null, do nothing. If oid value = 1, send alert BUILDING_POWER. If OID value = 2, send alert BATTERY_POWER."
I built this probe to alert on "False" and when the value does not equal 1, since when value=1 everything is fine... I have not built a probe that handles another state like you requested (sorry) and I have spent a lot of time on this already.

Let's start with a function (right-click Functions, open separately).
Create a new function, (click + on the function window)
Give the function a name which can be called from a probe...

Name: upsbat
Description: Returns false or the value in 1.3.6.1.4.1.318.1.1.1.2.2.4.0
Code: if(array_size(oid_column("1.3.6.1.4.1.318.1.1.1.2.2.4" ,10 ,29)), oid_raw("1.3.6.1.4.1.318.1.1.1.2.2.4.0", 10, 29) ,"False")

The above function grabs all the values in the oid_column and counts them with array_size. If the count is non-zero, the OID answered, so the function returns the value in oid_raw form. If the count is zero, it returns "False". Now we should be able to do all the rest of the work on the error line of the probe.
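
For readers who find the nested if(...) hard to follow, the same decision can be sketched in Python. This is only a model, not Dude code: the dictionary `snmp_values` is a hypothetical stand-in for what the device answers, and the sketch assumes the only instance under the column is `.0`, as in the function above.

```python
from typing import Dict, Union

COLUMN = "1.3.6.1.4.1.318.1.1.1.2.2.4"


def upsbat(snmp_values: Dict[str, int]) -> Union[int, str]:
    """Model of the upsbat function: raw value if the OID answers, else "False"."""
    # Stand-in for array_size(oid_column(...)): count the entries under the column.
    entries = [oid for oid in snmp_values if oid.startswith(COLUMN + ".")]
    if len(entries) > 0:
        # Stand-in for oid_raw(...): the numeric value of instance .0
        return snmp_values[COLUMN + ".0"]
    # Nothing came back (device unreachable or OID missing).
    return "False"


print(upsbat({COLUMN + ".0": 1}))  # device answered with 1
print(upsbat({}))                  # device did not answer
```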

Notice I am using oid_raw in the function. An APC UPS returns the text string "noBatteryNeedsReplacing(1)" when the battery is fine. We need the value "1" for the if statement to work. I used oid_raw since it seems to return only numerical values.

Now it is time to create a probe. Same as above, click probes, click plus...
Notice below that on the Available line I am calling the function as upsbat() <> "False" to see if there is a value or if it is "False".
The great thing about this function/probe combo is the notification error message.
The notification reads "Warning: UPS Battery failed" if the value is not equal to 1 and "Warning cant read Battery" if the OID is not available; otherwise the probe is up and the device is up.

Name: ups_bat_test
Type: Function
Agent: default
Available: upsbat()<> "False"
Error: if(upsbat()<>"False", if(upsbat() = 1, "", "Warning: UPS Battery failed"), "Warning cant read Battery")
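
The Available and Error lines together give three outcomes. Below is a rough Python model of that logic, under the assumption that `None` stands for "no SNMP answer"; `evaluate_probe` is a made-up name for illustration and is not part of The Dude. The message strings are the ones from the Error line above.

```python
from typing import Optional, Tuple


def evaluate_probe(raw_value: Optional[int]) -> Tuple[bool, str]:
    """Return (available, error_text) for one poll; None means no SNMP answer."""
    # What upsbat() would hand back: the raw value, or the string "False".
    result = "False" if raw_value is None else raw_value
    available = result != "False"          # the Available line
    if result != "False":                  # the Error line, outer if
        error = "" if result == 1 else "Warning: UPS Battery failed"
    else:
        error = "Warning cant read Battery"
    return available, error


for value in (1, 2, None):
    print(value, evaluate_probe(value))
```

An empty error text means the probe is up; anything else becomes the text of the notification. For a UPS whose OID uses 1 for building power and 2 for battery power, the comparison value and the messages would have to be adapted.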

Note: I didn't put in a value to graph. There is no reason to graph the value 1. However, if you don't want to see a graph at all, you will have to turn off history manually on each device that you put the probe on (on the History tab). The default is to graph, and leaving the value empty gives you a graph of 0. Also keep in mind that values are not graphed while the Error line of the probe is "true". The probe's Error line goes true if anything other than "" is returned.

Also note that on an APC UPS this OID reports whether the battery has failed, not whether the UPS is on building power as the OID on your UPS does. [edit: I added the correct message to the error line for APC since the rest of the probe is correct for APC.]

I will build all my new probes based on the above model since it can tell the difference between when the value is not available and the value is within an error threshold... It is also much more predictable when using auto discover.
[attachments: batfunc.PNG, batprobe.PNG]
And as gsandul pointed out, you can really customize the notification to your heart's content. I am just using the default notification for this in the global settings. You won't need the ping probe since you can tell if this probe is down or if the UPS is on battery. You might have 4 or 5 more probes on your UPS if you like: one for battery capacity, temperature, voltage in... these are good to graph :)

HTH,
Lebowski

[EDIT: Aw crap :) The "Error" line was completely wrong. I wrote the post before/as I built the probe and took screenshots of the working probe, but failed to finish proofreading before I had to leave.]
 
gsandul
Member Candidate
Posts: 154
Joined: Mon Oct 19, 2009 1:42 pm

Re: Notifications -order of events

Fri Jun 11, 2010 8:53 am

Hi all.
I may add to this:
lebowski wrote: Your request... "if OID value = null, do nothing. If oid value = 1, send alert BUILDING_POWER. If OID value = 2, send alert BATTERY_POWER." I have not built a probe that handles another state like you requested (sorry)
The request is useless.
And I must say, lebowski's solution is the best.

In any case, I would follow lebowski's proposal:
"I would not build a probe to try to work around a faulty device. I would fix the device."
 
seriousman
just joined
Topic Author
Posts: 4
Joined: Thu Jun 03, 2010 8:11 pm

Re: Notifications -order of events

Fri Jun 11, 2010 2:32 pm

Wow. That is fantastic!
Thank you both!
I will build it up and post the results as soon as I can.
I completely agree about fixing the device rather than building the probe, but there are plenty of logistical problems around that. I am in the process of replacing the UPSs as they fail, but what I need in the meantime is accurate reporting about their status until they can be replaced. The great thing about this probe is that it should work regardless of whether the UPS is faulty or not.
Thank you so much for all your work

Regards,
jk
 
lebowski
Forum Guru
Posts: 1619
Joined: Wed Aug 27, 2008 5:17 pm

Re: Notifications -order of events

Fri Jun 11, 2010 5:50 pm

I took the probe from this thread and made a wiki entry. http://wiki.mikrotik.com/wiki/Quick_gui ... good_probe This is for users who have an idea how probes and functions work and just need a template to start from.
 
seriousman
just joined
Topic Author
Posts: 4
Joined: Thu Jun 03, 2010 8:11 pm

Re: Notifications -order of events

Tue Sep 21, 2010 3:22 pm

It's been a while since I had a chance to get back to this.
Unfortunately I could not get my probe to function correctly. I will keep at it as I trust it should work. I am curious about one thing in lebowski's post. What do the 10 and 29 represent in the following line?
Code: if(array_size(oid_column("1.3.6.1.4.1.318.1.1.1.2.2.4" ,10 ,29)), oid_raw("1.3.6.1.4.1.318.1.1.1.2.2.4.0", 10, 29) ,"False")
I wonder if that's where my problem lies. Any more help is greatly appreciated.
Thank you!
 
lebowski
Forum Guru
Posts: 1619
Joined: Wed Aug 27, 2008 5:17 pm

Re: Notifications -order of events

Tue Sep 21, 2010 4:57 pm

The 10 is the cache time and the 29 is the negative cache time, both in seconds. The default for The Dude is a 5 second cache time and a 300 second negative cache time.

When a probe successfully grabs a value it will remember that value for 10 seconds instead of the default 5. When a probe fails to grab a value it will remember that it could not for 29 seconds. This "fix" is so that if a probe fails it will not stay down for 300 seconds.
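
The effect of the two timers can be modelled with a small Python sketch. This is a toy model of the behaviour described above, not The Dude's actual implementation; the class `ProbeCache`, the `fetch` callback and the `fake_fetch` helper are all hypothetical names for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ProbeCache:
    """Toy model: remember successes for cache_s and failures for negative_cache_s."""

    def __init__(self, fetch: Callable[[str], Optional[int]],
                 cache_s: float = 10, negative_cache_s: float = 29) -> None:
        self.fetch = fetch
        self.cache_s = cache_s
        self.negative_cache_s = negative_cache_s
        # oid -> (value, time at which the entry expires)
        self._entries: Dict[str, Tuple[Optional[int], float]] = {}

    def get(self, oid: str, now: float) -> Optional[int]:
        entry = self._entries.get(oid)
        if entry is not None and now < entry[1]:
            return entry[0]              # still cached: the device is not asked again
        value = self.fetch(oid)          # None stands for "no answer"
        ttl = self.cache_s if value is not None else self.negative_cache_s
        self._entries[oid] = (value, now + ttl)
        return value


answers: List[Optional[int]] = [None, 1]   # first request fails, second succeeds
calls: List[str] = []


def fake_fetch(oid: str) -> Optional[int]:
    calls.append(oid)
    return answers[len(calls) - 1]


cache = ProbeCache(fake_fetch)
print(cache.get("1.3.6.1.4.1.318.1.1.1.2.2.4.0", now=0))   # failure, remembered for 29 s
print(cache.get("1.3.6.1.4.1.318.1.1.1.2.2.4.0", now=20))  # answered from the negative cache
print(cache.get("1.3.6.1.4.1.318.1.1.1.2.2.4.0", now=30))  # entry expired, device asked again
```

With negative_cache_s left at 300, the third call in this model would still return the remembered failure.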

Ever have a probe down and no amount of clicking reprobe would fix it, but in a few minutes it would come back up all by itself?