Sun Apr 06, 2008 1:53 am
It is not that the Dude is complicated; any NMS will be fairly complicated to configure and use properly if you lack knowledge and experience. Before implementing any software that monitors service availability, you should map it all out on paper (or in a spreadsheet) so that you can determine what is critical to your organization's operations. Once you have everything ranked by importance, you then need to decide how each rank will be monitored, i.e. its polling intervals. Once this is done, you can set your notifications so that the important things will let you know quickly, and objects of lesser importance will still notify you, but maybe not at 3AM.
Look at it this way: the most critical components of this imaginary network are the switches that connect to the outside world, the internal VLANs from each switch, the mail servers, the file and client web servers that are critical to production, and the large network printers and plotters. These objects have the most frequent polling rate (say, every 30 seconds), with a short timeout (10 seconds) and a low countdown (3; never 1, as that will generate too many false positives). This is so I will get the notification as quickly as possible when something goes wrong, and hopefully before the users notice anything.
At the next level, I poll devices about every two minutes, with a timeout of 30 seconds and a countdown of 5. These devices may be important, but they are not critical to everyday tasks, so they can be down for a couple of minutes before I need to be aware of it, and I may not need notification after hours or on weekends.
Finally, I have the non-critical objects that just need to be monitored, like UPSs, that I record information from and monitor to ensure availability when we do need them. These are polled at the longest interval and can afford long timeouts and high countdowns. Since I am monitoring these objects for use when primary systems fail, and to ensure that they are ready to step in, my notification level for them is something that can be in my inbox when I show up at the office.
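To put the three levels side by side, here is a rough Python sketch (not Dude configuration, just a way to write the plan down before touching the tool). The numbers for the first two levels are the ones from my example; the third level's numbers and all of the names (`Tier`, `notify`, `validate`) are placeholders I made up for illustration. The detection estimate assumes probes start on a fixed schedule and an object is marked down after `countdown` consecutive failed probes; the Dude's real behaviour may differ, so treat it as a ballpark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    interval_s: int  # seconds between probes
    timeout_s: int   # seconds before a single probe counts as failed
    countdown: int   # consecutive failed probes before the object is marked down
    notify: str      # hypothetical policy label, not an actual Dude setting

    def worst_case_detect_s(self) -> int:
        # Assumed model: the failure happens right after a good probe, the next
        # `countdown` probes start one interval apart, and the last one has to
        # run out its timeout before the object is declared down.
        return self.countdown * self.interval_s + self.timeout_s


def validate(tier: Tier) -> None:
    # A countdown of 1 alerts on a single lost probe (too many false positives).
    if tier.countdown < 2:
        raise ValueError(f"{tier.name}: countdown must be at least 2")
    # Keep the timeout shorter than the interval so probes do not overlap.
    if tier.timeout_s >= tier.interval_s:
        raise ValueError(f"{tier.name}: timeout must be shorter than interval")


TIERS = [
    Tier("critical", 30, 10, 3, "immediately, any hour"),
    Tier("important", 120, 30, 5, "business hours only"),
    # Placeholder values: the post gives no numbers for this level.
    Tier("background", 600, 60, 10, "morning email summary"),
]

for tier in TIERS:
    validate(tier)
    print(f"{tier.name}: down detected within about "
          f"{tier.worst_case_detect_s()} s, notify {tier.notify}")
```

Under that assumed model the critical level would raise an alert within roughly 100 seconds of a failure. Writing the levels out like this also makes it easy to spot an object that has been put in the wrong one.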
Now, this was just an example, and finer tuning can be performed, but you really need to understand the nature of the business you are supporting, what it is that keeps people productive, and what the impact is when one of those systems goes down.
As to buying something "more preconfigured", you would still face the same learning curve to have it fully operational. With most commercial products, you may even have to learn a proprietary language to customize the tool, and you will spend a great deal of money on something that may or may not do what the Dude does for free.
Of course, that's just my opinion, based on experience with several OTF products and a few years in the IT game.