Big Problem V3beta6 Stops Pinging

Posted: Sun Apr 29, 2007 7:00 pm
by noakley
There seems to be a fundamental problem with 3beta6 in that it randomly stops pinging. Everything appears to be OK and working; however, if you inspect the timeplots for ping response, the timeline has stopped. Devices also stay in the state they were in when The Dude stopped pinging them. SNMP appears to be OK. I have observed this happening 4 times now, but there doesn't appear to be any predictable reason for it: sometimes it will run for an hour, sometimes for a couple of days. Restarting The Dude fixes the problem temporarily. I have upgraded from V2, so I don't know whether the problem exists in previous releases of V3.

Posted: Sun Apr 29, 2007 9:59 pm
by firebat
I have seen a similar problem since v2 beta 12. Pings fail, and it sometimes takes hours before the device is declared up again, yet all other probes are fine (SNMP, etc.). We have seen this on various devices (APC devices, Netopia, Trango, Mikrotik RBs, etc.). I had to turn off notifications for our network until we figure out what to do. Pings from the NMS CLI work without packet loss. The only thing I can think of is that The Dude is dropping the response messages due to buffering or the like. Maybe it can't handle x number of device responses at one time? I have our pings set to a one-minute interval.

How many devices do you have in your network?

I should add that our network is L3 routed and we see the issue on devices in all subnets.

Posted: Mon Apr 30, 2007 8:45 am
by uldis
Can you check whether The Dude is actually not sending any pings to those devices
when this problem occurs? You can use the 'ethereal' tool to see that. Thanks in
advance for providing this information.

Posted: Tue May 01, 2007 12:13 pm
by noakley
Uldis, it stopped again at 02:20am this morning. I checked with Ethereal and there were no pings being issued and no SNMP activity. The GUI was running ok and so was syslog.
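
The capture check described here can be made repeatable. The following is a minimal sketch, assuming the timestamps of outgoing ICMP echo requests have already been exported from a packet capture into a Python list; the function names, the `tolerance` factor, and the sample timestamps are all hypothetical and only illustrate the idea of flagging a stall when no ping has been seen for several polling intervals.

```python
from datetime import datetime, timedelta


def find_stalls(timestamps, interval_s, tolerance=3.0):
    """Return (last_seen, next_seen, gap_seconds) for every gap between
    consecutive pings that is longer than tolerance * interval_s."""
    limit = interval_s * tolerance
    ordered = sorted(timestamps)
    stalls = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = (cur - prev).total_seconds()
        if gap > limit:
            stalls.append((prev, cur, gap))
    return stalls


def stalled_since(timestamps, now, interval_s, tolerance=3.0):
    """Return the time of the last ping if pinging looks stalled as of
    `now`, otherwise None."""
    if not timestamps:
        return None
    last = max(timestamps)
    if (now - last).total_seconds() > interval_s * tolerance:
        return last
    return None


if __name__ == "__main__":
    # Made-up sample data: one echo request per minute from 02:00 to 02:20,
    # then nothing. Real input would come from the capture.
    start = datetime(2007, 5, 1, 2, 0, 0)
    seen = [start + timedelta(seconds=60 * i) for i in range(21)]
    print(stalled_since(seen, datetime(2007, 5, 1, 9, 0, 0), interval_s=60))
```

A tolerance of a few intervals avoids flagging a single delayed poll as a stall; the right value depends on the polling interval in use.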

Posted: Wed May 02, 2007 5:32 am
by ubb
Just as another datapoint for you guys. I can also confirm that I have seen this problem exactly as noakley describes. No pings, graphs stop working but GUI was running just fine.

Posted: Wed May 02, 2007 9:43 am
by noakley
I certainly will, Uldis. It is OK at the moment, but I will check whether it is pinging when it fails again. I don't think this is the same problem that Firebat is observing, as I have never seen it before. The problem has only occurred since upgrading from 2.2 to 3beta6. The only other changes I have made after upgrading are adding a map background and adding 6 extra devices. I currently have 101 devices under management.

Posted: Thu May 03, 2007 9:16 pm
by noakley
Any news on this problem, Uldis? It is failing twice a day at the moment, which makes it somewhat unusable.

Posted: Mon May 07, 2007 3:46 pm
by Ozelo
I was polling every 30s and the ping probe was reporting wrong chart data, showing high ping times and sometimes triggering a "down" state. I changed the polling to every 5s, and the chart data now seems close to the RTT we get at the prompt. Dunno why I can't choose 1s for polling, only 5s or bigger. Found this on 3beta6. I thought that changing the polling time would change the chart data as well, but not in that way.


Posted: Wed May 09, 2007 12:58 am
by cramerit
We are seeing the same problem here.

Posted: Wed May 09, 2007 2:11 am
by ubb
Any fixes for this problem coming soon? We've had to stop using The Dude for now - very sad!

Posted: Wed May 09, 2007 12:14 pm
by complete2006
Same problem here. Everything shows green, while in the real world our biggest customer is down because of a crashed MT box.

@Uldis: When will the problems be fixed? Please don't tell me "It's beta. Don't use this in a production environment..." You can only find out how good The Dude is if you use it in a real network. I know that it is free of charge (at the moment), but I would pay for a reliable product that I can trust... With every new version I see errors returning that we saw some versions before. Can you put your fixes in detail in the changelog?

My problems with B6 are:

1. Stops pinging
2. Sends double (or more) UP messages. Today it sent UP messages
without any device state change.
3. Alerts at the beginning of scheduled probes (at 6:00) with Down, and at the next poll
UP, without any real error.
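
For reference, the behaviour expected in item 2 is that a message goes out only when a device's state actually changes. The sketch below illustrates that expectation in a few lines; it is not how The Dude is implemented, and the class name, method name, and message format are made up for the example.

```python
class StateChangeNotifier:
    """Remembers the last state per device and reports only transitions."""

    def __init__(self):
        self._last = {}

    def update(self, device, state):
        """Record `state` for `device`. Return a message if the state
        changed since the previous poll, otherwise None."""
        previous = self._last.get(device)
        self._last[device] = state
        # The first observation and repeated identical states stay silent.
        if previous is None or previous == state:
            return None
        return f"{device}: {previous} -> {state}"
```

With this logic, two consecutive polls that both see a device as up produce at most one message, so a repeated UP without an intervening Down cannot occur.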

Problems gone with B6:

Steady false errors


Posted: Fri May 11, 2007 5:27 pm
by noakley
Uldis, is there any update on this issue? Is there going to be a fix? I cannot use beta6 to monitor the network. I have seen it stop working 3 times today now.

Posted: Fri May 18, 2007 7:11 pm
by ubb
Dude, where's my pings?

Posted: Thu May 24, 2007 12:24 pm
by aviper
Same here.
Even when pinging the host from within The Dude it says Pingtimeout, while from the cmd console ping is perfect.
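
A comparison like this can be scripted so that it covers every monitored host rather than one at a time. The following is a rough sketch, assuming the states shown in The Dude are copied by hand into a dictionary; all function names are hypothetical. It shells out to the operating system's own `ping` command, using `-n`/`-w` on Windows and `-c`/`-W` as on Linux (macOS and BSD interpret the timeout flag differently).

```python
import platform
import subprocess


def build_ping_command(host, count=4, timeout_s=2, system=None):
    """Build the argument list for the system ping command."""
    system = system or platform.system()
    if system == "Windows":
        # -n: number of echo requests, -w: reply timeout in milliseconds
        return ["ping", "-n", str(count), "-w", str(timeout_s * 1000), host]
    # Linux iputils: -c: number of echo requests, -W: reply timeout in seconds
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def host_responds(host, count=4, timeout_s=2):
    """Return True if the system ping command exits with status 0."""
    cmd = build_ping_command(host, count, timeout_s)
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=count * (timeout_s + 1) + 5,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


def find_disagreements(dude_states, probe_results):
    """Return hosts where the state noted from The Dude ("up" or "down")
    differs from the independent ping result (True or False)."""
    return sorted(
        host
        for host, state in dude_states.items()
        if host in probe_results and (state == "up") != probe_results[host]
    )
```

Typical use would be to build `probe_results` with `{host: host_responds(host) for host in dude_states}` and then pass both dictionaries to `find_disagreements`. Note that the exit status of `ping` is only an approximation of reachability, particularly on Windows.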

Posted: Fri Jun 01, 2007 5:35 pm
by jerwhite
I think I have a fix for the ping problem in V3beta6!!! Open your probes by going to settings, discover, and then services. Click on the three little dots to edit the services. Go to ping, change the type from random to ICMP, and change the TTL to 128. Hope this helps. I'm on a network that has 100+ computers spanning 6 LANs and this seems to have fixed my problem. :lol:

Posted: Sat Jun 02, 2007 2:33 am
by noakley
I wish it were true but sadly no: ping is set to ICMP, TTL 128, and ... I had to restart it 1hr after changing the TTL. I would really like to get a version 3 beta that works! Or even some answer as to when we might get one!

Posted: Tue Jun 05, 2007 8:48 pm
by complete2006
Any news on this issue?