sathackr
just joined
Topic Author
Posts: 22
Joined: Thu Dec 25, 2014 5:13 am

Uptime rollover bug/SNMP

Mon Nov 16, 2020 10:31 pm

Hello,

About 497 days ago we deployed our first MikroTik CRS326 switches running RouterOS 6.44.3 into production.

Today they are one-by-one becoming unreachable via SNMP, and when viewing system uptime in the Web UI, it's becoming clear that the uptime counter is stored in 32 bits and has rolled over.

We suspect this is causing SNMP to fail.

Has there been any update in versions >6.44.3 to address this issue? We have over 400 of these switches deployed and do not want to have to track rebooting them every 497 days.
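
The 497-day figure matches a 32-bit counter that ticks in hundredths of a second, which is the unit SNMP TimeTicks (sysUpTime) use. A minimal sketch of the arithmetic, assuming that tick rate also applies to the counter shown in the Web UI (the function names are illustrative only):

```python
# Assumption: the uptime counter is an unsigned 32-bit value counting
# hundredths of a second, like SNMP TimeTicks (sysUpTime).
TICKS_PER_SECOND = 100
COUNTER_MODULUS = 2 ** 32
SECONDS_PER_DAY = 86_400


def rollover_days(modulus=COUNTER_MODULUS, ticks_per_second=TICKS_PER_SECOND):
    """Days of uptime before the counter wraps back to zero."""
    return modulus / ticks_per_second / SECONDS_PER_DAY


def wrapped_ticks(true_uptime_seconds):
    """Value a 32-bit tick counter would report for a given real uptime."""
    return (true_uptime_seconds * TICKS_PER_SECOND) % COUNTER_MODULUS


if __name__ == "__main__":
    print(f"counter wraps after {rollover_days():.2f} days")
```

A device that has really been up for 498 days would therefore report an uptime of less than one day.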
 
joegoldman
Forum Veteran
Posts: 775
Joined: Mon May 27, 2013 2:05 am

Re: Uptime rollover bug/SNMP

Mon Nov 16, 2020 11:20 pm

497 days is a long time to go without security upgrades etc.

Perhaps set up a yearly maintenance and upgrade cycle.

Or at the least - have SNMP monitoring start warning at day 450, and become critical at day 480.
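
A rough sketch of that threshold check, assuming the monitoring system already has the uptime in days (the function name and the default thresholds are just the numbers suggested above, not part of any particular monitoring tool):

```python
def uptime_alert_level(uptime_days, warn_at=450, crit_at=480):
    """Classify an uptime so a reboot can be scheduled before the ~497-day wrap."""
    if uptime_days >= crit_at:
        return "critical"
    if uptime_days >= warn_at:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for days in (100, 450, 485):
        print(days, uptime_alert_level(days))
```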

Who knows - maybe uptime is a 64-bit int in newer versions of RouterOS - there have been a lot of new versions since your current one.
 
mkx
Forum Guru
Posts: 13364
Joined: Thu Mar 03, 2016 10:23 pm

Re: Uptime rollover bug/SNMP

Mon Nov 16, 2020 11:50 pm

The Linux kernel has had a 64-bit uptime counter (regardless of the HW platform "bitness") since version 2.6, which was released in mid-December 2003.
ROSv7 is built around a much newer Linux kernel, so the issue will be gone. Not with ROSv6 though, MT is not going to upgrade the kernel inside (it's not a trivial task, they stuck to the same kernel for too long).

While I tend to agree that some minimum maintenance is the right thing to do, I don't see that as pressing for a switch where (almost) everything happens inside the ASIC / switch chip.
 
sathackr
just joined
Topic Author
Posts: 22
Joined: Thu Dec 25, 2014 5:13 am

Re: Uptime rollover bug/SNMP

Wed Nov 18, 2020 11:54 pm

Yep -- also we are always hesitant to upgrade firmware unless there is a specific issue to address. The risk of firmware upgrade and even just a reboot is not zero. We know that 6.44.3 & 6.44.5 work very well on hundreds of switches and thousands of customers. We're not in a hurry to change it every month when there is a new firmware upgrade and/or potential new firmware regression.

More than a couple of times I've had an MT device fail after a firmware upgrade or simple reboot (corrupt routerboot, corrupt flash, and self-recovery fails, which causes an outage and requires a subsequent truck roll).

We protect the devices with a robust firewall rule set, and while not perfectly secure, it serves our purposes.

The rollover bug itself isn't necessarily a problem, but SNMP dies somehow in connection with it and makes the devices unmonitorable.
 
troylb
just joined
Posts: 10
Joined: Fri Jan 09, 2015 10:14 pm

Re: Uptime rollover bug/SNMP

Fri Jul 22, 2022 9:11 pm

This whole issue with uptime and 32-bit counters can be resolved without using 64-bit counters. The issue is that TimeTicks (hundredths of a second) are used, which makes the counter roll over at about 497 days. You can resolve this if you set up a counter that is 32-bit but counts seconds instead of TimeTicks. Most hardware manufacturers have this already using the OID SNMP-FRAMEWORK-MIB::snmpEngineTime.0, which can be used as an alternative to the TimeTicks counter. This can't be used on a Linux/Unix machine because the SNMP daemon can restart and the time would change, but it does not restart in router/switch gear, printers and other hardware.

Currently FSCOM, Cisco, Fortinet, peplink, Axis, UBNT Edge switches and Zyxel switches are all known to support this option. I don't believe that it would be difficult to implement this on the mikrotik as it would just be another 32bit INT value.

This would give you uptimes that would far exceed the life of the hardware.
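
To put a number on that: snmpEngineTime is a non-negative 32-bit integer counting whole seconds (maximum 2147483647), so it lasts about 68 years. A small sketch, assuming a poller that has already fetched both values and prefers snmpEngineTime when the device exposes it (the function names are hypothetical):

```python
INTEGER32_MAX = 2 ** 31 - 1          # upper bound of snmpEngineTime
SECONDS_PER_YEAR = 365.25 * 86_400


def engine_time_horizon_years():
    """Years of uptime a seconds-based Integer32 counter can represent."""
    return INTEGER32_MAX / SECONDS_PER_YEAR


def best_uptime_seconds(sys_uptime_ticks, engine_time_seconds=None):
    """Prefer snmpEngineTime (seconds); fall back to sysUpTime (1/100 s ticks)."""
    if engine_time_seconds is not None:
        return engine_time_seconds
    return sys_uptime_ticks // 100


if __name__ == "__main__":
    print(f"snmpEngineTime covers about {engine_time_horizon_years():.1f} years")
```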

Is there anyone at MikroTik who might be able to comment on this? The issue of security updates for hardware that is on private network space is a completely moot point, as these devices are not reachable from the outside and can run stably for years without issue.

Anyway, that is my two cents on this.

Best,


-Troy
 
mkx
Forum Guru
Posts: 13364
Joined: Thu Mar 03, 2016 10:23 pm

Re: Uptime rollover bug/SNMP

Fri Jul 22, 2022 11:23 pm

The issue was resolved in the 32-bit Linux kernel a long time ago (and was never a real issue in the 64-bit Linux kernel). Since v7 uses a fairly recent Linux kernel, uptime rollover won't be an issue for much longer. It'll only affect devices whose administrators don't care to upgrade the running software, but I guess most of those admins don't care about uptime too much either.

BTW I suspect that the workaround for SNMP actually relies on the kernel having correct uptime info and only maps the value into a sustainable range. Which makes it pretty impractical to implement in ROS v6 (even if devs cared about such a minor nuisance) but much easier to implement in v7.
 
troylb
just joined
Posts: 10
Joined: Fri Jan 09, 2015 10:14 pm

Re: Uptime rollover bug/SNMP

Fri Apr 28, 2023 6:39 pm

Hello,

We created a solution that works for us as a workaround for the counters being 32-bit. It uses a script, and while I would not recommend making this available on devices where SNMP has not been restricted, the script is below. It produces an uptime that is accurate to within 5 seconds. The value is in seconds, so a basic SNMP script to convert it to years, months, weeks, days, hours, minutes and seconds can easily be written to report and/or alert on uptimes. The example is from one of our RB3011 units, but this works on all units that are running at least 6.34 RouterOS and SwitchOS. We have this running on CRS units, CCR units and RB units.

Hope this is useful. It certainly has been for us.

The SNMP OID is 1.3.6.1.4.1.14988.1.1.18.1.1.2.2

# model = RouterBOARD 3011UiAS
# serial number = 8EEB08F6B003
/system script
# "seconds" adds 5 to the global variable "a" each time it runs
add dont-require-permissions=no name=seconds owner=admin policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon source=\
":global a (\$a+5); :put \$a;"
# "uptime_data" only prints the current value of "a"; this is the script polled over SNMP
add dont-require-permissions=no name=uptime_data owner=admin policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon source=\
":global a; :put \$a;"

/system scheduler
# run "seconds" every 5 seconds, starting at boot, so "a" tracks seconds since startup
add comment="Runs the seconds script that updates the variable \"a\" by 5 seconds" interval=5s name=Uptime_run on-event=seconds policy=\
ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-time=startup
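
The conversion from seconds to a readable uptime mentioned above could look roughly like this on the monitoring side. This is only a sketch: it assumes the value polled from the OID has already been parsed to an integer, and it uses fixed 365-day years and 30-day months, which is an approximation chosen here for simplicity.

```python
# Unit sizes in seconds; years and months are fixed-length approximations.
UNITS = (
    ("y", 365 * 86_400),
    ("mo", 30 * 86_400),
    ("w", 7 * 86_400),
    ("d", 86_400),
    ("h", 3_600),
    ("m", 60),
    ("s", 1),
)


def format_uptime(total_seconds):
    """Break a seconds counter into years, months, weeks, days, h, m, s."""
    remaining = int(total_seconds)
    parts = []
    for label, size in UNITS:
        count, remaining = divmod(remaining, size)
        if count:
            parts.append(f"{count}{label}")
    return " ".join(parts) or "0s"


if __name__ == "__main__":
    print(format_uptime(500 * 86_400))
```

Because the script counter is a plain number of seconds, the 497-day wrap does not apply to it; it starts again from zero only when the device reboots and the global variable is lost.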