RB4011 carrying traffic but access is lost

Posted: Mon Sep 21, 2020 9:30 pm
by Brough
We have a problem that we've seen only with RB4011s. The router becomes inaccessible: it no longer responds to Winbox, SSH or SNMP. However, it continues to carry traffic, and there appear to be no customer problems. Power cycling the router restores normal operation.

Our network has ~800 RouterOS devices (CCR1036, CCR1009, RB4011, RB2011, RB750UP, RB1100, etc.), most of them running RouterOS 6.44.6. We have about 75 RB4011s deployed and we're adding 3-4 per week. We've never had this type of problem on any router before, but it has come up three times, on three different RB4011s, in the past six weeks.

Has anyone else seen something like this?

Any suggestions on how to debug this the next time we see it?

Re: RB4011 carrying traffic but access is lost

Posted: Mon Sep 21, 2020 10:33 pm
by xvo
If I recall correctly, there were some posts describing similar behaviour on the 4011: one CPU core maxes out, cutting off access to the device itself.
Try searching the forum for the last couple of months.

Re: RB4011 carrying traffic but access is lost

Posted: Mon Sep 21, 2020 11:41 pm
by Brough
Thanks. I don't think that's what's happening here.

We collect SNMP data every two minutes and record CPU utilization for each core. For the most recent loss of access, the utilization values were 0, 1 or 2% on each core, randomly distributed, right up to the moment when we stopped getting SNMP data.
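
For anyone who wants to run the same check on their own polling data, here is a minimal sketch. It assumes the per-core samples are already available as (timestamp, list of per-core percentages); the function name, the 90% threshold, the five-sample window and the sample values are all hypothetical, chosen only to illustrate the idea.

```python
from typing import List, Tuple

# One polled sample: (unix timestamp in seconds, per-core CPU load in percent).
Sample = Tuple[int, List[int]]


def saturated_cores_before_gap(
    samples: List[Sample], window: int = 5, threshold: int = 90
) -> List[int]:
    """Return the indices of cores that reached `threshold` percent
    in the last `window` samples received before polling stopped."""
    hot = set()
    for _timestamp, cores in samples[-window:]:
        for index, load in enumerate(cores):
            if load >= threshold:
                hot.add(index)
    return sorted(hot)


# Illustrative values only (not real measurements): every core idles at 0-2%.
samples = [
    (0, [1, 0, 2, 1]),
    (120, [0, 2, 1, 0]),
    (240, [2, 1, 0, 1]),
]

print(saturated_cores_before_gap(samples))  # prints []
```

An empty result means no core was pegged in the final samples, which is the pattern described above and the reason a maxed-out core looks unlikely here.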

Re: RB4011 carrying traffic but access is lost

Posted: Mon Sep 21, 2020 11:52 pm
by Brough
I finally found the other thread. It's here: viewtopic.php?f=2&t=149062
And yes, our problem is similar to at least one thing reported there.

I'll go through that thread in more detail...
Thanks again!

Re: RB4011 carrying traffic but access is lost

Posted: Mon Sep 21, 2020 11:56 pm
by xvo
Yes, that does sound different.
Still, it's worth digging through the forum.