CCR2004-16G-2S+PC ports "flapping" on v7.15.3

Posted: Thu Nov 14, 2024 7:52 pm
by AlekEagle
Howdy all,
Recently, one of our routers, deployed about 6 months ago at one of our towers, began exhibiting some very weird behavior: all of its ports would disconnect from and reconnect to downstream hardware in a repeating pattern. They would stay connected for about 10 seconds, disconnect for about 5 seconds, then reconnect, starting the loop over again. As far as I am aware, it only affected the 16 Ethernet ports; we are uncertain whether it affected the SFP+ ports, as we didn't have anything connected to them. The router had been working fine until about a week ago, when it first started behaving like this.

We know it is the router because we have already eliminated all other possibilities (checked cables, PoE injectors, our Cambium radios, etc.), and it would not quit until a hard power cycle (a safe shutdown was not possible because we were unable to establish a Winbox session with it, and we have no serial console cable). Once it rebooted, the issue went away for about a week, then it happened again.

We have since pulled it from the field and replaced it with a full-sized CCR2004-16G-2S+ with active cooling, and are stress testing the router we removed from service by running the iperf utility through multiple devices to simulate active traffic. We are fairly certain that it was not an environmental issue despite our cold weather (we operate in interior Alaska): the boxes that house the routers are dry and sealed from the weather, and we have never had any temperature or humidity issues with any of our other routers or previous models.

We suspect it is a hardware issue, but we are also open to any suggestions or insights on what it could possibly be.

Re: CCR2004-16G-2S+PC ports "flapping" on v7.15.3

Posted: Thu Nov 14, 2024 8:15 pm
by holvoetn
Do the log files show anything useful when it happens ?
Mac conflict somewhere?

Re: CCR2004-16G-2S+PC ports "flapping" on v7.15.3

Posted: Thu Nov 14, 2024 8:50 pm
by AlekEagle
holvoetn wrote:
Do the log files show anything useful when it happens ?
Mac conflict somewhere?
I'm not sure; our logs are only in volatile memory, and the only way to remedy the problem was to physically disconnect it from power and reboot it that way. The one time I was able to see logs before it became impossible to connect via Winbox, the only entries were of the port links going up and down. Our devices connected to it never reported any MAC conflicts, though.
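
For anyone wanting to confirm a pattern like this from whatever link up/down entries survive, here is a minimal Python sketch that flags ports with repeated link-down events inside a short window. The `LinkEvent` record, the `flapping_ports` name, and the window and threshold values are all assumptions made for illustration; this is not the RouterOS log format, and the events would have to be extracted from the real logs first.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class LinkEvent:
    """Hypothetical record for one link state change (not a RouterOS format)."""
    timestamp: float  # seconds since some reference point
    port: str         # e.g. "ether1"
    up: bool          # True = link up, False = link down


def flapping_ports(events, window_s=60.0, threshold=3):
    """Return ports with at least `threshold` link-down events
    inside any `window_s`-second window."""
    recent_downs = defaultdict(deque)  # port -> timestamps of recent downs
    flagged = set()
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.up:
            continue  # only link-down events are counted
        downs = recent_downs[ev.port]
        downs.append(ev.timestamp)
        # Drop link-down events that fell out of the sliding window.
        while ev.timestamp - downs[0] > window_s:
            downs.popleft()
        if len(downs) >= threshold:
            flagged.add(ev.port)
    return flagged
```

With roughly 10 seconds up and 5 seconds down, a port logs a link-down event about every 15 seconds, so a 60-second window with a threshold of 3 would catch it quickly while ignoring a port that drops only once.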

Re: CCR2004-16G-2S+PC ports "flapping" on v7.15.3

Posted: Tue Nov 19, 2024 7:18 pm
by AlekEagle
As of this morning, two of our CCR2004-16G-2S+ actively cooled routers have essentially locked up, leaving large swaths of customers without internet. This appears to coincide with our temperatures reaching extremes of -20°F (-6.6°C) and below. According to technical specifications, these routers have been tested in -20°C conditions and have operated fine. Our enclosures are sealed from the outside environment, meaning moisture should not be a problem.

We have been stress testing our CCR2004-16G-2S+PC, and thus far we have been unable to recreate the symptoms with the router indoors. This is leading us to reconsider whether temperature is a contributing factor.

Has anyone else experienced issues like this in environments below freezing and below zero?

Re: CCR2004-16G-2S+PC ports "flapping" on v7.15.3

Posted: Mon Feb 03, 2025 4:00 pm
by jaclaz
Maybe it is just a coincidence, and unrelated to temperature.

Is the -6.6°C the temperature of the outside environment or of the inside of your enclosures?
Which kind of enclosures?
If it is sealed, how is the heat dissipated?
Which other devices are inside the same enclosure?
The CCR2004 is a 35W device, so it is hard to believe that it can go below zero in a contained environment.

Re: CCR2004-16G-2S+PC ports "flapping" on v7.15.3

Posted: Mon Feb 03, 2025 11:20 pm
by jbl42
While it might be a temperature issue, one additional side note:
AlekEagle wrote:
I'm not sure, our logs are only in volatile memory .. Routers leaving large swaths of customers without internet.
It sounds strange to run routers whose failure leaves "large swaths of customers without internet" without at least sending logs to a central syslog server and monitoring them via SNMP. I wonder how you investigate errors if every device only collects local logs in volatile memory, where they are lost on a reboot or power failure. If ports go down, do you not notice until customers complain? And if a cooling fan breaks in a hot summer, how do you notice before the whole device goes down from overheating?
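
To make the central syslog idea concrete, here is a minimal Python sketch of a UDP collector that appends every received line to a file, so entries survive a router reboot. It is an illustration under stated assumptions, not a production setup: the port 5514, the file name `router-syslog.log`, and the function and class names are all made up for the example, and the router-side remote logging configuration is not shown.

```python
import socketserver

LOG_PATH = "router-syslog.log"  # hypothetical output file
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]


def parse_priority(message):
    """Split '<PRI>text' into (facility, severity_name, text).

    PRI = facility * 8 + severity, as in BSD syslog (RFC 3164).
    Lines without a valid <PRI> prefix return (None, None, message).
    """
    if message.startswith("<"):
        end = message.find(">", 1, 5)  # PRI is at most three digits
        pri_text = message[1:end] if end != -1 else ""
        if pri_text.isascii() and pri_text.isdigit():
            pri = int(pri_text)
            return pri // 8, SEVERITIES[pri % 8], message[end + 1:]
    return None, None, message


class SyslogHandler(socketserver.BaseRequestHandler):
    """Append each received datagram to LOG_PATH with the sender's address."""

    def handle(self):
        data, _sock = self.request  # UDP servers pass (bytes, socket)
        text = data.decode("utf-8", errors="replace").strip()
        _facility, severity, body = parse_priority(text)
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(f"{self.client_address[0]} {severity or '-'} {body}\n")


def serve(host="0.0.0.0", port=5514):
    """Listen for syslog datagrams until interrupted (port is an assumption)."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on an always-on machine and pointing each router's remote logging at that host and port would keep the link up/down history even after a hard power cycle. The standard syslog port is 514, which usually needs elevated privileges to bind, hence the unprivileged port in the sketch.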