Hello everyone,
I inherited a geographic backbone made up of about 70 wireless devices (all MikroTik, various models), and I'm having serious stability problems with it.
Communications are often interrupted, sometimes for about ten seconds, and the problem gets worse as the diameter of the network, i.e. the number of wireless devices, increases. Sometimes WinBox disconnects after only a few minutes of activity, yet I can log back in immediately afterwards. At night, when there is typically less traffic, the problem is much less evident.
The backbone is divided into 6 segments, each physically connected to the wired network through a MikroTik RB260GS switch. Through an access port, each switch encapsulates the traffic of its segment in a VLAN, which is then carried over the wired network (trunk). The wireless devices are therefore not aware of any VLAN.
The backbone is fully bridged: most of the wireless links are configured as ap-bridge <-> station-bridge (ROS L4), the rest as bridge <-> station-bridge (ROS L3). There are both P-t-P and P-t-MP links and they are all quite stable (almost all with link down = 0). Each wireless device only ever uses two interfaces, usually wlan1 and eth1, both bridged.
The set of devices that make up the backbone is fixed and changes only through manual intervention. RSTP is enabled on all of them with the default settings (!). ROS 6.47.9 Long Term is installed on all wireless devices, SwOS 2.13 on the switches.
I'm monitoring the network with The Dude and, apart from the outages described above, all devices are reachable. However, I noticed that during the day the root bridge changes frequently, and I thought this might depend on the stability of some wireless link. Is that plausible?
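To put numbers on how often the root bridge changes, I could sample the root bridge ID at regular intervals (for example the `root-bridge-id` value shown by `/interface bridge monitor`) and list the transitions. Below is a minimal Python sketch of that idea. The function name `find_root_changes` and the sample values are made up for illustration and are not real data from my network; how the samples get collected is left out.

```python
from typing import List, Tuple


def find_root_changes(samples: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, old_root, new_root) for every change of root bridge.

    `samples` is a chronologically ordered list of (timestamp, root_bridge_id).
    """
    changes = []
    previous = None
    for timestamp, root_id in samples:
        # A change is any sample whose root ID differs from the previous sample.
        if previous is not None and root_id != previous:
            changes.append((timestamp, previous, root_id))
        previous = root_id
    return changes


# Hypothetical samples, for illustration only (not real measurements).
samples = [
    ("10:00", "0x8000.AA:AA:AA:AA:AA:01"),
    ("10:05", "0x8000.AA:AA:AA:AA:AA:01"),
    ("10:10", "0x8000.BB:BB:BB:BB:BB:02"),
    ("10:15", "0x8000.AA:AA:AA:AA:AA:01"),
]

for timestamp, old_root, new_root in find_root_changes(samples):
    print(f"{timestamp}: root bridge changed from {old_root} to {new_root}")
```

Comparing the timestamps of these changes with the outages seen in The Dude should show whether the two actually coincide.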
The best long-term solution is probably to switch to a routed network, but for now I need a quick fix.
What could be the cause?
Does the election of a different root bridge cause a loss of communication?
Since there are currently no redundant links and none are planned, could disabling RSTP improve the situation?
If, on the other hand, there actually is a loop, how can I identify it without disconnecting parts of the network one at a time?
Please give me some advice.
Thank you!