I'm looking after a network that went all-in on MikroTik (why wouldn't you).
The core switches are CRS518-16XS-2XQs and my client access switches are CRS354-48P-4S+2Q+. Here's a diagram of how everything is linked together:
I have a Dude server that I use to collect syslog messages. I collect info, warning, error and stp logs. Every day at the same time, 4:18 in the afternoon, the 802.3ad connection between CORE-SW02 and CORE-SW04 seems to start learning and forwarding. Then CORE-SW01 starts discarding on the qsfp28 connection to CORE-SW02.
Basically, I get a whole host of TCHANGE start and TCHANGE over events until 4:25 in the afternoon when it settles down.
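To confirm that the flapping really is confined to the same window each day, the collected syslog could be bucketed by minute. Below is a rough Python sketch of that idea. It assumes each log line contains an HH:MM:SS timestamp somewhere and the literal text TCHANGE; the sample lines are made up for illustration and are not the real RouterOS or Dude export format, so the pattern will probably need adjusting.

```python
import re
from collections import Counter

# Assumption: every line carries a 24-hour HH:MM:SS time somewhere in it.
# The first such match is used, so a MAC address earlier in the line
# could be mistaken for a time; adjust the pattern to the real export.
TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):\d{2}\b")


def tchange_per_minute(lines):
    """Count lines that mention TCHANGE, grouped by HH:MM."""
    counts = Counter()
    for line in lines:
        if "TCHANGE" not in line:
            continue  # ignore everything that is not a topology change
        match = TIME_RE.search(line)
        if match:
            counts[f"{match.group(1)}:{match.group(2)}"] += 1
    return counts


# Hypothetical sample lines, NOT the actual log format.
SAMPLE = [
    "16:18:02 CORE-SW02 TCHANGE start",
    "16:18:05 CORE-SW02 TCHANGE over",
    "16:19:40 CORE-SW01 TCHANGE start",
    "16:30:00 CORE-SW01 some unrelated message",
]

if __name__ == "__main__":
    for minute, count in sorted(tchange_per_minute(SAMPLE).items()):
        print(minute, count)
```

Running this over several days of logs should show whether the events always begin at 16:18 and stop around 16:25, or whether the window drifts, which would point towards or away from something scheduled.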
I'm not much of a layer 2 guy, but I thought it might be related to my RSTP setup. As a result, I finally added priorities to the core switches, as I've added to my diagram. This doesn't seem to have had any impact, although the network does seem to behave better in other ways now.
I've attached my syslog so you can see what I mean.
I'm looking for ideas on what's happening, and on which additional logs I should be collecting to narrow it down. I'd also appreciate some guidance on the current topology. I'm not a big fan of it because it's a square and I'd prefer a triangle. However, these are two sites separated by quite some distance. There are 6 fibre cables running between the sites, so we set up 3 fibres on each core switch and bonded them with 802.3ad.
All devices are running RouterOS 7.15.3.