EternalNet
just joined
Topic Author
Posts: 20
Joined: Sun Jul 02, 2023 2:27 pm
Location: Poland

MLAG/MSTP take down entire network, Redundant Links without Loops

Fri Mar 14, 2025 10:43 pm

Hello,

The setup I am trying to deploy relies on 2x CRS354-48G-4S+2Q switches and a CCR1009 router, with a few servers at the end.
The router is connected via SFP to each switch, the two switches are connected to each other via their qsfpplus1-1 ports, and the server is connected via sfp1 to switch#1 and via sfp2 to switch#2, plus two extra Ethernet connections per switch (4 total).

I read the MikroTik guide and every other guide I could find about MLAG, always with the same end result: as soon as I enable the peer link and get a connection between the switches, one of them gets kicked out (traffic is getting blocked?). From test to test, sometimes the entire network goes down (both switches).

I'm using VLANs in the current network (VLANs on the switch/bridge), but according to the documentation that is not a problem, since I enabled MSTP (the MikroTik documentation says MSTP is OK with MLAG; one single presentation from a few years ago says MSTP is not compatible with MLAG).

The configuration I did went roughly along these steps:
1. On the server side I created a bond and added both SFP ports and the 4 Ethernet ports as LACP, with l2+3.
2. On both switches I removed the ports that will be used for the bond from the bridge.
3. On both switches I created a single bond with SFP + 2 Ethernet, 802.3ad, l2+3.
4. On both switches I added the created bond to the bridge.
5. VLANs were already enabled on the bridge.
6. Up to this point I can ping the server from both switches (RouterOS).
7. Before I even enable MLAG, I can disable the link to the server on one switch and see that this switch pings the server through the router, but not always (I have to disable/enable the bonds on the switches).
8. On both switches I add the qsfpplus port to the bridge with pvid 99 (as in most documentation, and there is no collision).
9. **and this is where "funny things happen"**
10. As soon as I set qsfpplus1-1 on both switches as the bridge MLAG peer port and see "connected", one switch gets kicked out (if I'm unlucky both get kicked out, killing the network).
11. As soon as I manage to get into the CLI of either switch and disable the peer port, the network is up and running again.

I tried adding qsfpplus1-1 to all VLANs as tagged, no difference. It's been a few days now, and I'm stuck here trying to deploy redundancy. Both switches pass VLANs without issue, the way it was intended, and LACP also works without issue with the same server connected to just one switch.

I'm out of ideas

I tried recreating this in GNS3, but it looks like the CHR version of RouterOS doesn't support MLAG at all, which I noticed only after setting everything up.

On the forum I found a few success stories but also many failed ones, and now I don't know if the issue is my setup, missing hardware, missing configuration, or just bugs in the OS.
 
sirbryan
Member
Posts: 450
Joined: Fri May 29, 2020 6:40 pm
Location: Utah

Fri Mar 14, 2025 11:02 pm

EternalNet wrote: "I tried adding qsfpplus1-1 to all VLANs as tagged, no difference. It's been a few days now, and I'm stuck here trying to deploy redundancy. Both switches pass VLANs without issue, the way it was intended, and LACP also works without issue with the same server connected to just one switch."
The peer link between the switches has to be specified in the bridge's MLAG settings. Also, the native VLAN (PVID) of those ports has to be something other than 1, and then 1 has to be tagged across it. Also, the STP priority and all other STP settings need to be identical.

I have multiple MLAG stacks now made out of CRS309's, CRS310's, CRS317's, a CRS312 + CRS354, and CRS326's (both copper and SFP+ models). Admittedly, they are all running STP, not MSTP, so the STP config is a little bit simpler.
 
EternalNet

Fri Mar 14, 2025 11:32 pm

sirbryan wrote: "The peer link between the switches has to be specified in the bridge's MLAG settings. Also, the native VLAN (PVID) of those ports has to be something other than 1, and then 1 has to be tagged across it. Also, the STP priority and all other STP settings need to be identical."
Yes, MLAG is specified on the bridge on each switch, with the mlag-id being the same. The peer port is added with PVID=99, and I tried adding the VLANs as tagged on that link. I can try switching to STP; MSTP (which is on all 3 devices) was only there because it was recommended for my setup.
sirbryan wrote: "I have multiple MLAG stacks now made out of CRS309's, CRS310's, CRS317's, a CRS312 + CRS354, and CRS326's (both copper and SFP+ models). Admittedly, they are all running STP, not MSTP, so the STP config is a little bit simpler."
Do I also need to add a bonded interface on the router side? Or only on both switches (1 SFP + 2 Ethernet) and the device (2 SFP + 4 Ethernet)? I also saw an example where the router had a bond link to both switches with a different mlag-id than the server.
 
sirbryan

Fri Mar 14, 2025 11:48 pm

EternalNet wrote: "Yes, MLAG is specified on the bridge on each switch, with the mlag-id being the same. The peer port is added with PVID=99, and I tried adding the VLANs as tagged on that link. I can try switching to STP; MSTP (which is on all 3 devices) was only there because it was recommended for my setup."

There is no MLAG-ID on the bridge's configuration, but you do need to set it up when creating the bond on each switch. They can be any of the ports, as long as the MLAG-ID's match for all ports that are part of the bond.

This is my config between two CRS326's with 24 SFP+ and two QSFP ports. Ports 1 and 2 of each switch are configured as a bond (bond-01 and bond-02) to be used for two lab routers. In the real setup, QSFP1 on the top switch is plugged into QSFP2 on the bottom switch. I've only posted the config for one switch. Aside from which QSFP ports are being used, everything else is identical. (Adapt to your switch(es) accordingly.)
/interface bridge
add name=bridge priority=0x9000 vlan-filtering=yes

/interface ethernet
set [ find default-name=qsfpplus1-1 ] name=qsfpplus1-mlag-peer

/interface bonding
add lacp-rate=1sec mlag-id=101 mode=802.3ad name=bond-01 slaves=sfp-sfpplus1 transmit-hash-policy=layer-3-and-4
add lacp-rate=1sec mlag-id=102 mode=802.3ad name=bond-02 slaves=sfp-sfpplus2 transmit-hash-policy=layer-3-and-4

/interface bridge mlag
set bridge=bridge peer-port=qsfpplus1-mlag-peer

/interface bridge port
add bridge=bridge interface=qsfpplus1-mlag-peer pvid=2
add bridge=bridge interface=bond-01
add bridge=bridge interface=bond-02

/interface bridge vlan
add bridge=bridge tagged=qsfpplus1-mlag-peer untagged=bridge vlan-ids=1
add bridge=bridge comment="MLAG Peer" untagged=qsfpplus1-mlag-peer vlan-ids=2
EternalNet wrote: "Do I also need to add a bonded interface on the router side? Or only on both switches (1 SFP + 2 Ethernet) and the device (2 SFP + 4 Ethernet)? I also saw an example where the router had a bond link to both switches with a different mlag-id than the server."
One of the benefits (and/or purposes) of bonding (LAG/LACP/802.3ad) is to provide redundancy. So yes, if you want the router(s) or server(s) connecting to the switches to have that redundancy (in case one of the switches fails), you would set up a normal LACP or bond link on the router/server, then plug one cable from each switch into the router/server's two ports.
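A minimal sketch of the router side described above, assuming a RouterOS router with two ports named ether2 and ether3 (one cable to each switch) and an existing bridge named bridge; the bond name bond-switches is hypothetical. It is a normal LACP bond with no MLAG-specific settings:

```routeros
# Router side: plain 802.3ad (LACP) bond, one member port per switch.
# ether2, ether3, bond-switches and bridge are assumed names; adapt them.
/interface bonding
add name=bond-switches mode=802.3ad slaves=ether2,ether3 transmit-hash-policy=layer-2-and-3

# Only needed if the router bridges its VLANs, as in the original poster's setup.
/interface bridge port
add bridge=bridge interface=bond-switches
```

On the switches, the two ports facing the router would then go into bonds that share one mlag-id, as in the switch config above.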
 
EternalNet

Sat Mar 15, 2025 1:21 pm

sirbryan wrote:
I've only posted the config for one switch. Aside from which QSFP ports are being used, everything else is identical. (Adapt to your switch(es) accordingly.)
/interface bridge
add name=bridge priority=0x9000 vlan-filtering=yes

/interface ethernet
set [ find default-name=qsfpplus1-1 ] name=qsfpplus1-mlag-peer

/interface bonding
add lacp-rate=1sec mlag-id=101 mode=802.3ad name=bond-01 slaves=sfp-sfpplus1 transmit-hash-policy=layer-3-and-4
add lacp-rate=1sec mlag-id=102 mode=802.3ad name=bond-02 slaves=sfp-sfpplus2 transmit-hash-policy=layer-3-and-4

/interface bridge mlag
set bridge=bridge peer-port=qsfpplus1-mlag-peer

/interface bridge port
add bridge=bridge interface=qsfpplus1-mlag-peer pvid=2
add bridge=bridge interface=bond-01
add bridge=bridge interface=bond-02

/interface bridge vlan
add bridge=bridge tagged=qsfpplus1-mlag-peer untagged=bridge vlan-ids=1
add bridge=bridge comment="MLAG Peer" untagged=qsfpplus1-mlag-peer vlan-ids=2

And this is starting to drive me mad. This is the same configuration I'm using, plus I have VLANs, so I also put the peer link and the bonds into the VLANs as tagged.
# router
I use a bridge with VLANs on the bridge, so there is no need to add a bridge and port here.
I removed ether2 and sfp1 from the bridge (those are the links to the switches).
/interface/bonding/
add lacp-rate=1sec mlag-id=101 mode=802.3ad name=bond-01-switches slaves=ether2,sfp-sfpplus1 transmit-hash-policy=layer-3-and-4
/interface/bridge/port/add bridge=bridge interface=bond-01-switches
/interface/bridge/vlan/
add vlan-ids=401 tagged=bond-01-switches bridge=bridge
add vlan-ids=905 tagged=bond-01-switches bridge=bridge
... (those VLANs already exist, so it is really an edit command, and there are a few more)

# switch 1 and 2
/interface/bridge/add name=bridge priority=0x9000 vlan-filtering=yes protocol-mode=stp
/interface/ethernet/set [ find default-name=qsfpplus1-1 ] name=qsfpplus1-1-mlag-peer
/interface/bridge/mlag/set bridge=bridge peer-port=qsfpplus1-1-mlag-peer
/interface/bonding/
add lacp-rate=1sec mlag-id=101 mode=802.3ad name=bond-01-router slaves=ether1 transmit-hash-policy=layer-3-and-4
add lacp-rate=1sec mlag-id=102 mode=802.3ad name=bond-02-server slaves=ether1 transmit-hash-policy=layer-3-and-4
/interface/bridge/port/
add bridge=bridge interface=bond-01-router
add bridge=bridge interface=bond-02-server
add bridge=bridge interface=qsfpplus1-1-mlag-peer pvid=2
/interface/bridge/vlan
add bridge=bridge tagged=qsfpplus1-1-mlag-peer untagged=bridge vlan-ids=1
add bridge=bridge comment="MLAG Peer" untagged=qsfpplus1-mlag-peer vlan-ids=2

# server
This is TrueNAS, so I created a bond with LACP over all ports, and 2 VLANs on top of that bond with static IPs.

The MLAG peer port status is connected; one switch is primary, the other secondary.
With this configuration the router stops seeing the switches in its neighbor list (discovery is bound to "all").
I'm unable to ping the switches from the router on any VLAN (every VLAN on the switches has an IP address attached to it, as this helps with troubleshooting).
I'm unable to ping the server from the router.
I'm unable to ping between the switches.
Just "timeout" and "host unreachable".
I'm able to ping each switch itself on its own links.
I'm able to connect to each switch via a second bridge with a single management port as my backdoor.

# BUT #
If I do everything above and also add qsfpplus1-mlag-peer as a tagged port for every VLAN I have, leaving it untagged for VLAN 2 (the peer VLAN), I'm back where I was at the beginning:
I can ping switch 1 from the router and log in to it via its VLAN IP, but I cannot do the same with the second switch.
To make this more confusing: if I disable one or both bonds to the server on the switches, I can ping switch 2 from switch 1, but I cannot ping switch 2 from the router or from my PC, which is connected via the router (and vice versa).
I don't see any loop warnings in the logs of any of those 3 devices.

# another BUT #
Currently switch 2 is secondary in MLAG. If I lower the priority of switch 2 in the MLAG peer link setup, switch 1 becomes unavailable and behaves the same as above, just with sw1 and sw2 swapped.

# Edit2:
I wanted to exclude as many unknowns as possible, so I disabled both bonds to the server on the switches, without luck. It looks like bonds + MLAG is the issue, but for no logical reason that I can see. No matter whether it is the bond from the router or from the server, as soon as both paths are enabled and MLAG is established, the secondary switch is "cut" out of the picture.
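
Symptoms like these can be narrowed down with a few read-only status commands run on each switch. A sketch, assuming RouterOS 7 and the bond name bond-02-server from the config above:

```routeros
# MLAG peer status and the active role (primary/secondary) of this switch.
/interface bridge mlag monitor

# Bridge VLAN table: compare tagged/untagged members on both switches.
/interface bridge vlan print

# Per-member LACP state of one bond.
/interface bonding monitor-slaves bond-02-server
```

Comparing the output of both switches side by side makes it easier to spot a VLAN missing on the peer port or a bond member that never joined the aggregate.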
 
sirbryan

Sat Mar 15, 2025 5:10 pm

Yes, all VLANs have to be tagged on the peer port on both switches, and then tagged or untagged on the bonds, depending on your choices for each bond.

For example, 401 and 905 should be tagged on the router's bond on the switches and on their MLAG peer port.
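
A sketch of what that could look like on each switch, using the interface names from EternalNet's config (bond-01-router, qsfpplus1-1-mlag-peer). Listing bridge as a tagged member is an assumption that applies only when the switch itself has an IP address on that VLAN:

```routeros
# Run on both switches. If the VLAN entries already exist, use "set" instead of "add".
# Add the server bond to tagged= for any VLAN the server uses.
/interface bridge vlan
add bridge=bridge vlan-ids=401 tagged=bridge,bond-01-router,qsfpplus1-1-mlag-peer
add bridge=bridge vlan-ids=905 tagged=bridge,bond-01-router,qsfpplus1-1-mlag-peer
```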
Last edited by sirbryan on Sat Mar 15, 2025 5:17 pm, edited 1 time in total.
 
sirbryan

Sat Mar 15, 2025 5:15 pm

Your config shows ether1 as being the slave to both bonds. Is that a typo or accident? It should be ether1 to the router and ether2 to the server (or however you configured those).
 
EternalNet

Sat Mar 15, 2025 5:16 pm

sirbryan wrote:
Yes, all VLANs have to be tagged on the peer port on both switches, and then tagged or untagged on the bonds, depending on your choices for each bond.

And yes, the appropriate MLAG ID per bond port so the switches know how to match the bonded ports.

The mlag-id for a bond should be set on the switches only, right? (Not on the router or the client.)

I stopped mixing interface types. Both members of the bond from the router are now Ethernet interfaces, and it is currently semi-working (or something is still not working), but in the router logs I see warnings about looped packets.
I can access the router and the switches, but I cannot access the server.

Edited:
If I set mlag-id on the bond to the switches (ether1,2), it looks like the loop is fixed.
 
sirbryan

Sat Mar 15, 2025 6:20 pm

EternalNet wrote: "Edited: If I set mlag-id on the bond to the switches (ether1,2), it looks like the loop is fixed."
Yes, this is required to match bond members to each other.
 
EternalNet

Sat Mar 15, 2025 6:57 pm

EternalNet wrote: "Edited: If I set mlag-id on the bond to the switches (ether1,2), it looks like the loop is fixed."
sirbryan wrote: "Yes, this is required to match bond members to each other."
So if I had to set mlag-id on the router, on the bond that is connected to both switches, should I do the same on the client/server? This is a bit strange, because the documentation says the bond should be a simple LACP bond and the client is unaware of the MLAG setup.
On the server I have an LACP bond made of eno1, eno2, eno3, eno4 (I dropped sfp1 and sfp2 so as not to mix interface types, because it looks like that was the issue, or I was), configured on the server without mlag-id, with l2+3 and the slow LACP rate. On top of that bond I created VLAN 905 with an assigned IP address.
On switch#1 I have bond-server, 802.3ad, with eth1, eth2 (links to eno1, eno2 on the server), l3+4, lacp_rate: 30s, mlag_id = 102.
On switch#2 I have bond-server, 802.3ad, with eth1, eth2 (links to eno3, eno4 on the server), l3+4, lacp_rate: 30s, mlag_id = 102.
Both switches have bond-server added as a tagged port on the main bridge with the same VLAN 905.

Currently only switch#2 can ping the server via the IP on its VLAN interface, but as soon as I disable the bond to the server on one switch, the other switch can ping the server (so it is semi-working).

Edit2:
In the output of /interface/bonding/monitor-slaves:
Both links of bond01-router on switch1 show the same partner system ID as bond01-router on switch2; the same goes for bond02-server on both switches.
The only difference between bond01 and bond02 is the flags. bond01 (working) has A-GSCD--.
On switch1, which can ping the server, bond02 has A-GSCD---- on eth1 (partner flags: A-GCD--) and A-G---F- on eth2 (this one has no cable).
On switch2, which cannot ping the server, bond02 has A-GS---- on eth1 (partner flags: A-G----) and A-G---F- on eth2 (this one has no cable).
 
sirbryan

Sat Mar 15, 2025 7:08 pm

EternalNet wrote: "So if I had to set mlag-id on the router, on the bond that is connected to both switches, should I do the same on the client/server? This is a bit strange, because the documentation says the bond should be a simple LACP bond and the client is unaware of the MLAG setup."

You set the MLAG-ID on each switch port that is a part of that specific bond group. The switches use that to determine where (or where not) to send the packets.
You don't set the MLAG on the router; it should be just a normal LACP bond.

EternalNet wrote:
On the server I have an LACP bond made of eno1, eno2, eno3, eno4 (I dropped sfp1 and sfp2 so as not to mix interface types, because it looks like that was the issue, or I was), configured on the server without mlag-id, with l2+3 and the slow LACP rate. On top of that bond I created VLAN 905 with an assigned IP address.

On switch#1 I have bond-server, 802.3ad, with eth1, eth2 (links to eno1, eno2 on the server), l3+4, lacp_rate: 30s, mlag_id = 102.
On switch#2 I have bond-server, 802.3ad, with eth1, eth2 (links to eno3, eno4 on the server), l3+4, lacp_rate: 30s, mlag_id = 102.
Both switches have bond-server added as a tagged port on the main bridge with the same VLAN 905.

Currently only switch#2 can ping the server via the IP on its VLAN interface, but as soon as I disable the bond to the server on one switch, the other switch can ping the server (so it is semi-working).

You need to tag 905 across the MLAG peer port (the cable going between the switches) too.
 
EternalNet

Sat Mar 15, 2025 7:48 pm

sirbryan wrote: "You set the MLAG-ID on each switch port that is a part of that specific bond group. The switches use that to determine where (or where not) to send the packets. You don't set the MLAG on the router; it should be just a normal LACP bond."
OK, then I don't need to look into that option on the server.
sirbryan wrote: "You need to tag 905 across the MLAG peer port (the cable going between the switches) too."
bridge=BRIDGE vlan-ids=905 
     tagged=BRIDGE,qsfpplus1-1-mlag-peer,bond01-router,bond02-server
     untagged="" mvrp-forbidden="" 
     current-tagged=BRIDGE,qsfpplus1-1-mlag-peer,bond01-router,bond02-server
     current-untagged=""
Same on both switches.
Currently I get an IP from VLAN 905 on my server, and the server can ping only one switch (I think the one that has the first working link of the LACP bond to the server, because as soon as I disable the bond that works, the other kicks in and takes all the traffic).
The server is able to ping the router, but not always, which is strange. It looks like a flapping interface, but the interface is fine.
 
EternalNet

Sat Mar 15, 2025 8:04 pm

The first clue was the bond mixing SFP and Ethernet towards the router: as soon as I changed those to match (medium? speed?), I was able to ping both switches.
I tried the same with the server but had no luck, until I checked the link status on both switches: one Ethernet link was at 1 Gbps, the other at 100 Mbps. The cable is fine and the server is fine, so I changed the port on the server and the switch; now it is 1 Gbps on both and we have a WINNNERRR...
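
An 802.3ad bond generally only aggregates links that run at the same speed and duplex, so a member that negotiated 100 Mbps next to a 1 Gbps member may never join the aggregate. A quick way to check the negotiated rate of each bond member, assuming the member ports are named ether1 and ether2 as in the config earlier in the thread:

```routeros
# Prints link status, negotiated rate and duplex for one port, then exits.
/interface ethernet monitor ether1 once
/interface ethernet monitor ether2 once
```

Running this on both switches before building the bonds would reveal a mismatched link without having to work it out from the LACP flags.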

Looks like I was the human error from the beginning...

Thank you very much @sirbryan, your help was much needed. Without it I wouldn't have been sure that this should work, and your input with an already working config and knowledge was priceless.
 
EternalNet

Sun Mar 16, 2025 3:49 pm

sirbryan wrote: "You need to tag 905 across the MLAG peer port (the cable going between the switches) too."
I'm currently "using" this setup. But as I reconfigured the connections I noticed one thing: is an MLAG setup very fragile? I had the switches connected via combo and ether2 on the router; I changed those to ether2 and ether3 in the bond, so theoretically nothing else needed to be done, but then a switch got kicked out. Everything became stable after I rebooted both switches.

If that is the case, I will try not to change ports after setup; if it is not, I will pull back.
 
sirbryan

Re: MLAG/MSTP take down entire network, Redundant Links without Loops  [SOLVED]

Mon Mar 17, 2025 5:47 am

Some of the setup is a little fragile, but I'd have to see how you're doing it. Ideally you would turn down or disconnect all ports you're going to be working on, make the changes, then enable them. The bridge does have to figure out loops/STP and MTU settings etc. when you add and remove member ports.
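
A sketch of that sequence on the router, assuming the bond is named bond-01-switches as in EternalNet's earlier config and the new member ports are ether2 and ether3. It should be run over a management path that does not depend on the bond:

```routeros
# Take the bond down before touching its members.
/interface disable bond-01-switches

# Change the member ports while the bond is down.
/interface bonding set bond-01-switches slaves=ether2,ether3

# Bring the bond back up so LACP renegotiates with both switches.
/interface enable bond-01-switches
```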

I have a chain of probably five MLAG stacks all connected now, spread across 40+ miles, all passing a different set of VLANs on to the next.