I'm not sure whether my expectations are incorrect or whether this is a bug that was introduced at some point over the years, so I'd like some help understanding the behavior.
Basically, my bonding setup works as expected when all slave interfaces (EOIPs) on each side have the same MAC address, and doesn't work as well when those EOIPs have unique MACs.
It does still somewhat work when all of them have unique MACs, or when only some of them share a MAC, but in those cases there's noticeable packet loss.
When all slaves on each side have the same MAC, there's literally no packet loss, and traffic is distributed as expected (XOR L3/L4).
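For context, here is my understanding of how an XOR L3/L4 policy picks a slave, sketched in Python. It follows the layer3+4 formula described in the Linux kernel bonding documentation; I'm assuming RouterOS behaves similarly, which I haven't verified. The function name `pick_slave` and the sample addresses are just made up for illustration.

```python
import ipaddress


def pick_slave(src_ip, dst_ip, src_port, dst_port, slave_count):
    """Return the index of the slave a flow would be hashed to.

    Assumption: this mirrors the documented Linux layer3+4 formula,
    not necessarily what RouterOS does internally.
    """
    # Start from the two 16-bit ports packed into one 32-bit value.
    h = (src_port << 16) | dst_port
    # Mix in the source and destination IPv4 addresses.
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    # Fold the high bits down so they influence the result.
    h ^= h >> 16
    h ^= h >> 8
    # Map the hash onto the available slaves.
    return h % slave_count


if __name__ == "__main__":
    # Hypothetical flows between two hosts, differing only in source port.
    for port in range(50000, 50006):
        print(port, pick_slave("10.0.0.1", "10.0.0.2", port, 443, 6))
```

If this is roughly what happens, the slave choice depends only on IPs and ports, so the slave MAC addresses shouldn't affect how traffic is distributed, which makes the packet loss with unique MACs even more puzzling to me.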
This setup has been working since 2015, and I'm sure that the EOIP slaves had unique MACs when I first set it up.
Over the years, though, I guess I used Winbox to copy those EOIPs, and *some* of them ended up with the same MAC because I missed the fact that the MAC gets copied too.
Last week, when I added another WAN link, I noticed that some of my EOIPs had the same MAC. I then tested making them all unique versus all the same, and that's how I discovered this behavior.
More details of my setup can be found in this obsolete blog, but here are the highlights:
- 6 LTE modems hooked up to my home 4011 device, isolated eth interfaces, subnets, routes.
- 6 public IPs on the datacenter CHR virtual machine, l2tp server is there.
- home 4011 establishes 6 l2tp connections to the CHR. I use static routes to ensure that l2tp connections to those 6 public IPs are established over their assigned WAN modem.
- Once I've got 6 l2tp connections, I make 6 EOIP interfaces on 4011 and 6 EOIPs on CHR, targeting their pairs using local/remote IPs that are assigned to l2tp interfaces
- 1 bond interface on each side, slaving all 6 EOIPs and automatically taking its MAC address from whichever slave it chooses
My packet loss is 0 when all slave EOIPs on the same side have the same MAC address (with each side using a different MAC address).
Meaning all 6 EOIPs on the home side have the same MAC, and a different MAC is shared by all 6 EOIPs on the datacenter side.
Note that there are no VLANs, bridges or switches in the mix.
I'm hoping you could help me understand:
- why on earth does bonding even work when all slaves have the same MAC? Wouldn't it totally kill ARP, which is the only way to monitor these slaves?
- why is there packet loss when the slaves all have different MACs?
- shouldn't bonding run just fine when all EOIPs have unique MACs? My expectation is that unique slave MACs are actually necessary, but that expectation might be incorrect.