Greetings. Recently I've come across an issue, and I've now seen it twice. Both instances have been on an RB2011, one running 6.29.1 and one running 6.30.2. The second instance, on 6.30.2, has continued even after a router swap.
Here's the config. Ether1 and ether2 are OSPF-routed backhauls with a unique /29 on each, with MPLS running on top.
The remaining ports are all in a single bridge with split horizon bridging set on all ports (i.e. ports in the bridge cannot communicate with each other, but can route in/out). The bridge has both public IPs for customers to use and 10.x.x.x/24 IPs for us to manage connected equipment. Specifically, on ether10, I have a Digital Loggers power controller.
Randomly, customers will begin to report loss of connectivity or slow internet.
When I log into the router, everything looks normal. Looking at customer equipment, I see high latency and poor throughput.
Then I'll notice that the backhaul is moving, let's say, 45Mb/s down, but all 8 remaining Ethernet ports show a 10Mb/s+ tx rate. This includes the Digital Loggers power controller, which obviously isn't requesting 10Mb/s of anything, ever.
Right-clicking any Ethernet interface and torching it instantly fixes the problem.
Further investigation shows that the bridge begins to operate as a hub for a single MAC address, and it's not always the MAC address of the same customer. Whatever that user is moving gets shot out of all interfaces that are members of the bridge.
Inspecting the ARP table (IP>ARP) shows the MAC and IP of the device as expected (otherwise return traffic would fail with destination host unreachable). However, the MAC address does NOT appear in Bridge>Hosts. Assuming a lost ARP, I waited over 10 minutes to see if it would re-ARP and begin to function like a switch again. It does not.
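For anyone who wants to check for the same mismatch, here is a minimal Python sketch of the comparison I'm doing by eye: list every ARP entry whose MAC is missing from the bridge host table. The function name and the sample addresses are hypothetical, and the sketch assumes the two tables have already been copied out of the router by hand; it does not talk to RouterOS.

```python
def missing_from_hosts(arp_entries, bridge_hosts):
    """Return (ip, mac) ARP entries whose MAC is absent from the bridge host table.

    arp_entries:  iterable of (ip, mac) pairs, as seen in IP>ARP
    bridge_hosts: iterable of MAC strings, as seen in Bridge>Hosts
    """
    # Compare case-insensitively, since MACs may be copied in mixed case.
    known = {mac.upper() for mac in bridge_hosts}
    return [(ip, mac) for ip, mac in arp_entries if mac.upper() not in known]


# Hypothetical sample data, not taken from the affected router.
arp_entries = [
    ("10.0.0.10", "00:11:22:33:44:55"),
    ("10.0.0.11", "66:77:88:99:AA:BB"),
]
bridge_hosts = ["66:77:88:99:aa:bb"]

suspects = missing_from_hosts(arp_entries, bridge_hosts)
print(suspects)
```

Any entry this returns is a candidate for the MAC being treated like a hub destination.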
HOWEVER, torching any interface that is a member of the bridge (I haven't tried non-members) instantly installs an entry for the MAC in the Bridge>Hosts table, and the hub-like behavior disappears.
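To illustrate what I think I'm seeing, here is a toy Python model of a generic learning bridge: a frame to a MAC that is not in the host table is flooded out of every port, and once the MAC is learned it goes out of one port only. This is a sketch of textbook bridge behavior under my own assumptions, not RouterOS's actual implementation; the class, port names, and MAC address are made up for the example.

```python
class LearningBridge:
    """Toy learning bridge; a simplified model, not RouterOS internals."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.hosts = {}  # MAC -> port, analogous to a bridge host table

    def learn(self, src_mac, in_port):
        """Record which port a source MAC was last seen on."""
        self.hosts[src_mac] = in_port

    def egress_ports(self, dst_mac):
        """Ports used for a frame that the router itself sends into the bridge."""
        port = self.hosts.get(dst_mac)
        if port is None:
            # Unknown unicast destination: flood to every member port.
            return list(self.ports)
        return [port]


# Eight bridged ports, ether3 through ether10 (ether1/ether2 are backhauls).
bridge = LearningBridge([f"ether{n}" for n in range(3, 11)])
customer_mac = "00:11:22:33:44:55"  # hypothetical customer MAC

flooded = bridge.egress_ports(customer_mac)   # no host entry yet
bridge.learn(customer_mac, "ether5")          # entry gets installed
switched = bridge.egress_ports(customer_mac)  # now a single port

print(len(flooded), switched)
```

In this model the first lookup returns all eight ports and the second returns only ether5, which matches the before and after I see around torching an interface.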
Obviously, this causes problems for other devices in the bridge, since they can't always handle 10Mb/s being simulcast to each attached client, which basically allows the problem to amplify itself.
Has anyone seen this? Any ideas?