Suthern

Pings routed through EoIP show "flapping" behavior

Tue Dec 19, 2023 9:50 pm

Overview:
Two locations, each with a Unifi router, a Hyper-V server, and RouterOS (CHR) running as a Hyper-V guest. Each location has its own internet service. The Unifi routers provide a solid VPN between the two sites, but cannot bridge layer-2 networks, so I'm investigating using a couple of CHRs for just that purpose.

Setup:
Both CHRs, as Hyper-V guests, have two Ethernet interfaces.
- ether1 exists on the regular network and has an IP address which is always reachable from the other side. We use ether1's IP as the 'public IP' in the EoIP setup.
- ether2 on both CHRs is the 'bridged / stretched' network.

In the Hyper-V settings for both CHR guests, ether2 has 'Enable Mac Cloning' enabled and is set to the VLAN that is stretched between locations. This VLAN tag gets stripped by Hyper-V before the packet is passed into the VM.

In both CHRs, I have set up an EoIP tunnel between the ether1 IPs.
In both CHRs, ether2 and the EoIP tunnel are in a bridge.

To test pinging directly between the CHRs over the EoIP link, I've added IP addresses within the stretched network IP range to the bridges.

Network Setup:
- Location 1: 10.212.1.X: PCs, CHR ether1, Unifi router normal ports
- Location 1: 10.40.1.X: CHR Bridge, Unifi router VLAN 401
- Location 2: 10.200.X.X: PCs, CHR ether1, Unifi router normal ports
- Location 2: 10.40.1.X: CHR Bridge, Unifi router VLAN 401

Summary:
- 10.40.1.X is stretched network on VLAN 401 (VLAN ID stripped before hitting CHRs)
- 10.212.1.X is normal network at location 1
- 10.200.X.X is normal network at location 2
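To make it easier to follow which network a given test address belongs to, here is a minimal Python sketch of the addressing plan above. The prefix lengths (/24 and /16) are assumptions inferred from the "X" notation, and the `classify` helper is purely illustrative, not part of the actual setup.

```python
import ipaddress

# Prefix lengths are assumptions inferred from the "X" placeholders above.
NETWORKS = {
    "location 1 normal": ipaddress.ip_network("10.212.1.0/24"),
    "location 2 normal": ipaddress.ip_network("10.200.0.0/16"),
    "stretched (VLAN 401)": ipaddress.ip_network("10.40.1.0/24"),
}

def classify(address):
    """Return the name of the network an address falls in, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in NETWORKS.items():
        if ip in network:
            return name
    return None

# Hypothetical example addresses, not real hosts from this setup.
print(classify("10.40.1.5"))
print(classify("10.200.3.7"))
```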

What works (100% pings responded):
- Pinging from CHR to CHR using the ether1 addresses.
- Pinging from CHR to CHR using the bridge addresses.
- Pinging from CHR2 to Unifi2 stretched address.
- Pinging from PC at location 1 to CHR1 ether1 address.
- Pinging from PC at location 1 to CHR1 bridge address.
- Pinging from PC at location 2 to CHR2 ether1 address.

What fails (~30% of pings fail in on/off cycles, like flapping):
- Pinging from PC at location 2 to CHR2 bridge address (or Unifi1 stretched address).
- Pinging from Unifi2 to CHR2 bridge address (or Unifi1 stretched address).
- Pinging from location 1 (from PC, Unifi router, or CHR), to the Unifi2 stretched address.

Initial idea of issue:
- something related to routing between normal network and VLAN network on Unifi2 that feeds into CHR2 ether2.
- ^^^^^^ But digging into it (below), the REQUESTS from the location 1 side seem to die before leaving CHR1.

Finding out where the packets stop, when in the 'Timeout' mode (when pings are cycling on/off):
- Constant ping from CHR1 to Unifi2 stretched address
- Using packet sniffer on CHR1 and CHR2
- PING REQUEST visible in CHR1 bridge interface
- PING REQUEST NOT VISIBLE in CHR1 EoIP interface (this is weird)
- PING REQUEST NOT VISIBLE in CHR2 EoIP or Bridge interfaces.
- PING REPLIES not visible anywhere (it appears the REQUEST does not make it into the EoIP tunnel)

For SUCCESSFUL requests/replies, both PING REQUEST and PING REPLY are visible in all CHR1 and CHR2 EoIP and Bridge interfaces.
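The reasoning behind the sniffer observations can be sketched in a few lines of Python: walk the capture points in the order a request from CHR1 should cross them and report where it stops being visible. The capture point names and the `seen` dictionaries are just a restatement of the observations above, and the helper name is hypothetical.

```python
# Capture points in the order a ping request from CHR1 should cross them.
PATH = ["CHR1 bridge", "CHR1 EoIP", "CHR2 EoIP", "CHR2 bridge"]

def last_seen_and_first_missing(seen):
    """Return (last point where the request was seen, first point where it was not)."""
    last = None
    for point in PATH:
        if not seen.get(point, False):
            return last, point
        last = point
    return last, None

# Observations during the off cycle, as listed above.
failing = {"CHR1 bridge": True, "CHR1 EoIP": False,
           "CHR2 EoIP": False, "CHR2 bridge": False}
# Observations during the on cycle: visible everywhere.
working = dict.fromkeys(PATH, True)

print(last_seen_and_first_missing(failing))
print(last_seen_and_first_missing(working))
```

For the failing case this points at the hop between the CHR1 bridge and the CHR1 EoIP interface, which is why the problem looks local to CHR1's bridge rather than to the tunnel or to Unifi2.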

On/Off cycle characteristics:
- When broken (off cycle), pings from other devices at location 1 to the same destination (Unifi2 stretched address) also fail.
- When broken (off cycle), pings from PCs or Unifi2 at location 2 to the stretched network endpoints at location 1 also fail.
- When broken (off cycle), CHR Bridge to CHR Bridge pings still work fine. (EoIP tunnel stays up, no issues noted in logs or dropped packets)
- Cycles on/off every 4 to 10 seconds.
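To put numbers on the on/off cycle, a ping log can be collapsed into runs of consecutive replies and timeouts. This is a minimal sketch, assuming one ping per second recorded as True (reply) or False (timeout); the sample data is made up to resemble the pattern described above and is not a real capture.

```python
from itertools import groupby

def cycle_runs(results):
    """Collapse per-ping results into (got_reply, run_length) pairs."""
    return [(ok, sum(1 for _ in group)) for ok, group in groupby(results)]

def loss_percent(results):
    """Percentage of pings that timed out."""
    if not results:
        return 0.0
    return 100.0 * results.count(False) / len(results)

# Hypothetical sample: 7 replies then 3 timeouts, repeated twice.
sample = [True] * 7 + [False] * 3 + [True] * 7 + [False] * 3

print(cycle_runs(sample))
print(loss_percent(sample))
```

Comparing the run lengths against timers on the devices (for example ARP or bridge host ageing) may help narrow down what is expiring or being relearned each cycle.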


If I use the PING program inside CHR1 and set it to ARP PING using the BRIDGE interface (destination is Unifi2 stretched address), I observe the following:
- Pings that get replies show the MAC ADDRESS of Unifi2
- Pings that DON'T get replies show only the IP address.

I've tried adding the MAC address of Unifi2 into the ARP table as a static entry, but that didn't change the on/off cycle of ping failures.

I've tried toggling every setting I can find in the interfaces, the EoIP settings, the Bridge settings, all to no avail. Either the EoIP tunnel drops completely, or it is up, but with this weird cycling issue for pings relating to Unifi2 (or going through Unifi2), even though the REQUESTS for those pings do not even make it to Unifi2.

I've tried this with other VLAN IDs, but the issue is exactly the same.
I've tried this without the added BRIDGE IP addresses, but the issue is exactly the same. (the bridge IP addresses are not required in this setup, I only threw them in so I could test pinging directly through the EoIP tunnel between the CHRs)

I thought this might be related to STP in some way, so I played around with all the settings I could find. But that either (a) made the pings stop completely, or (b) made no difference.

Note there are NO firewall rules configured on these CHRs. They exist solely for the purpose of transporting anything that comes into ether2 of one CHR, out the ether2 of the other CHR, and if there's a simpler way to do this, I'm all ears! I plan on adding firewall rules AFTER testing is stable.

Would anyone have an idea of where to look next or how to dig deeper? I'm happy to provide the config files if that helps.

Other note of interest: Setting 'Loop Protect' on the EoIP tunnel in CHR1 to ON works fine. But when I set it to ON for CHR2, within a couple of seconds I get the status "Received loop protect packet originated from xxxxxx (eoip-tun1)". I've checked all the MAC addresses in both CHRs, no overlap. I've also tried changing the MAC address of eoip-tun1 (the EoIP tunnel interface on CHR2) to something new, and I get the same error with the updated MAC address. Not sure if this is related. Could it be that an internal loop detection mechanism is resetting every X seconds? I don't see any interfaces or other things being enabled/disabled on the CHRs. The routing table isn't bouncing. If you have any pointers I'd love to hear them! :-)
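For anyone repeating the MAC overlap check, here is a minimal Python sketch of how two lists of MAC addresses can be compared regardless of separator or case (Hyper-V shows hyphens, RouterOS shows colons). The addresses below are placeholders, not the real ones from these CHRs, and the helper names are hypothetical.

```python
import re

def normalise_mac(mac):
    """Reduce a MAC to 12 lowercase hex digits so different notations compare equal."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits

def overlapping_macs(macs_a, macs_b):
    """Return the sorted MACs (normalised) that appear in both lists."""
    a = {normalise_mac(m) for m in macs_a}
    b = {normalise_mac(m) for m in macs_b}
    return sorted(a & b)

# Placeholder addresses for illustration only.
chr1_macs = ["02:00:00:00:00:01", "02-00-00-00-00-02"]
chr2_macs = ["02:00:00:00:00:02", "02:00:00:00:00:03"]

print(overlapping_macs(chr1_macs, chr2_macs))
```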

Thanks for dropping by and making it to the end!