I set up an SSTP server and an EoIP tunnel, both with default MTUs, to support traveling workers. As recommended in the documentation, I created a bridge and added the EoIP tunnel and the LAN port (ether2) to it.
After doing this, some websites failed to load on devices connected to the LAN (ether2).
I believe I understand why this happened, but am not sure of the best way to fix it. Here is what I think happened:
- Before creating SSTP or EoIP, all interfaces on the switch and all devices on the LAN had MTU 1500.
- Devices on the LAN derived a TCP MSS of 1460 from that MTU (1500 minus 20 bytes of IP header and 20 bytes of TCP header).
- SSTP server created with default MTU 1500.
- EoIP tunnel was created and I accepted (did not notice) default MTU of 1458 (Ethernet MTU of 1500 minus EoIP overhead of 42 bytes).
- When the EoIP tunnel was added to the bridge, the bridge MTU automatically dropped to 1458, so traffic routed between the LAN and the WAN is now limited to 1458 bytes.
- A client on the LAN, still using MTU 1500, opens a TCP connection to a web site and advertises MSS 1460.
- The web site responds with full-size 1500-byte packets: 1448 bytes of data per segment (the 1460 MSS minus 12 bytes of TCP options) plus 52 bytes of headers, with DF (Don't Fragment) set.
- The router honors the bridge MTU of 1458 and the DF flag, drops the packet, and replies with ICMP Destination Unreachable, Fragmentation Needed, next-hop MTU 1458.
- The web site does not honor (or maybe never receives) the ICMP message, does not lower its MTU or MSS, and keeps sending packets that are too big and therefore get dropped. This produces the odd behavior that a connection can be established but communication fails at the first large packet.
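To double-check the numbers in the list above, here is a rough Python sketch of the arithmetic. It assumes plain IPv4 with 20-byte IP and TCP headers and takes the 42-byte EoIP overhead at face value; the function names are made up for illustration and are not RouterOS commands.

```python
IP_HEADER = 20       # IPv4 header without options
TCP_HEADER = 20      # TCP header without options
EOIP_OVERHEAD = 42   # EoIP overhead as quoted in this post


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP MSS a host would derive from an interface MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def eoip_mtu(underlying_mtu: int) -> int:
    """EoIP tunnel MTU after subtracting the encapsulation overhead."""
    return underlying_mtu - EOIP_OVERHEAD


def packet_size(payload: int, tcp_options: int = 0) -> int:
    """Total IP packet size for a TCP segment carrying `payload` bytes."""
    return IP_HEADER + TCP_HEADER + tcp_options + payload


lan_mtu = 1500
# Assumption taken from the post: the bridge adopts the smallest port MTU.
bridge_mtu = min(lan_mtu, eoip_mtu(1500))
server_packet = packet_size(payload=1448, tcp_options=12)

print(mss_for_mtu(lan_mtu))          # 1460, what LAN clients advertise
print(bridge_mtu)                    # 1458
print(server_packet)                 # 1500
print(server_packet <= bridge_mtu)   # False: too big, and DF forbids fragmenting
print(mss_for_mtu(bridge_mtu))       # 1418, the MSS that would fit the bridge
```

If this is right, any MSS above 1418 lets the remote server build packets the bridge cannot forward.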
The easiest fix is to raise the EoIP MTU to 1500, but then the tunnel will fragment every full-size packet, hurting performance.
The hardest fix is to lower the MTU to 1458 on all devices and interfaces, but then SSTP will end up fragmenting the EoIP packets anyway.
Another option is to lower the MTU to 1458 on the LAN only (leaving WAN and SSTP at 1500), but how do I set it on all the devices? Will they all respect an MTU specified via DHCP option? Will guest devices ignore it and just break? That's a lot of hassle in the office just to support a few guys on the road.
Maybe it would work to set EoIP MTU to 1500 in the office, but leave it at 1458 on the road and use MSS clamping? At least that way the route from LAN to WAN would keep the 1500 MTU.
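For reference, this is how I understand MSS clamping to work, as a small Python sketch. It assumes IPv4 with no IP options, and `clamp_mss` is a hypothetical helper for the arithmetic, not an actual router function.

```python
IPV4_TCP_HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header


def clamp_mss(advertised_mss: int, path_mtu: int) -> int:
    """MSS value left in a SYN after clamping it to a link's MTU."""
    return min(advertised_mss, path_mtu - IPV4_TCP_HEADERS)


def largest_packet(mss: int) -> int:
    """Largest IP packet a peer will send for a given MSS.

    TCP options such as timestamps come out of the MSS budget,
    so they do not make the packet any bigger than this.
    """
    return mss + IPV4_TCP_HEADERS


road_mss = clamp_mss(1460, 1458)     # SYN crossing the 1458-byte tunnel
office_mss = clamp_mss(1460, 1500)   # SYN going straight from LAN to WAN

print(road_mss)                   # 1418
print(largest_packet(road_mss))   # 1458, fits the tunnel
print(office_mss)                 # 1460, unchanged
```

So clamping would only shrink connections that actually cross the tunnel, which is the point of leaving the office side at 1500.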
But even then, SSTP will still have to fragment EoIP traffic, so the EoIP MTU should really be even smaller. How much smaller? The only figure I have found for SSTP is 32 bytes for the control header plus some number of bytes of encapsulation overhead; I cannot find details about the encapsulation it uses or how much overhead it adds. So what should the EoIP MTU really be to avoid SSTP fragmentation?
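To make the question concrete, this is the calculation I would like to be able to do, sketched in Python. The SSTP overhead is exactly the number I do not have, so the values in the loop are placeholders and not measured or documented figures; the 42-byte EoIP overhead is the one quoted above, and the function name is my own.

```python
EOIP_OVERHEAD = 42  # EoIP overhead as quoted in this post


def eoip_mtu_inside_sstp(wan_mtu: int, sstp_overhead: int) -> int:
    """EoIP MTU that would keep an encapsulated packet within the WAN MTU.

    `sstp_overhead` is the total per-packet cost of SSTP, which is unknown
    here; the caller must supply a guess or a measured value.
    """
    sstp_mtu = wan_mtu - sstp_overhead   # largest packet SSTP can carry whole
    return sstp_mtu - EOIP_OVERHEAD      # what is left for the bridged frame


# Placeholder guesses only, to show how the result moves with the overhead.
for guess in (0, 50, 100):
    print(guess, eoip_mtu_inside_sstp(1500, guess))
```

With an overhead of 0 this gives back the default 1458, which is why I suspect the default is too large once SSTP's real overhead is counted.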
Another option is to mangle MSS everywhere (except for SSTP server), but that seems like it will cause hard-to-figure-out problems at some point in the future.
Anyway, the first question is: what are the best (or at least the most efficient) MTUs to set on the SSTP server and the EoIP tunnel so that neither needs to do any fragmentation? The second question is how best to make that work for the remote side without causing too much impact on the office side.
Thanks!