I’m seeing very high packet losses (>80%) on several different Mikrotik products (RB, CRS and CCR models, see list below) when sending UDP traffic from a 1GbE segment to a 100BaseT segment.
Background
I have a small WAN for a fire department, with a central administration building where my data center and public Internet connection are located. We have Metro Ethernet (MOE) circuits connecting each of 6 fire stations to admin. Five run at 50mbps and one at 100mbps. At admin I use a CCR1036 as my core server, a CCR1009 to connect to my public Internet connection, and an RB3011 on my primary MOE connection to split out the two VLANs: one for my public Internet and the other for the subnet that my station routers are on. Each station also has a CCR1009 to connect it to the MOE.
Video training (both live and recorded) is getting to be very important here, so I’m working on tuning my network for better UDP-based video and audio service, primarily Webex.
Test setup
I started by setting up a test configuration in my lab using another RB3011. I configured it with a NAT and put a PC and an RB750G router behind it as my simulated private LAN. The public side is my production network. I set the RB3011’s downstream interface to 100mbps to more-or-less simulate my WAN throughput.
I started running iperf, sending UDP traffic from a PC on my “private” test network (100mbps) to a PC on the production network (GbE). All good. I could run bandwidths up to 80Mbps with no problem. Then I tried the other way, sending from the PC on my public (GbE) network to my “private” PC. In this direction, packet losses were 80% or more at 40Mbps. Cutting down to 20Mbps dropped the packet losses to about 25%. Changing the iperf buffer size (-w parameter) to 100 dropped packet losses down to 2-4%. To repeat, going the other way I had no problem. This was pretty weird.
So, basically, when going from a GbE network to a 100BaseT network, running UDP, packet losses were enormous. TCP traffic showed no loss.
I first suspected the NAT, of course. I also learned that some Windows PCs appear to drop UDP packets on the receiving end of an iperf test, somewhere in the networking stack. I used PCs directly connected to each other and tested until I found two that could run iperf both ways with no losses, at up to 80Mbps.
I went through a lot of trial and error to isolate what was causing the packet loss: iperf test parameters, NAT, routing, drivers on the PCs, etc.
Simplifying
One by one I took pieces out of the test scenario, until my configuration was the following:
PC 1 ------- (1Gbps)Switch(100Mbps) ------- PC 2
The “switch” was either a Routerboard with all ports slaved to one so that only the switch chip was engaged, or a GbE switch from another vendor (see below). The 1Gbps port was left to autonegotiate with the Gb port on the PC, while the 100mbps port (on the Mikrotiks) was tried both ways: hard-coded to 100mbps, and autonegotiating to a 100mbps port on the PC.
New Procedure:
Step 1:
Ran iperf tests using UDP, at various speeds from 1Mbps to 80Mbps, from PC 2 to PC 1. Result: almost no packet loss at any bandwidth.
Step 2:
Ran iperf tests using UDP from PC 1 to PC 2. Result: packet losses of 1-2% at 1Mbps, 25% at 20Mbps, 60% at 40Mbps, and up to 98% at 80Mbps. I could confirm that traffic was really being lost in the switch, because I looked at the interface traffic on the switch under test. The speed arriving at the 1Gb interface would be the selected iperf test speed, but the speed leaving the switch on the 100mbps side would be 12-16mbps, regardless of the iperf speed requested.
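As a quick sanity check on those numbers, the loss implied by the interface counters can be compared with the loss iperf reports. This is only a sketch of the arithmetic; the function name is made up for illustration, and the 12 and 16 Mbps figures are the egress range I observed above.

```python
def implied_loss(offered_mbps, egress_mbps):
    """Fraction of offered traffic that must have been dropped,
    given the rate actually seen leaving the 100mbps port."""
    if offered_mbps <= 0:
        raise ValueError("offered rate must be positive")
    # Clamp at zero: egress can never usefully exceed what was offered.
    return max(0.0, 1.0 - egress_mbps / offered_mbps)


if __name__ == "__main__":
    # Offered iperf rates from Step 2, against the observed 12-16 Mbps egress.
    for offered in (20, 40, 80):
        low = implied_loss(offered, 16)   # best case: 16 Mbps got through
        high = implied_loss(offered, 12)  # worst case: 12 Mbps got through
        print(f"{offered} Mbps offered: implied loss {low:.0%} to {high:.0%}")
```

With 40 Mbps offered and 16 Mbps leaving the switch, the implied loss is 60%, which lines up with what iperf reported at that rate.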
Hypothesis:
I suspected maybe this was just something that happens with UDP – that is, packets are hitting the switch at 1Gbps, then are being retransmitted at 100mbps. So it makes sense that many might get lost, even though the average throughput over time is less than 100mbps.
If this is something that a switch is not supposed to be able to handle, then *all* switches should have this behavior. Let’s see.
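To make the hypothesis concrete, here is a toy tail-drop model in Python. It is a sketch under assumptions, not a description of any real switch chip: the 32 KB egress buffer, the 1470-byte packet size and the burst sizes are all made-up illustration values, and I don't know how bursty iperf's output actually is.

```python
def simulate_tail_drop(offered_bps, burst_packets, packet_bytes=1470,
                       ingress_bps=1e9, egress_bps=100e6,
                       buffer_bytes=32_000, duration_s=1.0):
    """Return the fraction of packets dropped by a finite egress buffer.

    Packets arrive in back-to-back bursts at the ingress line rate; bursts
    are spaced so the long-term average equals offered_bps. The egress port
    drains the buffer continuously at egress_bps. buffer_bytes is an
    assumed value, not a measured one.
    """
    burst_period = burst_packets * packet_bytes * 8 / offered_bps
    gap = packet_bytes * 8 / ingress_bps      # spacing inside a burst
    queued = 0.0                              # bytes waiting in the buffer
    last_t = 0.0
    sent = dropped = 0
    burst_start = 0.0
    while burst_start < duration_s:
        for i in range(burst_packets):
            t = burst_start + i * gap
            # Drain whatever the slow port sent since the previous arrival.
            queued = max(0.0, queued - (t - last_t) * egress_bps / 8)
            last_t = t
            sent += 1
            if queued + packet_bytes <= buffer_bytes:
                queued += packet_bytes        # room left: enqueue
            else:
                dropped += 1                  # buffer full: tail drop
        burst_start += burst_period
    return dropped / sent


if __name__ == "__main__":
    # Same 40 Mbps average in both cases, only the burstiness differs.
    print("evenly paced:", simulate_tail_drop(40e6, burst_packets=1))
    print("44-packet bursts:", simulate_tail_drop(40e6, burst_packets=44))
```

In this model an evenly paced 40 Mbps stream loses nothing, while the same average rate delivered in line-rate bursts larger than the buffer loses a large share of its packets. That would be consistent with the hypothesis, but it also means a switch with deeper buffers would not show the problem.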
Step 3:
So, I tried some other GbE switches, including several Mikrotik Routerboard boxes, a Linksys and a Netgear. All the Routerboards had the problem, except for a simple SwOS switch, RB250GS. The Linksys and the Netgear switches (nothing high-end, just unmanaged Gb switches) had zero packet loss switching from 1GbE to 100mbps, either direction, even at 80Mbps speeds.
Step 4:
Ask me if you want more details, but to summarize, I tried adjusting all of the following on the Ethernet interfaces in use: MTU size, L2 MTU size, interface queue type, flow control, hard-coded vs autonegotiated connection speed and duplex. I tried applying simple queues to the interfaces.
Step 5:
I ran bandwidth tests using the tester built into RouterOS, setting up a separate router on each end of the connections above, in place of the PCs. Interestingly, the RouterOS bandwidth test didn’t have any problems.
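Since the two traffic generators behave differently, a third one whose pacing is fully known might help show whether the generator matters. Below is a minimal Python sketch of an evenly paced UDP sender and a receiver that counts what arrives. It is not what iperf or the RouterOS tester do internally; the function names, the 4-byte sequence header and the default rates are all my own assumptions.

```python
import socket
import struct
import threading
import time


def receive(sock, expected, idle_timeout=1.0):
    """Collect sequence numbers until `expected` arrive or the socket idles."""
    sock.settimeout(idle_timeout)
    seen = set()
    while len(seen) < expected:
        try:
            data, _ = sock.recvfrom(2048)
        except socket.timeout:
            break  # nothing more is coming
        (seq,) = struct.unpack("!I", data[:4])
        seen.add(seq)
    return seen


def send_paced(dest, count, payload_bytes, rate_bps):
    """Send `count` datagrams, evenly spaced to approximate rate_bps."""
    interval = payload_bytes * 8 / rate_bps
    padding = bytes(payload_bytes - 4)  # zero fill after the sequence number
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        next_send = time.perf_counter()
        for seq in range(count):
            while time.perf_counter() < next_send:
                pass  # busy-wait: sleep() is too coarse at sub-ms spacing
            s.sendto(struct.pack("!I", seq) + padding, dest)
            next_send += interval


def loopback_demo(count=200, payload_bytes=1200, rate_bps=20_000_000):
    """Run sender and receiver on 127.0.0.1; return (sent, received)."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))  # let the OS pick a free port
    result = {}
    t = threading.Thread(target=lambda: result.update(seen=receive(rx, count)))
    t.start()
    send_paced(rx.getsockname(), count, payload_bytes, rate_bps)
    t.join()
    rx.close()
    return count, len(result["seen"])


if __name__ == "__main__":
    sent, received = loopback_demo()
    print(f"sent {sent}, received {received}, loss {1 - received / sent:.1%}")
```

To use it across the switch, the receiver would run on PC 2 and the sender on PC 1 with PC 2's address as `dest`; the loopback demo only checks that the script itself works.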
Question:
So, is there a bug in the Mikrotik switching fabric that keeps them from being able to switch UDP traffic (iperf, at least) from a GbE segment to a 100BaseT segment? It seems weird that a couple of low-end devices from other manufacturers do it just fine. Also, could this be related to some of the problems that have been reported elsewhere, having to do with linking segments of different speeds?
http://forum.mikrotik.com/viewtopic.php?t=81936
Please help:
I’m sure I’ve made some mistakes in my technique here, but I’m pretty sure this is real.
If this question attracts interest, I’ll be happy to post actual configuration scripts and test output. But this is long enough already. And maybe there’s something obvious that I missed.
More detail:
Below are the Mikrotik models, RouterOS releases and Routerboard firmware I used, and the models of Linksys and Netgear switches. Sorry I wasn’t able to test with other models of switches – these were all I had on hand.
Switch / RouterOS / Firmware / Result (Fails = drops UDP packets)
Mikrotik CRS125-24G-1S-RM (switch chip: QCA8513L) / 6.36.3 / 3.24 / Fails
Netgear ProSafe 5-port Gigabit Switch GS105 / / / OK
Mikrotik RB3011-UiAS-RM (QCA-8337) / 6.36.3 / 3.27 / Fails
Mikrotik Routerboard 250GS (switch chip??) / (SwOS) 1.17 / / OK
Mikrotik RB750G (Atheros 8316) / 6.63.3 / 2.39 / Fails
Mikrotik RB951G-2HnD (Atheros 83270) / 6.26.2 / 3.24 / Fails
Linksys SR2024C / / / OK