This is one of my use cases where queuing is really, really important. Can you give a short example for, say, a link of 5M down and 800k up (or whatever you want to use)?
When shaping dsl especially, it's very important to get the link type "framing" right.
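As a minimal sketch in tc syntax for that 5M/800k case (the interface names and the pppoe-ptm framing keyword are assumptions here - match them to your own device and encapsulation, and leave a little headroom below the sync rate):

# egress: shape to ~95% of the 800k up-sync, accounting for per-packet DSL framing
tc qdisc replace dev eth0 root cake bandwidth 760kbit nat pppoe-ptm
# ingress (via an ifb redirect): ~95% of the 5M down-sync
tc qdisc replace dev ifb0 root cake bandwidth 4750kbit nat pppoe-ptm ingress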
Any tips for LTE connections? Especially ones that go from ~5Mbps to 70Mbps in a few hours? The auto ingress doesn't always act as I'd expect it to, and I'm not sure if it's RouterOS's implementation, or a bug, or me not understanding things.
Don't use them? We get the "how can an end user make LTE generally usable and consistently low latency" question a lot. It's often worse than wifi. We've (bufferbloat.net) been after that entire industry for years now to do better queue management everywhere - the handsets are horrifically overbuffered, the eNodeBs as well, the backhaul is both encrypted and underprovisioned...
Various moral equivalents to FQ have long been in play in the rest of the "fixed wireless" market (which consists of a lot of telco folk talking to themselves, rather than recognising that non-5G tech - like most of MikroTik's market - dominates in the field).
As for whether or not you can run an LTE interface at line rate wisely: most of the linux drivers for those have been terribly overbuffered, so the backpressure you get arrives very late. I hope that something like AQL or BQL lands for LTE interfaces, and there's some promising work towards actively sensing LTE bandwidth going on over in the openwrt universe: https://forum.openwrt.org/t/cake-w-adap ... 108848/482
No crashing, I have run the CCR1009 very heavy for several days without issue! Full transparency, tonight I am on the RB5009; it just showed up yesterday so I have been toying with it. So, I will be using it for my testing tonight. I can always swap around if you would like. Either way, they are both running 7.1 Stable.
Thx so much for testing. I have a low standard right now... "does it crash?" So far, so good.
Your first result, sans cake, was really quite good, and indicates your AT&T link has only about 20ms of buffering in it, or so. Believe it or not, that's actually "underbuffered" by prior standards, and makes it harder for a single flow to sustain full rate. But: a little underbuffering is totally fine by me, and I don't care all that much if a single flow is unable to achieve full rate, I'd rather have low latency.
It's easier to determine the buffer depth via a single upload test like this:
flent -x --step-size=.05 --socket-stats -t the_options_you_are_testing --te=upload_streams=1 -H the_closest_server tcp_nup
Use the gui to print the "tcp_rtt" stats. If you use the -t option to name your different runs, you can also do comparison plots via "add other data files" in flent-gui.
there are servers in atlanta and in fremont, california, if either of those would be closer for you.
HAH, I was hoping to pique your interest. Science incoming!
OK, ok, I gave in. In order to do science, could you also try a tcp_nup with upload_streams=4? and =16?
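Concretely, something along these lines (the -t labels are just suggestions, and dallas.starlink.taht.net is the server used later in this thread):

flent -x --step-size=.05 --socket-stats -t cake_4up --te=upload_streams=4 -H dallas.starlink.taht.net tcp_nup
flent -x --step-size=.05 --socket-stats -t cake_16up --te=upload_streams=16 -H dallas.starlink.taht.net tcp_nup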
Test 1 *appears* to show an old issue raising its head - tcp global synchronization - the amount of queue is so short that all the flows synchronize and drop simultaneously, as per panel 3 of your first plot, but in order to do "science" here, simplifying the test to just uploads would help.
Secondly it appears that something on the path is treating the CS1 codepoint as higher priority than the CS0 codepoint, when CS1 is supposed to be "background".
I am not sure if the VDSL device does or not, to be honest. It is an ATT-branded box, model BGW210. I have it in passthrough mode, but as stated above it is still some black magic NAT that 'passes' the public IP to my router.
Does that VDSL device do hardware flow control? Or are you shaping via cake via htb? (I'm happy to hear the bandwidth=0 parameter seems to be working otherwise!) But the only way I can think of you getting results this good is if the vdsl modem is exerting flow control....
Anyway, your last result is a clear win over what you had before, methinks. I'd like a tcp_nup test of that config too, when you find the time.
I will need to do much more studying to find the answer to question 1. =) I assume it will at least partially have to do with the RTT and bandwidth as part of the equation.
OK.
0) Still mostly very happy it doesn't crash.
1) Your dsl device's buffer is sized in packets, not bytes. The reason we only saw a 20ms RTT before on the rrul test, vis-à-vis the tcp_nup test showing a much larger RTT, is that the acks from the return flows on the path filled up the queue also. I leave it as an exercise for the reader to calculate the packet buffer length on this device...
2) I figured I was either looking at a shaper above cake, or at dsl flow control. (I like hw flow control, btw; I was perpetually showing off an ancient dsl modem with a 4-packet buffer and hw flow control + fq-codel in the early days, as FQ + the time-based AQM vs a fifo worked beautifully with that and cost 99% less cpu to do it that way. Sadly most dsl modems moved to a switch and don't provide that backpressure anymore.) Not quite sure whether you just tested that without a shaper.
3) I do want to verify you are not using BBR on your client? The 5ms simultaneous drops are still a mite puzzling.
OK, no bandwidth shaping, and the following cake config -- and tcp_nup tests..
I do dream of hardware flow control, so: no shaper, bandwidth=0 for cake, as a tcp_nup test. But I expect to be unlucky. Anyway, your fiddling with the frame parameters without a cake shaper active should have done nothing (I think), so that run was puzzling...
cake nat besteffort the_right_dsl_option bandwidth XMbit is easiest to reason about. Do you have visibility into the sync rate of the modem? Anyway, get that number right, then next try tcp_ndown... Note you cannot measure tcp rtt from this direction via flent directly, so we resort to inference or packet captures.
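In tc syntax that config would look roughly like this (eth0, the 21Mbit figure, and the pppoe-ptm keyword are placeholders for your actual interface, sync rate, and framing):

tc qdisc replace dev eth0 root cake nat besteffort pppoe-ptm bandwidth 21Mbit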
At some point I might ask you to stick your *.flent.gz files somewhere. Pleased to have so vastly improved tcp rtt.
Roger that, I figured no hw control was the case.
You don't have hw flow control.
Nice to know (I guess) that BBR2 still struggles with itself. Try resetting that to cubic on the up, please, and shape to 19Mbit.
add ack-filter to the up
I'm running cubic on that server for the down.
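On the Linux client, switching back is a single sysctl (assuming BBR was enabled the same way in the first place):

sudo sysctl -w net.ipv4.tcp_congestion_control=cubic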
Your baseline rtt might drop in half without bonding, OR if you can disable interleaving (yes, your bandwidth would drop as well).
I am not sure what you mean by scrape the rate? Do you mean change the bandwidth limit in real time during a test, or possibly using it as part of a script to help automate testing using the API?
Is it possible to scrape that rate? cake supports dynamically changing its config *without* reloading the qdisc, but I doubt mikrotik can do that with their api (?): tc qdisc change dev whatever cake bandwidth the_new_bandwidth. You should be able to get really close to the actual uplink rate (22xxx kbps) with the right framing. Those little ping spikes are a bit puzzling (something out of band like ppp-oe?). I note some dhcp and some ppp messages now exist in some implementations that actually do send the link and/or shaped rate and framing.
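A hypothetical sketch of what scraping could look like - how you pull the sync rate out of the modem is entirely modem-specific, so get_modem_sync_rate is just a placeholder for whatever works (status page, SNMP, TR-064, ...):

WAN=eth0                           # placeholder interface name
SYNC_KBPS=$(get_modem_sync_rate)   # hypothetical helper: scrape the modem's current sync rate
tc qdisc change dev $WAN root cake bandwidth $((SYNC_KBPS * 95 / 100))kbit   # retune cake on the fly, keeping ~5% headroom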
Your download was really pretty. But anyway, I'd like to solidify the upload using cubic at 19mbit first, ack-filter on (I worry about that option), then I'd love to see sfq (unshaped and shaped) at the same rate with both bbrv2 and cubic. We are kinda getting down to attempting rigorous science here, so perhaps scripting, and some packet captures, are in order. On the other hand, if you can keep the tested options straight in the -t option, we can easily compare things later. I have a long-standing hypothesis that since SFQ was so popular in the wisp markets (ubnt uses it), and since I long ago proved it was too short to sustain fat tcp flows, it was acting as an AQM in this market as well, which is why the observed bufferbloat was only in the 80-100ms range - and as people started shaping to faster and faster rates and using 8+ multiflow speedtests, they didn't notice they were killing single flow tcp performance. ( https://www.bufferbloat.net/projects/bl ... _Must_Die/ ). The poor results I got then, however, predate the advent of the linux stack's pacing, and single flows have since actually been scaling higher than 12mbit against sfq's default 128 packet limit.
The reason why the rrul upload looks spotty is actually more related to sampling error, not an actual problem per se, and you are also zoomed way in. You can scale plots relative to each other as you wish, or combine them, via flent. I *like* to zoom in but try to stay cognizant of the scale, and there's a version of the plot that won't zoom on you, also.
Somewhat puzzled about the QoS stuff, but I'd rather get the bandwidth param right first. I note I'm not a huge fan of QoS in the first place due to all the differing interpretations, and there was also a bug in some version or another that wasn't reading the dscp field properly with some encapsulations. cake has a "wash" option if you are actually seeing mismarks on ingress, or are doing something special on egress that you don't want upstream to see. I do keep hoping we can "export" a standards compliant diffserv set in the hope that the ISP might respect it, and vice versa...
The rrul test is a *stress* test using greedy traffic and not indicative of the intent of QoS. Were it to be more representative, it would send voip-like isochronous traffic through the VI queue, videoconferencing 16ms frame-like traffic through the video queue, and something torrent-like through the background queue. It semi-intentionally, and semi as a mistake, only exercises 3 of the 4 cake diffserv4 or wifi hw queues; rrulv2 does this more right, but I haven't finished the spec yet.
Demonstrating the sad results of sending greedy traffic through a qos system that *thinks* its traffic was going to obey the rules was also on my mind at the time. You still see a lot of strict priority queues out there where, if one user lucks into the right dscp marking, they can starve out everyone else. Cake's game theory here uses soft admission control so that that doesn't happen, and in general it shows the benefits of short queues and 5-tuple fair queuing over any form of qos, and furthermore does per-host fq, so the worst a user can do is do themselves in, not everybody else.
There are 110 other tests in the suite. I've got rather good at reading the rrul test over the years - it's the way to get a picture with the least amount of effort - then we do the tcp_nup and down tests. I might not have needed to suggest that had I not noticed that it looked like you were running BBR. The square wave tests are useful, as are the various _var versions which let you test different servers.
I have given up asking her how the internet is doing.. she is very binary. It either works or it doesn't. AHAHA!!!
I think she'll be happy with your efforts so far.
Agreed, I just ran it again.. and got similar results as before. Almost identical: the speed is wonky, however latency is still great.
Your 16up result seems kind of anomalous.
Ahh ok, gotcha! Well, that is the interesting thing - not sure if you noticed, but last go around I had set the rate to 22 to match the rate in the modem, and it appears to be good. I assume you say keep it at 19 to give myself some headroom in case that rate were to drop in the future?
By "scraping the rate" I meant rolling some sort of script to pull it off the modem's sync rate, but since your isp is shaping you instead, stick to the 19.
Good point, not to mention I thought about it afterwards.. even though the sync rate is 22, the ISP is obviously holding me at 20. So, as not to let them be the bottleneck, 19 makes sense in that case as well.
No, I didn't notice. 19 makes my head hurt less for now? In general dsl tends to fluctuate in rain, over the course of a day, etc, so leaving yourself headroom is a good idea.
Hmm, well after a fresh reboot.. the results are the same for the up16 and cake. This is odd. Per the Mikrotik 7.1 release changelog, it is running 5.6.3, and this router has a quad core 350-1400 (auto) MHz arm64 chip. Looking up the model of the CPU, it appears to be a Marvell ARMADA 7040 https://www.marvell.com/content/dam/mar ... 017-12.pdf
As for the up16 anomaly, try htb + fq_codel...
And at some point, when your gf is not looking, reboot and try cake again at up16? I return to my initial objective, not crashing. This is 5.6.x? cpu arch?
Just for the heck of it.. I added some more data here.. I added 8 and 32. It looks like even with 8 it starts to drop.. and gets worse with more, however 16 and 32 are roughly the same.
I'm kinda hoping this is a bug in flent!!!
I am pretty sure you have the overhead right at this point. I'm also happy to see it not crash.
In the interest of science, however, if at some point you could also repeat the 4up test with htb + fq_codel, that would be interesting. Also if you were to enable ecn for a fq_codel vs cake comparison on your client.
While we definitely get more throughput and less FQ latency from cake, with more bounded results from that side
bothersome2.png
cake's "Cobalt" AQM tcp RTT is oscillating far more than I would have expected. SFQ's overly short buffers are winning pretty good here.
bothersome.png
Very cool! Hopefully this week, I will be getting my brother's new Mikrotik router installed and testing wireguard between his house and mine.
I'm off researching kernel versions. NOT relevant to this was the wireguard patch that went into 5.7.
https://github.com/dtaht/sch_cake/issue ... -984503893
If you have a mikrotik account (I am not a mikrotik customer) and can file a bug, please do - I'm a bit concerned.
I wouldn't mind, however, returning to testing downloads. Your 16 flow download was perfect...
Yes sir, here ya go.. The plot thickens! =P
I'd wanted a tcp rtt plot for the 4up test also. You can recreate my cdf if you like, comparing sfq vs cake vs fq-codel.
Here ya go!
Also, 8, 16, 32 with SFQ?
Yeah, bugs are no fun! The possible packet reordering makes sense because of the interleaving.
I hate bugs. :/ Anyway, a packet capture of the 16 flow test would be good at this point.
tcpdump -i your-interface -s 128 -w 16flowscake.cap
We'd never tested bonding until today... and I could imagine us having a lot of packet reordering in a variety of ways.
Assuming this is a bug that isn't in flent - it's one of those darn things that didn't show up in testing because we didn't stress things hard enough. The weird thing is I keep seeing artifacts in the latest release of all this stuff in newer kernels, that don't match the kinds of results we were getting when we first mainlined this code. https://forum.openwrt.org/t/validating- ... /111123/10
is one example.
Can't even rule out a bug in your host's tcp. I have a research group that can take a look at this and try to reproduce it.
ANYWAY. At least it doesn't crash and you have consistently low latency, and probably rarely stress out a box this hard. thx so much for being all over this with me!
WOOHOO! =) Glad we are moving in the right direction now, and you are a mind reader, I sure did.. it was set to 2.
That's MUCH more correct looking, thank you!
Next, to see if ecn is working properly, you can run the exact same test series, but with:
sudo sysctl -w net.ipv4.tcp_ecn=1
I use ecn primarily as an AQM debugging tool (given how rarely it's turned on in the field) and for all I know (without seeing the capture) you had it on just now.
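(For reference, the stock Linux setting is the passive one - the "2" that comes up below:)

sudo sysctl -w net.ipv4.tcp_ecn=2   # default: negotiate ecn only when the far end requests it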
Ahh yes, filtering the other flows is a good idea! I have used wireshark at an elementary level.. first time using xplot, I must use it and learn it better! Well, both tools for that matter. That is pretty awesome being able to see the sack blocks. So that first trace was with ECN=2 which was out of the box on this install.. and this last run was with ECN=1. I agree with ya on the 32, I just figured I would throw it in there on these runs to see what happens. It is not crushing as bad as before though, so that is cool too.
Thank you for the packet capture. You can, btw, filter out all your other traffic by specifying "host dallas.starlink.taht.net".
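i.e., reusing the capture command from earlier with that filter tacked on:

tcpdump -i your-interface -s 128 -w 16flowscake.cap host dallas.starlink.taht.net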
This is the correct sort of carnage that cubic does, there's retransmits, dup acks, out of order stuff - strangely comforting after puzzling over that last capture all night!
correct_cubic_carnage.png
The xplot equivalent of this plot is prettier (IMHO), and in either case, if you zoom in, you can see the "sack" blocks in the bottom line, showing loss and recovery.
With ECN enabled you won't see sacks (except when there is actual loss); you'll see CEs and CWRs instead.
PS I'm not too concerned about the performance dropoff at 32 flows in that we are pounding the link flat and loss going up geometrically, however if it returns or gets worse... my concern was some sort of memory leak hurting the mikrotik box, well I had a lot of concerns! Thx so much for the exhaustive testing, again.
I have a decent understanding of TCP, and I understand what you are saying about the tail loss, especially from the video I found the other day with you using the people as packets. I just pulled up the RFC and the other link. Awesome! I have some homework tonight =)
To summarize a few things. Yesterday we ended up in a state where a bunch of flows weren't even going through the host at the right rate, so we weren't stress testing the qdisc, and thus not seeing any difference in latency between the three different qdiscs under test. It was seeing SFQ act the same as all the other ones as we added load that made me scratch my head - and you blowing your machine away entirely! thx. It felt pretty good to me, too. Anyway... I'm sitting here overfocused on making sure the mikrotik is working right, and whilst I am VERY interested in captures and BBRv1 and BBRv2 behavior, in the context of this thread I just want to make sure mikrotik has got these new qdiscs working exactly right. My long term goal is that fq-codel, in particular, goes on by default on all interfaces in this or some future mikrotik release...
"So you mentioned the cubic carnage, please excuse my ignorance, but I assume that is the best out of any we could use? I take it at least that it better than BBR? Or maybe better said.. 'better' in this particular environment. I know some tools are better than others depending on the use case."
I'm enjoying very much sharing my tcp knowledge with you whilst you test. i might end up giving some reading assignments though...
tcp is designed, in the end, to be able to reliably carry packets via any means or combination of circumstances possible, as per rfc2549, which is a good read.
So when I said "cubic carnage" I was mostly being alliterative. I've seen MUCH MUCH worse, and was actually expecting significant episodes of reordering from the bonded link, but didn't see any. Anyway, by eyeball that was the correct behavior of cake and cubic together.
As you pound more and more flows through a link (or you have a shorter and shorter buffer), we start hitting another phase of tcp (slow start and congestion avoidance are what I usually talk about, but I do allude to this in my apnic talk): we lose so many packets that we trigger tail loss and a 250ms RTO ("hello, are you still there?"), which is an even more extreme form of congestion control (it completely resets the tcp window also). This is probably the cause of the ever-longer tail above the 99th percentile of the cdf plot as you add more and more flows. Add 64 flows, 128 flows, and eventually flows won't even be able to get started...
This was pretty good: https://blog.apnic.net/2018/03/19/strik ... cillation/
WOW, thank you for all of that! I have many tabs in my browser to read now =) That is interesting that these other mechanisms work great for big fat single flows.. and not others. As if the devs never talked to anyone else to see what the end user might do other than just watch videos all day!
OK, it's back up. ECN negotiation is enabled (but the bits could be getting washed out on the path, OR I'd disabled it on the previous boot).
To go to your BBR vs cubic question. :lecture mode:
TCP reno was the "internet standard" for a long time. It had a "sawtooth", an initial window of 2, and couldn't scale past some X mbits.
Circa 2006-2008 a bunch of things happened - the linux txqueuelen went from 100 to 1000, TSO (up to 42 packets in a single offload) appeared, window "scaling" started to deploy,
linux switched to tcp cubic, and wifi added packet aggregation...
The first was just... dumb; the second was a desperate attempt to make tcp saturate a wire better against the weak cpus of the time (which it did); window scaling was there to make TCP scale to gbits and beyond; and cubic looked, and was, faster at grabbing bandwidth while seemingly doing no harm, because problems 1, 2, and 3 were not well understood yet, and wifi aggregation not at all.
To compound things further, Linux went to IW10 to make the web server folk happy in 2010 ( https://tools.ietf.org/id/draft-gettys- ... ul-00.html ) ... everyone added more buffering to the modems ... failed to understand what bittorrent's real problem was...
and then we started noticing that classic voip and videoconferencing apps like skype were not working anymore. Enter jim gettys, having his kids yell at him for transferring files to mit: https://gettys.wordpress.com/category/bufferbloat/ and me, in nicaragua, scratching my head as to why my internet radio, which had worked for years, had stopped working: http://the-edge.taht.net/post/Did_Buffe ... Net_Radio/
Anyway there's a lot of ranting between 2011 and 2021 I'll elide. BBR emerged from youtube's struggle to find a way to deliver data reliably whilst not overbuffering overmuch (this helps with fast forward and reverse), and it's a *perfect* transport for streaming a single tcp session of recorded video like netflix (except they thus far haven't made BBR work well on bsd). BBR is better in many respects than cubic, especially if it is FQed (where it mostly lives in its delay-based regime), but it has some unpleasant modes where it dukes it out with cubic (to win), has trouble competing with itself (ideally, where we use a sharded website today with 110+ different connections, we'd have *one* BBR connection back to the "mainframe"), and it doesn't respect ECN, or gentle packet loss, and has its own model of the network that is, by god, superior to yours!!
https://queue.acm.org/detail.cfm?id=3022184
Despite my cynicism, I don't like cubic either - reno, what was so wrong with reno, and IW2? I ask.
Anyway, BBRv2 is better than BBRv1, and I'm delighted to see new people trying the shiny stuff in circumstances where the designers didn't think about much, like on a 19Mbit fq-codeled link. And finding bugs. They like packet captures too.
Roger that, back to cake.
Really large string of wtf moments there. Can you return to cake? Or turn off ecn? Or both?
your mq - fq-codel might explain some other things, but not this.
HAH, 20 lashes! *banging head on wall*
Well, don't do that then. :O IP is big-endian....
but a good test of fq-codel with ecn disabled would comfort me, first. There should be differences in the overall distribution particularly in the 32 flows test... but throughput should stay flat, not that horrible thing that just happened....
AHAHAH roger that.. here we go! Umm.. well, the results are different.. o.O
OK, so if you could don a fire-retardant suit, re-enable ecn, and retry cake, and if that looks substantially similar, retry fq-codel?
Right on, I am about to pass out myself. I just uploaded the CAKE+ECN data in the post before this one.. look at it in the morning, so you can get some real sleep! =P HAH, I am just glad today is over with and all those weird bugs I introduced on my own are gone! HAHA
@kevinb361 I was up very late yesterday and will sleep soon. I can live with not knowing whether ecn works before I wake. Thx again for going to town on this and making such "interesting" mistakes. It's all data to me, and I think the bug you had on the xan-whatever-it-was kernel was rather interesting, as well as the damage seemingly caused by using that iptables rule.
From my interactions with engineers working for much bigger ISPs than the one I work for (where queuing in software is possible), wred is still the gold standard for most large providers. My understanding is that everything Cisco and Juniper is wred. It can handle huge bandwidth amounts due to offloading to the ASIC, but is almost certainly much worse than any of the newer AQM solutions. I believe those running Cisco and Juniper have no ability to even consider codel or fq_codel or cake on the service provider side.
Lastly, I do not know how much wred is deployed anymore. 5 tuple FQ - all by itself - seems to be gaining traction.
No rest for the wicked! HAHA
@kevinb361, the ecn result is very disturbing. But it could be mikrotik (a checksum failure, or parsing the wrong bits on this encapsulation - a bug which I can't remember when we fixed in some release of linux and cake), the modem, the path, or something at linode, where my server is. Anyway, fq-codel without ecn would be a good comparison to validate fq-codel is also implemented correctly, and a repeat of the SFQ test without that iptables rule would hopefully also be sane.
Sometimes I wonder if I am a masochist.. HAH. OK, last test and then I am gonna go count packets until I pass out! =P
OK, going to try and break this down in chunks as best as possible.
Since your eye is now "trained" for a fairly short rtt, try fremont.starlink.taht.net, or london, singapore, or sydney.starlink.taht.net.
we also have tests for these competing against each other, as in the usual case we are not sending flows to a single server.
SFQ will start to underperform at these longer rtts, and I don't honestly know which of cake or fq-codel will win. SFQ is doing really, really well so far, but i suspect it will go to hell on the rrul_be tests, even on the short rtt to dallas, and the way to test multiple sites is via the -H serverA -H serverB -H serverC -H serverD rtt_fair_var , which is also "interesting" on a fifo.
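A sketch of that multi-server invocation (server list per this thread; swap the -t label for whichever qdisc is under test):

flent -x --step-size=.05 -t cake_rtt_fair -H dallas.starlink.taht.net -H fremont.starlink.taht.net -H singapore.starlink.taht.net -H sydney.starlink.taht.net rtt_fair_var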
And with that, I really am calling it quits for the day. Very reassuring to see non-ecn work. Some backstory - ECN is not an enabled option for any but a few OSX things, or really advanced linux users, so making it misbehave, on this kernel release (and modem! can't rrul that out) coherently and consistently, rules out a ton of mild "background noise" I've had for about a year now.
This reminds me of many years ago when I first got into messing with this stuff. I would have set queues, I believe using RED? I don't remember anymore.. but anyhow, I would give ACK and DNS top priority with a guaranteed bucket size of whatever. I wish I had kept those configs so I could look back and see how I used to do it. Thank you for the knowledge, the pieces are starting to make more sense now. I have a ridiculous number of tabs open now, and am slowly going through them reading.
The FQ component of fq_codel, cake, and fq-pie has what we call the "sparse flow optimization". Request/response packets (DNS, syn, syn/ack), the first packet of any new flow, acks, voip and gaming packets usually "fly through" without observing any queuing at all. In this example we have 32 fat flows, and SFQ would have put the thin flow at the end of that queue (which is still a LOT better than FIFO, and I'd like to use one of those runs on future plots). So in this example, at this 19mbit rate and number of flows, we're consistently saving 3ms of latency and jitter.
consistently_ll.png
While that might seem like a small number, your typical web page might issue 100 dns queries, and 100 syns, and the queuing cost for those, vanishes. Some of that gets amortized by how web pages interleave requests, but not all of it, by far.
Also, because these qdiscs judge "sparseness" by bytes (DRR-like, rather than SFQ-like), not packets, and because the uplink acks are pretty small and sparse also, the queuing cost for much of a web page load time (usually the first 10 round trips per flow) also vanishes. We used to do a demo back in 2013 or so, showing a basic upload workload and how much better web pages behaved with fq-codel in place. (setting up a long saturating workload in flent -l 300 rrul_be - and then a web page benchmarker, demo'd to dan york of the internet society here:
https://circleid.com/posts/20130418_buf ... s_can_be/
To be clear, however, a great deal of the benefit in that particular demo was also in effectively applying AQM to shorten the queues, and not having that giant fifo. Enormous single-queued FIFOs must die, I thought then, and now, and the benefits of rfc8290 seemed so obvious that I figured we'd be done in a year.
Here is the next one.. be back in a bit to do fq_codel, gotta jump on a call for a few.
OK, finally at it.. been another busy day but so far have made some great progress with cake on my brother's cable modem! 400/40 is what its real speed appears to be. Also got wireguard set up between us so that I can SSH into a VM on his end to do my testing and rsync the data here to analyze. Will do more later now that the wife and kids are there screwing up my data with all their streaming.
To restore your eyeball to what the current "real world" looks like for everyone else, try that rtt_fair test with all this fancy schmancy stuff off, just the default fifo on the modem. Your situation is different from that 2013 demo in that you have a vastly shorter queue than the 250+ms queue of the cable modems of the time, and the linux tcp stack has also improved greatly (with packet pacing)....
another visual trick is putting those sites in your hosts file so you can just say -H sydney -H singapore etc on the command line instead of sydney.starlink.taht.net so it's more readable.
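A sketch of that hosts-file trick (the addresses are placeholders - look the real ones up first, e.g. with dig +short sydney.starlink.taht.net):

# /etc/hosts
203.0.113.10  sydney
203.0.113.11  singapore
203.0.113.12  fremont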
I should also note that the "starlink" subdomain is just the name of the linux 5.11 kernel cloud I'd created to test starlink stuff, and has nothing to do with starlink (with whom I have a non-relationship presently - amusing story of my encounter with them here: https://www.youtube.com/watch?v=c9gLo6Xrwgw starlink data here: https://docs.google.com/document/d/1puR ... QKblM/edit ). I hope they fix the dishy at some point, and their router...
I have an older cloud named "apple", and an even older one, named "comcast", and I keep them running primarily so I can verify changes in host device drivers and tcp stacks over time.
I ported fq_codel to the edgerouters over a weekend ( https://gettys.wordpress.com/2017/02/02 ... fferbloat/ ). Their userbase leapt all over it, wrote the backend configuration language, the gui, and a wizard, and then ubnt ultimately adopted it in their next version of the OS, calling it "smart queues", in reference to the "smart queue management" spec. (It's since been renamed to esq.)
I have since setup whatever ubiquiti's default simple queue is on their USG.. I believe it is fq_codel? He is amazed, and his facetime video is super clear. He is on a 25/5 cable modem so that made an enormous improvement for them. As he said, I can stream netflix and game at the same time now! hehe
My brother on the other hand, we just put his new router in yesterday.. I am still not totally sure what his bandwidth is provisioned at. We only had a few minutes to play with cake but I believe I got him in a decent ballpark to start. He is around 400/40. His speeds fluctuate greatly, even when setting bandwidth. Now granted, that was all testing with web tools.. so I asked him to spin me up a VM on his server over there so I can run flent..
Oh wow! That is awesome! I have an old edgerouter around here somewhere. I think the SD card or whatever in it is corrupt. I found where I can revive it.. but just haven't had the need to. I need to put that on my todo list. I remember seeing somewhere where someone was able to get OpenBSD + pf running on it. That would be pretty neat. I love pf.
One nice fq-codel thing is that you can run multiple netflix flows at the same time and have them hold at roughly the same rate and with consistent quality in competition with other traffic.
yes, I don't trust web tests very far. thx for adopting flent.
cake 100 on download, and NOT varying the upload; it was set at 19Mbit.
To verify - presently you have cake 100mbit on the download, and were varying the upload qdisc?
And when you tested "the bare modem" both were off?
Outstanding! It is obvious that fq_codel is working as designed! I can agree with you on the 'old fashioned' way of thinking. For example, I am a linux sysadmin by trade and I administer a few hundred servers spread across the US. I could see where this would be a relevant argument in the sense that OK, across town I could get 10ms RTT, but in Washington it could be say 200ms RTT... and I would not be happy if I am dropping ssh traffic to Washington because I am pushing a lot of data to a server across town.
The download component of your test looks a touch odd to me; I asked above what it was set to.
Also the --te=upload_streams parameter has no function on the rtt_fair tests, they generate one stream per -H server option.
Here's where fq-codel begins to pull ahead of SFQ in a couple of respects. Your baseline RTT is about 28ms to dallas, and over 250ms to the furthest server on the list. A design goal of TCP was to have it be ultimately (after running for a while) "fair" to flows of vastly different distances, so that you could transfer data from dallas to fremont, and from dallas to sydney, simultaneously, and be sure that you'd have at least some throughput at the longer RTTs. This goal was actually inherent in why IP took over from novell's IPX, because the IPX folk hadn't thought about this hard enough.
It is still "just a goal" that is not ever met, but tends to degrade fairly gracefully, as every TCP paper you read will try and express how they might converge more or less fairly, over time at different round trips.
Nowadays, more and more data is moving to the datacenter closest to you, and in the cable case, perhaps you'd be 12ms away from my server, and in the fiber case, 2ms. With a naive
design for TCP/ip the odds are good that that "local-ish" traffic would completely starve out longer distances, and indeed it can be quite unfair to more distant flows. 7 or 8x differences in throughput at 10x RTT differences are fairly common.
But! Sydney is quite possibly still a really needed destination for your traffic, so... what do you do? I'm pretty old fashioned in terms of my aims for low latency and equal throughput... and at every point, although we optimized for RTT relentlessly in the design of fq-codel, we also aimed to ensure that other flows could ultimately get at least "some" bandwidth - with codel, maybe better than 1/7x, we didn't know...
Now, with really short fifo queues, and with sfq's really short queues, tcp generally cannot get enough runway to send a BDP's worth of traffic to more distant coasts, so you see the short RTT getting 10mbits of uplink bandwidth here:
rtt_fair_var_-_sfq_dl.png
fq-codel, on the other hand, strives to give "enough" buffering for more distant sites to get a much more nearly fair share of the bandwidth.
rtt_fair_var_-_fqcodel_dl.png
The relentless drive to move CDN resources closer and closer to you is a good thing - shorter RTTs make for more responsive web traffic in particular, but my design goal for fq-codel was
to be able to connect equally to all people, near and far, and their services of all sorts, be it email, or chat, or web or voip, regardless of how distant they were.
And we didn't just get 1/7th at 10x the RTT - we knocked it out of the park, with nearly equal throughput no matter how near, or how far. (TCPs improved also.)
Luckily my brother's connection is the one with the Mikrotik RB5009, same router as I am currently running here. The USG is at the boys' house, which I don't have anything set up for yet to connect to remotely. Hopefully soon.
cake on the edgerouter: https://community.ui.com/questions/Cake ... c755cae8a2
cake on the udm pro: https://github.com/fabianishere/udm-kernel
The whole bufferbloat project is full of hackers desperate to have low latency bandwidth and willing to go to extraordinary lengths to get better queue management running. If routerOS had had a devkit available.... :/
Since your brother is up and running, could you try the upload string of fq_codel'd tests on, with ecn enabled? That would rule out parts of that path, and my server, at least.
I think the device he has is not capable of much more than 200Mbit of inbound shaping, but I could be wrong. The udm pro can do about 700. Also, usually I just reflash most ubnt gear to openwrt. The edgerouter X's are nice little boxes in particular, and they seem to have mostly abandoned edgeOS. VyOS is still alive and has long had smart queues in it. I have reflashed much mikrotik gear as well, but I actually rather like routerOS, and have merely been wishing for 6+ years that they'd get the 300 lines of code that fq_codel is into it, and on by default.
Ahh bofh! I love it! Heading over to The Register, haven't been over there in years, always good for a laugh!
Let me tackle the download portion of the test. :rant: *nobody*, for some reason, tests uploads, downloads, and ping simultaneously, as if people just sat there, did an upload, waited, then did a download, and then did a ping. It's a really bothersome aspect of almost all the web tests today. Real traffic, from multiple people and their devices in a household or business, is in both directions, all the time. Your network should degrade gracefully when there is traffic up, down, or both at the same time. While the rrul test series is patterned on bittorrent, which once upon a time ruled the world, we still didn't test networks for what torrent was really doing to them, in the light of some future world that had way more devices on it, more or less behaving as badly or worse than torrent did. :End of rant: See bofh for more...
Anyway, your provider's network represents a pretty good compromise of packet, not byte, limits on both sides. If you must have a FIFO, byte-limited fifos are better, because acks eat 1/15th the space data does, so in a packet-limited fifo a ton of acks in one direction or another crowds out the data packets. Bytes are a rough proxy for time: it takes the same amount of time to transmit 15 64-byte acks as one 1500-byte data packet. You had about, I don't remember now, 80ms worth of buffering for big packets on the down, and yes, I can do the math pretty accurately for the packet limit that actually represents from the rrul test results, so long as cake's ack-filter is off, but I'll try to leave that as an exercise for the reader. But anyway, on the down, this time, you have a ton of acks from the up clogging up that queue, and your download is now rate limited to 50Mbits by the upload. (If these packet limits were oversized, your upload would be limited by the download.)
noqueue_dl.png
SFQ is pretty similar here, but a bit more biased towards the shorter RTT.
sfq_dn.png
(I'm assuming above you used sfq or noqueue in the inbound shaper.)
Please note that both of these behaviors are actually a pretty good thing in either case, in that the user-perceptible *latency* is gone, because bytes=time and your download slowed down gracefully, and your up is underbuffered. So... win, right?
Or... you could have a network capable of running at 100Mbit down, 19Mbit up, all the time, with no latency, either:
fqcodel_dl.png
despite this being better, it appears to my eye that you were running out of queue on the down due to the synchronized drops - which could be hitting a limit at the provider or... is there a 1000 packet limit or memory limit? Cake scales this correctly for you on the down, or should. fq-codel we should have ripped out the packet limit long ago....
While this is a good result... 2x better than the default, without that sync'd drop, it too would have ultimately converged nearer to equal bandwidth for all. Cubic is still too aggressive, so it would take a while....
There is only one software queue on that VM, fq_codel.. but who knows what proxmox might be doing.. I know uplink from that server is 10gbit.
Turn off ecn on your brother's link?
I assume you have 2 or more hardware queues on the vm?
This router has 1GB of RAM.. I do not see how you can see memory usage either, only CPU usage. There are no memory limits for fq_codel, only for cake. It would be nice if Mikrotik would give access to all available options; as I saw you state at the beginning of this post, they do not have the option for gso-splitting either.
How much memory does this router have?
And if there's a way to, say, double the packet and memory limits on the fq_codel rtt_fair test on your home machine maybe those sync'd drops would go away. I didn't see those options in the gui... a lot of people patch down the 10000 packet limit and 32MB limit in fq_codel to something that seems saner (and is, on memory limited routers!), so I don't know what the default is for mikrotik.
How cake autoconfigures here in this scenario may also be wrong if that too shows the sync'd drops on that test. If the gui allows upping the memlimit for that, try 8M in the inbound shaper. (cake has no packet limit) Our reasoning for how we did the defaults for the memlimit option was kind of obtuse and based more on fear of running a router out of memory than getting it exactly correct for inbound.
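On stock Linux those knobs would look roughly like the following (device names are placeholders; whether RouterOS exposes any of this is the open question here):

# fq_codel: raise both the packet and memory limits on the inbound shaper
tc qdisc replace dev ifb0 root fq_codel limit 20480 memory_limit 64Mb
# cake: no packet limit, so just bump the memory limit
tc qdisc change dev ifb0 root cake bandwidth 100Mbit besteffort memlimit 8mb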
On outbound, a packet is allocated from an appropriately sized slab, so an ack is 64 bytes + 256 bytes overhead, a data packet rounds up to 2k.
On inbound, they are allocated from a fixed-size 2k-per-packet ring, no matter whether it's an ack or not, so you waste quite a lot of memory. We do gso-splitting, which will reallocate a gso packet of up to 42 packets all in a bunch back to the "right" size, but only if gro actually gets packets to split. Openwrt also had a hack that would start re-slabbing packets when it had memory pressure. So, on a heavy inbound ack workload we might end up using 7x more memory than ideal, or have that compensated for correctly by the cake autoconfig for the memlimit.
The ecn problem disturbs me more and more.
I've had a long day, going to bed. Very nice hacking with you these past few days.
Yep, no crashing for sure and honestly, for the use case, we are splitting hairs at this point. =) The seat of the pants feeling on my internet as well as my brother's is GREAT and SNAPPY!!
Well, that grouped bifurcation shouldn't be happening in that way. fq-codel suffers from the birthday problem, where you get a hash collision at around sqrt(1024) flows, so at 32 flows it's likely you'd see 2 flows colliding and getting different behavior from the rest. Cake uses an 8-way set associative hash, so you don't see that. I am going to go back to a theory that we are not seeing the right offsets into the packet header, thus the hash function is weird, the dscp handling is weird, and the ack-filter is wonky.
Among many other things that have changed since I last looked at this code, linux switched to a siphash from a jenkins hash, but I'm more inclined to suspect an offload, sending stuff from one cpu to another, or something we haven't thunk of yet.
Remember how we started? At least it doesn't crash. And even being OCD in this way, it performs better than what you had before. I have not had a deep dive into this stuff since,
oh, 2017, really. I'm very interested that it hits the field, working the right way, obviously! but I've had it for the day. Have a great one!
Thank you so much for sharing your raw flent.gz files and packet captures. So many things in this world cannot be captured by a single number or a summary plot, and while a cdf might hint at a problem, looking at a system's evolution over time is always helpful. The explanation for why we saw this bifurcation:
cdfequiv.png
was that there were two *really major* interruptions in service where only that flow kept going.
cdfscanbemisleading.png
Now, as to what the heck could have caused this, I don't know. I flipped through a couple others, it seems likely this doesn't happen all the time... The packet capture is really messy and I'm no longer sure which cap I'm looking at and I have meetings most of today.
Been following along and don't have too much to add other than it's been very interesting and informative to read this discussion!
Yeah, I noticed those interruptions or whatever is causing it as well. I am having a heck of a time with this link. I had my brother call spectrum today and verify that his modem is JUST a gateway.. else make sure it is in bridge mode. It is in fact just a gateway. It is a DOCSIS 3.1 modem with a 2.5gb port.. actually sync'd at 2.5 to the router. I forgot the router actually has a 2.5g port.
Anyhow, back on topic.. doing a test on it with no queue at all, it gets ~600 down and ~40 up. What really strikes me as odd is if I use fq_codel or cake, I can only get it to around ~350 tops. I can leave the downstream unlimited, and still never gets past 350. Really odd.
So, with that said, I had him plug his computer in straight to the modem to do a speed test. Granted, it is web based but upload is still ~40 and download is from 720-830 easily twice of what is going through the router.
CPU utilization never goes above 25% which now that I think about it, it is a quad core.. so that would mean it is stressing a single core. HMMM.. maybe that is the limiting factor?
Thank you for linking that! I was literally just thinking about going to the other computer to login to his router and force it to 1gb. It is interesting that he was using fasttrack, as I tried re-enabling that without any change. OK, getting out of the recliner now to go test that out!
Regarding your speed issue, I wonder if it's this issue that was mentioned by another user in a different topic? They have a 5009 and a 2.5Gb modem as well it looks like: viewtopic.php?t=179145#p895221
Anyway, as we stagger forward on the less-buggy fronts, a repeat of the rtt_fair test in this scenario would be nice, working on handling the down better until the sync'd drops go away. (It still might be having that overall weird interruption of service, too; need more data on that...) fq_codel with increased packet limits and memlimit is one thought, cake besteffort with a memlimit of 8M perhaps. Finding a way to increase the size of the rx ring is another. Or reducing the shaped bandwidth from 100Mbit down to something less....
OK, well.. after changing the port speed last night it wouldn't come back up. So I had to wait for my brother to reset it this morning. Long story short, it didn't work right away, I had to bounce the interface a few times, but finally it was showing ~800mbit raw through the router. After a lot of testing.. I am starting to wonder if fq_codel and cake are actually crashing at such high speeds and I just don't see it on my slower connection.
Watching the port bandwidth graph this morning while testing my brother's 1gb cable.. you can most definitely see the bursts!! I remember now why I hated my old cable modem and love my DSL now. Way less bandwidth, but so much 'cleaner'.
Without ack filtering it is extremely difficult to achieve full download speeds at a 15:1 ratio of down to up or worse.
Also, rx rings need to be properly sized, as docsis is bursty. An rx ring of 256 is too small. I don't know if you can change that.
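On stock Linux you'd check and (driver permitting) raise it with ethtool - whether RouterOS exposes this is another question:

ethtool -g eth0             # show current and maximum rx/tx ring sizes
ethtool -G eth0 rx 1024     # raise the rx ring, if the hardware allows it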
i wish more folk were taking packet captures of their network behaviors, using test tools like flent, or at least iperf, rather than web traffic. I also wish I still had my lab setup, and a budget to test this stuff. It's not so much debugging "cake" as suspecting there are other problems in the stack, on this model.
Alright, I gotta get back to work for a bit.. and then come back and figure out where we left off. More tests coming up in a bit!
OK, work is not bad! WOO! =)
Looks like perfection to me.
rtt_fair?
BWHAH gamma radiation! Ahh, I did not notice them sync'd. Interesting!
See how the drops are sync'd on the down? Shouldn't happen. Up the memlimit? Or it's the rx-ring. Or gamma radiation from Mars.
Limit the memory consumed by Cake to LIMIT bytes. By default, the limit is calculated based on the bandwidth and RTT settings.
Ahhh, so correct me if I am wrong.. I am a little slow on the uptake sometimes.. post-lunch-time sleepy.. haha. So, generically speaking, upping the memory limit is like increasing the ring buffer?
Moah! Moah! 8x more! You have the memory to burn. (When we developed cake, *32MB* of RAM in the router was a lot.)
I tried to explain the "default" calculation had some overheads in it that didn't make as much sense on inbound shaping as out. I can try to explain that better....
Yes sir, 200 on both. Well that is a bummer! I was getting excited! haha. But honestly.. the internet is so snappy now.. everything just instantly appears.. almost before I click the button on the mouse!
So this last one had 200MB on ingress? Dang. I gotta point at available queue space at the provider, or a limited rx ring (or that bug with bursty failures), to explain a failure to improve here.
OK, it does not appear from digging through the interface or through the documentation that I can change the ring buffer. Now, if I was running a CHR I could.. since it is RouterOS running on top of linux. This might be something for me to check out in the future. I have a spare SFP+ port on my server, I should be able to pass that through to a VM and run CHR.
I imagine routerOS has no way to see or increase the rx ring? Linux uses "ethtool" to see that.
Coming back to this post.. I never knew cubic drops 30%, heck I never looked into it.. I always assumed it was 50%. I guess when I learned about it at the time, I must have been using reno and reading about that?
The behavior of multiple queues in series is kind of complex. Theorists like very much to think about things in terms of a fountain of water, but the real world is batchy in so many respects.
Take packets hitting the rx ring. A batch arrives and the ring was nearly full in the first place. A whole bunch of packets (from all sources) get dropped. The cpu arrives to "clean" the rx ring, never sees that, and then tosses the result into the aqm which then tries to fair queue and intelligently drop if it too is overloaded, hopefully desynchronized drops that "fill in" the spaces within the other competing sawtooths. But they end up pretty synchronized when the rx ring overflows and thus the closest hop retains the most bandwidth, as tcp's defined response to multiple drops within a single RTT is to drop the rate, once. (We are now in TCP/IP 401 classes, rather than my usual 101)
The Cubic tcp algorithm only drops the rate by 30% ( which I've long disagreed with ) and then works towards recovering using a cubic function (which is clever), tcp reno uses 50% and climbs back additively, which means other flows can grab more bandwidth faster, but a reno flow gets less bandwidth. (I think you can tell flent to use another algo via --te=cc_algo=reno,reno or --te=CC=reno,reno but I'd have to re-read the codebase). BBR's methods are very different, as you saw. I don't think I have BBR enabled on all the servers under test, I'd have to check.
This scenario is even worse than that in that the ISP has a buffer at their end, the modem, also, and either one of those unable to absorb a burst will drop packets.
Over the last 20 years, the internet got redesigned for speedtest.net, with everyone testing X flows at a time, up, then, down, then ping, all to the same server.
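On the cc_algo aside a couple of paragraphs up: if you want to be certain which congestion control the sending host uses for a run, the plain-Linux sysctls below do it on the test client (nothing RouterOS-specific here; a sketch):
sysctl net.ipv4.tcp_available_congestion_control
sysctl -w net.ipv4.tcp_congestion_control=reno
Remember to switch back to cubic (or bbr) afterwards.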
Oh, definitely not celebrating yet! Just happy to start getting some fairly consistent results as I move knobs one way or the other! Yes, I need to slow down, I meant to say cake memlimit!don't celebrate too soon. And you mean cake memlimit or physical memory?
Is the ack-filter on on egress? Again, given my still held doubts on having the offsets right for dscp, ecn, and that, having it on may do bad things, but it's very useful on asymmetric connections if working. https://blog.cerowrt.org/post/ack_filtering/
Lastly you posted nice plots saying "default" when I think you meant the hw multi-queue? It was good to see it converge at t+40. Yes you really do want to spread more load across cores if possible.
Can do! Do you want me to use fq_codel or cake?I'd appreciate another capture from your brothers box, of rtt_fair, blowing up, with ecn enabled.
also it's easier to look at this stuff in tcptrace/xplot if you just capture those flows.
tcpdump -i the_interface -s 128 -w the_capture host dallas or host sydney or host ...
thx!
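A filled-in version of the tcpdump sketch above, purely as an example (the interface name and capture file name are placeholders; the hosts are the flent servers used later in this thread):
tcpdump -i eth0 -s 128 -w rtt_fair.pcap host dallas.starlink.taht.net or host sydney.starlink.taht.net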
No problem! Mexico would be nice this time of year! I have some friends that live along the border, I should go visit for a BBQ and a few cervezas!Thx again for helping. Trying to decide on fleeing to mexico or not. Ok, please reload the qdisc(s), leave ecn off, and try again?
Were you seeing these lumps before?
lumps.png
You got no throughput from dallas with ecn.
nodnfromdallas.png
AHHA!! I see what ya did there. It didn't grow this time. I set it to 15mbit on upload.. I had been assuming all along that since wide open it gets 40mbit upload on his link, I should be working around that range. My assumption is that I am used to the feel of my DSL setup.. and cable does things differently? No science behind that statement, but that looks way better than anything at 40mbit or even a high percentage of that. Now I have more testing to figure out where the happy place is on the upload bandwidth!Well, you shouldn't see that long term growth pattern either. This is after you tuned up the multipath tx/rx thing? What happens with bandwidth down less 20Mbit?
Anyway, thx again. I'm packing up for a trip south, (not to mexico! trying to get closer to the spacex launch), and can't look at this harder today.
When the network is more idle: a reboot, then runs with --step-size=0.05 -l 300, with ecn off, with cake, with fq_codel. But ya know, feel free to stop fixing the internet with me, and spend time with family, or go shopping?
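Spelled out, one of those runs would look something like this (rrul shown as an example, reusing the dallas server from the commands later in this thread; the title is just a placeholder):
flent rrul --step-size=0.05 -l 300 -H dallas.starlink.taht.net -t cake-ecn-off-300 -o cake-ecn-off-300.png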
Yep, they went down for most of the day yesterday.. it appears they restored a backup of the forum =(it looks like mikrotik has lost some data.
/queue export compact
# dec/29/2021 13:24:14 by RouterOS 7.1.1
# ...
# model = RBD52G-5HacD2HnD
/queue type
add kind=fq-codel name=fq_codel
/queue simple
add bucket-size=0.005/0.005 max-limit=100M/40M name=internal_qos queue=fq_codel/fq_codel target=ether1 total-queue=fq_codel
/queue type
add cake-atm=ptm cake-diffserv=besteffort cake-mpu=88 cake-overhead=40 kind=cake name=cake-default
add cake-ack-filter=filter cake-atm=ptm cake-bandwidth=22.0Mbps cake-diffserv=besteffort cake-mpu=88 cake-nat=yes cake-overhead=40 kind=cake name=cake-up
add cake-atm=ptm cake-bandwidth=104.0Mbps cake-diffserv=besteffort cake-mpu=88 cake-nat=yes cake-overhead=40 cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1-WAN total-queue=cake-default
Thanks mate - not a slow reply at all! Mine is syncing 106/41 so I'll throw that in there for now, go for a few days, then see how it is.I have 100/20 VDSL2.. and this setup has been working like a dream! (The speeds are set to the sync rate in the modem 104/22.. YMMV)
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=filter cake-bandwidth=45.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=950.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
Funny thing is you will get the same results with any queue type - try sfq for example instead of cake..RB5009 arrived. Here's some brief testing of cake.
ISP: Aussie Broadband
Technology: Fibre To The Premise (FTTP)
Down/Up: 1000M/50M
Waveform Results:
Before: https://www.waveform.com/tools/bufferbl ... 7575df8878
After: https://www.waveform.com/tools/bufferbl ... db297e528f
Nice! Now run some flent tests!
Thanks for forcing me to do this - perhaps I'm going back to the drawing board. Cake seems to make my upload nice and consistent, but download and latency is still all over the shop.
Nice! Now run some flent tests!
What is strange is that the resource monitor in the router would suggest it's perfectly fine doing this (40-60% util on all cores), but the numbers don't lie. You're correct I was going to test on a 100/40 link (my in-laws'), but my new RB5009 arrived at my own house and so I wanted to see what it could do on 1000/50 as well. The experience I gain from this exercise will give me the ability to set it up on the in-laws' later on.My guess is you are thoroughly out of CPU on the download, not being able to crack 400Mbit.
# Enable fasttrack-connection only on inbound = WAN to exclude download from SQM
/ip firewall filter
add action=fasttrack-connection chain=forward comment="defconf: fasttrack" connection-state=established,related hw-offload=yes in-interface-list=WAN
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=filter cake-bandwidth=40.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=1000.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
tc qdisc add dev ether1 root cake 1000Mbps besteffort nat
tc qdisc add dev ether1 root cake 1000Mbps besteffort nat ingress
https://wiki.mikrotik.com/wiki/Manual:H ... _Algorithm
I have no idea what this does - what does the bucket-size thing do?
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
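For what it's worth - and this is only my reading of that HTB manual page, so verify it - bucket-size appears to set the token-bucket (burst) size as a multiple of max-limit, so the tiny values used throughout this thread effectively disable bursting. Rough arithmetic under that assumption:
bucket-size=0.001 with max-limit=100M -> burst bucket of roughly 0.001 x 100M = 100k
bucket-size=0.1 (the default) with max-limit=100M -> burst bucket of roughly 10M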
I see no such issue, I can set my CAKE queue to cake-bandwidth=50M, set that queue type as download on simple queue and it works properly, limiting speed to 50M.In my experimentation, I can only ever observe the rate limiting occurring correctly when cake is set as an interface queue but there's no way to set an asymmetric bandwidth limit in that configuration
/queue type
add cake-diffserv=besteffort cake-flowmode=dual-dsthost cake-mpu=64 cake-nat=yes cake-overhead=22 cake-overhead-scheme=ether-vlan,via-ethernet,docsis cake-rtt-scheme=internet kind=cake name=cake-docsis@download,unlimited
add cake-bandwidth=50.0Mbps cake-diffserv=besteffort cake-flowmode=dual-dsthost cake-mpu=64 cake-nat=yes cake-overhead=22 cake-overhead-scheme=ether-vlan,via-ethernet,docsis cake-rtt-scheme=internet kind=cake name=cake-docsis@download,50M
add cake-ack-filter=filter cake-bandwidth=40.0Mbps cake-diffserv=besteffort cake-flowmode=dual-srchost cake-mpu=64 cake-nat=yes cake-overhead=22 cake-overhead-scheme=ether-vlan,via-ethernet,docsis cake-rtt-scheme=internet kind=cake name=cake-docsis@upload,40M
/queue simple
add bucket-size=0.1/0.2 dst=ether1_wan max-limit=40M/700M name=wan queue=cake-docsis@upload,40M/cake-docsis@50M,unlimited target="" total-queue=default
add bucket-size=0.005/0.005 name=priority packet-marks=icmp,dns,syn,http-init,sip parent=wan priority=1/1 target=""
add bucket-size=0.05/0.1 name=untracked packet-marks=no-mark parent=wan queue=cake-docsis@upload,40M/cake-docsis@download,50M target="" total-queue=default
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=aggressive cake-bandwidth=40.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=900.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=aggressive cake-bandwidth=45.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=945.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-down
/queue simple
add bucket-size=0/0 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
I feel compelledWith the above result, yes, I think y'all have compelling reasons to run out and deploy fq_codel and cake everywhere you can, ASAP.
Heh, that's my thread too. Yes it's upsetting, but the Mikrotik Support team said they have replicated the problem based on my logs and look forward to a fix in a future version.I feel compelled
…but worth reminding that Simple Queues as used in some of the examples in this thread appear to break IPv6 under ROS 7.1.1 (ref viewtopic.php?t=181705)
I tried your setup (just copied the above config into my Chateau 5G terminal), IPv6 disabled, WAN interface LTE1, and changed the values to 30Mbps DL and 5Mbps UL to match my access speed.I'll call that a success for now. Now to go and tackle the 100/40 connection down the road.
Do have a followup on this one...
The other question from Bithaulersany tips for LTE connections? Especially ones that go from ~5Mbps to 70Mbps in a few hours?
is also very valid, same problem again on my side. LTE (and soon 5G, even worse) is a medium where the "pipe" itself changes heavily within 24 hours.
In this situation it is really hard to define the pipe size, and doing queueing with fixed values becomes almost impossible.
What can CAKE do in this case?
1) Can someone confirm that 7.2 perhaps has a working ipv6?
The issue reported was not about IPv6 and cake specifically. It was about IPv6 not working when there was a simple queue (of any type) used with an interface as the "target". Cake works fine with IPv6 with queue trees and interface queues even on 7.1.1, but not with simple queues. My understanding is that 7.2rc3 is no different from 7.1.1 in this way, but I haven't tried it myself to confirm. Your post doesn't make clear whether you tried this with a simple queue that used an interface as the target - if you haven't, then you haven't actually verified whether or not this specific problem is resolved.I can confirm that IPv6 and Cake are working on 7.2rc3.
/queue type
add cake-bandwidth=1700.0Mbps kind=cake name=aqm-cake
/queue simple
add name=queue1 queue=aqm-cake/aqm-cake target=vlan3200
It might be fixed then - is that with connection tracking? i.e. do you have an IPv6 allow established,related firewall rule that is working correctly with that queue in place?That was the test I performed. IPv6 + simple queue using the interface as a target works on 7.2rc3 and CCR2116
As an end-user here, I have some cake-related questions for you;
* Cake tries really hard to follow a bunch of mutually conflicting diffserv RFCs, and in an age where videoconferencing is very important, the cake diffserv4 model is closer to how a wifi AP treats it. See https://www.w3.org/TR/webrtc-priority/ for this underused facility in webrtc.
It's not fixed. RB5009 upgraded from 7.1.3 to 7.2rc4 after reading this message.
That was the test I performed. IPv6 + simple queue using the interface as a target works on 7.2rc3 and CCR2116
"Most people can just put 44 and it will work for you regardless of what your underlying technology is. For some people with fiber or cable connections, this may waste up to around 1-2% of your bandwidth, but it will bias you towards having lower bufferbloat rather than higher which is usually a good thing. The primary use case for more precise tuning is when your internet speed is relatively low (less than 5Mbps) and / or you have more than 20% of your internet speed in either direction taken up by small-packet traffic such as VOIP or gaming. If you have more than 5Mbps and/or you are not running a call center, further adjustment is probably not worth the effort and you should spend more time trying to better measure your reliable level of internet speed itself."
Either way setting these values seems valuable for VOIP."I guess what I'm arguing is that exactly right is not needed; within 10% of the right value is probably just fine for all but a VOIP call center running hundreds of calls on a tight 10-15Mbps symmetric line. The reason it's needed is to calculate the true packet size. If the packet payload size is 1500 bytes, then +-45 bytes makes only ~3% error. If the packet size is 150 bytes, like in a VoIP call, then 45 bytes is ~30% error! So having the overhead included is important, but the difference between overhead, say, 44 and overhead 48 is 4 bytes, and 4 bytes on 150 bytes is back to ~3% error. So, do you know the capacity of your line to within 3% error? If not, then even if you're a VoIP call center, the error in your overhead if you just say 44 bytes is offset by the fact you aren't sure whether you should put 10000 kbps or 9700 kbps... If you're not a call center it's even an order of magnitude less important..."
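The arithmetic behind that, compactly:
45 bytes of mis-stated overhead on a 1500-byte packet: 45 / 1500 = 3% error
45 bytes on a 150-byte VoIP packet: 45 / 150 = 30% error
4 bytes (overhead 44 vs 48) on a 150-byte packet: 4 / 150 = ~2.7% error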
Mke I'd love to get the following info off you to compare configs:Like blurrybird I am on Aussie Broadband in OZ
Speedtest by Ookla
Server: Winn Telecom - Mount Pleasant, MI (id = 1062)
ISP: Spectrum
Latency: 26.63 ms (10.78 ms jitter)
Download: 114.17 Mbps (data used: 137.9 MB )
Upload: 10.93 Mbps (data used: 5.5 MB )
Packet Loss: 0.0%
/queue/export
# apr/09/2022 11:31:30 by RouterOS 7.2
# software id = V13P-7JPC
#
# model = RB5009UG+S+
# serial number = EC1A0E402D35
/queue type
add cake-bandwidth=10.0Mbps cake-diffserv=diffserv4 cake-memlimit=32.0MiB \
cake-mpu=64 cake-nat=yes cake-overhead=18 cake-overhead-scheme=docsis kind=\
cake name=cake-up
add cake-bandwidth=105.0Mbps cake-diffserv=diffserv4 cake-memlimit=32.0MiB \
cake-mpu=64 cake-nat=yes cake-overhead=18 cake-overhead-scheme=docsis kind=\
cake name=cake-down
/queue simple
add name=Spectrum queue=cake-up/cake-down target=\
192.168.88.0/24,2600:6c4a:5a00:56a::/64,192.168.5.0/24
Speedtest by Ookla
Server: CMS Internet - Mount Pleasant, MI (id = 735)
ISP: Spectrum
Latency: 27.12 ms (6.73 ms jitter)
Download: 91.94 Mbps (data used: 120.2 MB )
Upload: 9.45 Mbps (data used: 8.2 MB )
Packet Loss: 0.0%
tc qdisc show dev eth0
qdisc mq 0: root
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
flent rrul -p all_scaled -l 300 -H dallas.starlink.taht.net --step-size=.05 -t cake-spectrum-rb5009-rrul-300 -o cake-spectrum-rb5009-rrul-300.png
flent rtt_fair -p all_scaled -l 300 -H dallas.starlink.taht.net -H fremont.starlink.taht.net -H london.starlink.taht.net -H singapore.starlink.taht.net -H sydney.starlink.taht.net --step-size=.05 -t cake-spectrum-rb5009-rttfair-300 -o cake-spectrum-rb5009-rttfair-300.png
flent rtt_fair_var -p all_scaled -l 300 -H dallas.starlink.taht.net -H fremont.starlink.taht.net -H london.starlink.taht.net -H singapore.starlink.taht.net -H sydney.starlink.taht.net --step-size=.05 -t cake-spectrum-rb5009-rttfairvar-300 -o cake-spectrum-rb5009-rttfairvar-300.png
flent tcp_ndown -p ping -l 300 -H dallas.starlink.taht.net --step-size=.05 -t cake-spectrum-rb5009-tcpndown-300 -o cake-spectrum-rb5009-tcpndown-300.png --te=download_streams=4 --te=ping_hosts=8.8.8.8
flent tcp_nup -p ping -l 300 -H dallas.starlink.taht.net --step-size=.05 -t cake-spectrum-rb5009-tcpnup-300 -o cake-spectrum-rb5009-tcpnup-300.png --te=upload_streams=4 --te=ping_hosts=8.8.8.8
You can benefit from the ack-filter on the up to some extent, but I'm pretty sure you would prefer the sqm'd result to the non-sqm'd.I didn't run into anything like that during my testing, although it's possible that I wasn't putting enough load on the router to cause a problem like that to occur. My router is also a different model with a different architecture than the ones mentioned there. I'll chime in on the thread you linked to keep this one a little cleaner, thanks!
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default1
add cake-ack-filter=filter cake-bandwidth=950.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=950.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
I experience the same phenomenon on an RB5009 regardless if I use cake or fq_codel. Bandwidth on a Gigabit WAN link is roughly cut in half. I'm guessing it's a CPU constraint ?I'm using the below on a symmetrical 1Gb/1Gb connection but it reduces the over upload and download to 500Mb and I've checked CPU usage which is around 58% on an RB4011 any ideas what I'm doing wrong.
Same experience. Download speed is ok but the upload speed is cut to almost a half.I'm using the below on a symmetrical 1Gb/1Gb connection but it reduces the over upload and download to 500Mb and I've checked CPU usage which is around 58% on an RB4011 any ideas what I'm doing wrong.
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=filter cake-bandwidth=18.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=47.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
Then maybe you should read the latest patch notes?What are you talking about? It's been working over here just fine?
A developer is me. Things were looking good.
They are talking about Mikrotik's decision to limit cake to interface queues only in the latest 7.3beta40 release. Release notes buried here: viewtopic.php?t=185066#p932950
What are you talking about? It's been working over here just fine?
A developer is me. Things were looking good.
add bucket-size=0.001/0.001 max-limit=72M/18M name="Cake - Smaller Bucket" queue=default-cake/default-cake target=pppoe-out1 total-queue=default-cake
add cake-atm=ptm cake-diffserv=diffserv4 cake-memlimit=32.0MiB cake-nat=yes cake-overhead=30 cake-overhead-scheme=pppoe-ptm kind=cake name=default-cake
Cake is interface queue only.Hi!
Can somebody take a look at my CAKE config and tell me if there's anything I can do to get as close to line-speed as possible. Some background:
Zen VDSL2 connection (80/20) into Vigor 130 modem (VLAN 101); PPPoE connection established via RB4011 on ether1. Jumbo packets enabled (ether1 MTU 1508; PPPoE MTU 1500)
Line speed according to the Vigor130 is 78M/20M, so anything more I can eke out towards this speed would be a bonus; but otherwise my question would be whether the correct overhead or mpu is set (PPPoE connection; VLAN 101, however this is currently set modem-side, and jumbo packets router-end)
Thanks
Do you mean that the interface needs to be set to eth1 (as opposed to the pppoe-interface) or rather the upcoming change in ROS 7.3 that does not allow cake as a simple queue type?Cake is interface queue only.
Mikrotik couldn't fix the bug so they cut the feature.Now cake in 7.3beta40 is useless.Do you mean that the interface needs to be set to eth1 (as opposed to the pppoe-interface) or rather the upcoming change in ROS 7.3 that does not allow cake as a simple queue type?Cake is interface queue only.
they did reach out to me, and toke and I both replied, but they haven't got back to us.Mikrotik couldn't fix the bug so they cut the feature.Now cake in 7.3beta40 is useless.
Do you mean that the interface needs to be set to eth1 (as opposed to the pppoe-interface) or rather the upcoming change in ROS 7.3 that does not allow cake as a simple queue type?
Cake is some sort of voodoo magic! Uploads are faster, downloads are more consistent, latency is lower under load??
With no queueing:
With CAKE:
Check out Flent! If you're running Windows then you'll need to setup a linux machine in order to use it. If it's a newer Windows computer then I'd consider setting up the Windows Subsystem for Linux to get you going quickly.Cake is some sort of voodoo magic! Uploads are faster, downloads are more consistent, latency is lower under load??
With no queueing:
With CAKE:
How can I do that kind of bandwidth/latency test and get those result graphs?
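In case it helps: flent is what produces those graphs. A minimal sketch of getting it going on a Linux box or under WSL (assumptions: Python/pip are present, netperf is installed separately - it is often in a distro's non-free repo or built from source - and the server name is one of the hosts used elsewhere in this thread):
pip install flent
flent rrul -l 60 --step-size=0.05 -H dallas.starlink.taht.net -t first-run -o first-run.png
flent-gui (it needs PyQt installed as well) can then open the saved .flent.gz result files for interactive plots.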
Hi Dave! Any news on this front? I'm so confused by their reasoning.they did reach out to me, and toke and I both replied, but they haven't got back to us.
Mikrotik couldn't fix the bug so they cut the feature.Now cake in 7.3beta40 is useless.
/queue type
add name=cake-WAN-tx kind=cake cake-diffserv=diffserv3 cake-flowmode=dual-srchost cake-nat=yes
add name=cake-WAN-rx kind=cake cake-diffserv=besteffort cake-flowmode=dual-dsthost cake-nat=yes
/queue simple
add max-limit=500M/100M name=queue1 queue=cake-WAN-rx/cake-WAN-tx target=wan-pppoe1
/queue type
add cake-ack-filter=filter cake-diffserv=diffserv4 cake-flowmode=dual-srchost \
cake-memlimit=32.0MiB cake-mpu=84 cake-nat=yes cake-overhead=38 \
cake-overhead-scheme=ethernet cake-rtt-scheme=internet kind=cake name=\
cake_UL
add cake-diffserv=diffserv4 cake-flowmode=dual-dsthost cake-memlimit=32.0MiB \
cake-mpu=84 cake-nat=yes cake-overhead=38 cake-overhead-scheme=ethernet \
cake-rtt-scheme=internet cake-wash=yes kind=cake name=cake_DL
/queue simple
add dst=ether1-Internet name=queue1 queue=cake_UL/cake_DL target=""
Thank you for the explanation... I'm on cable, with a modem I have zero control over... suspect that is the culprit.Powersave is often a problem. A device will go to sleep until there are more packets to transmit. This is a somewhat foolish behavior network-wise, in that - for example - a tcp syn then syn/ack packet outstanding needs all the boost it can get to get more packets in flight once the flow gets going.
One string of cable modems would sleep stupidly this way. Many of our devices will buffer up small numbers of packets over a small interval and only release them after a ms or 4, to save on cpu context switches, also.
In other cases you can get inside the request/grant loop that some gpon and some cable has. The underlying hw makes a request for a slot ahead of time based on an estimate of what it will need in the next cycle from the previous, thus overlapping requests. cable has a 2-6ms request/grant cycle.
Your target has no data, and simple queuing does not take effect.Disclaimer - I'm barely a dabbler when it comes to this stuff and I'm still not comfortable with some of the jargon and abbreviations. But I'm trying.
I have a RB760iGS (hEX S) (256MB RAM) that is my office's gateway/firewall/router. Our internet is provided via a WISP - they're using Ubiquiti equipment. Our connection is 25M/5M - it tests out a little less.
I just upgraded the router to 7.3rc1 and implemented the following (the /queue config quoted above) based on some examples I've seen. The cake-memlimit - I've seen suggestions that 32M is a good number to start with, and this router certainly has plenty available. Question - what's the default value?
The interface queues are all "only-hardware". I also disabled the fasttrack I had in my forward filters. Is this all that's needed for me to get started with this? What information can I provide to assist with validating performance? Running the Waveform bufferbloat test gives me an A+. However - I also get that A+ with the queue disabled.
Am I correct that with the queue enabled cake is supposed to automagically implement qos without my needing to mark packets in mangle?
/queue simple
add limit-at=940M/143M max-limit=950M/146M name=CAKE queue=cake-down/cake-up \
target=pppoe-out1
My "target" should have been set (ether1) - I don't know why it didn't show up in the command line (it was set via the Winbox GUI). I've now set both "dst" and "target" to ether1.Your target has no data, and simple queuing does not take effect.
/queue simple
add limit-at=940M/143M max-limit=950M/146M name=CAKE queue=cake-down/cake-up \
target=pppoe-out1
It doesn't need one. But a simple queue needs a target interface or IP set before you get to CAKE. And if the link you are inserting CAKE (or any queue) on has a fixed speed, then you should set that speed in CAKE. All queues benefit from having a theoretical/ideal max speed as a basis.I'm assuming the limit numbers you're showing are for your own connection - as I stated mine is 25M/5M. I left the limits out as I *thought* cake would auto-configure/adapt without explicit limits set.
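To make that concrete for the 25M/5M WISP link above, a minimal sketch in the same style as the other configs in this thread (the interface name and the roughly 10% headroom below the tested rates are assumptions to adjust):
/queue type
add cake-bandwidth=4.5Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-wisp-up
add cake-bandwidth=22.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-wisp-down
/queue simple
add name=cake-wisp queue=cake-wisp-down/cake-wisp-up target=ether1-Internet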
I really hope they understand this now.To avoid further flooding of the 7.30beta thread with Cake topics, here are some results taken from my home network:
RB5009, ROS 7.2.2, Fiber uplink at SFP1 using PPPoE with NAT capped at nominal 500/100 by the ISP equipment at the other end of the fiber.
The ISP UL shaper does a not so bad job, but the DL shaper is awful as visible in the plot without queue.
ROS simple queue setup targeting the PPPoE uplink interface:
/queue type
add name=cake-WAN-tx kind=cake cake-diffserv=diffserv3 cake-flowmode=dual-srchost cake-nat=yes
add name=cake-WAN-rx kind=cake cake-diffserv=besteffort cake-flowmode=dual-dsthost cake-nat=yes
/queue simple
add max-limit=500M/100M name=queue1 queue=cake-WAN-rx/cake-WAN-tx target=wan-pppoe1
This gives very good results in the flent rrul test. Almost no latency increase and both DL/UL are running at nominal speed, saturated by the 4 parallel connections.
flentres_cake.png
The same test with the simple queue on wan-pppoe1 disabled shows high buffer bloat >100ms. The latency under load increases by a factor of 10.
The total DL is about 1/2 of the line rate, because the 4 parallel connections are fighting each other.
flentres_noqueue.png
Regarding how good it works, it would really be interesting to hear what exact reasons MT has to disallow use cases such as above with the latest 7.3 beta.
Seems to be egress only.I have lost track... can an interface queue also shape inbound?
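For anyone else wondering: an interface queue only acts on traffic leaving that interface, so on its own it is indeed egress-only. The usual workaround is to hang the download-side queue on the LAN/bridge egress, which is what the queue-tree examples below do with parent=bridge1. As an interface-queue sketch (assuming ether1 faces the ISP and bridge1 is the LAN; verify the exact set syntax on your ROS version):
/queue interface
set ether1 queue=cake-up
set bridge1 queue=cake-down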
/queue type
add fq-codel-limit=1000 fq-codel-quantum=300 fq-codel-target=12ms kind=fq-codel name=fq-codel
/queue simple
add max-limit=118M/11M name=fq-codel queue=fq-codel/fq-codel target=ether1
/queue type
add cake-flowmode=dual-srchost cake-nat=yes kind=cake name=cake-up
add cake-flowmode=dual-dsthost cake-nat=yes kind=cake name=cake-down
/queue simple
add max-limit=118M/11M name=cake queue=cake-down/cake-up target=ether1
/queue type
add fq-codel-limit=1000 fq-codel-quantum=300 fq-codel-target=12ms kind=fq-codel name=fq-codel
/queue tree
add bucket-size=0.01 max-limit=118M name=download packet-mark=no-mark parent=bridge1 queue=fq-codel
add bucket-size=0.01 max-limit=11M name=upload packet-mark=no-mark parent=ether1 queue=fq-codel
/queue type
add cake-flowmode=dual-srchost cake-nat=yes kind=cake name=cake-up
add cake-flowmode=dual-dsthost cake-nat=yes kind=cake name=cake-down
/queue tree
add bucket-size=0.01 max-limit=118M name=download packet-mark=no-mark parent=bridge1 queue=cake-down
add bucket-size=0.01 max-limit=11M name=upload packet-mark=no-mark parent=ether1 queue=cake-up
I'm not sure what happened there. Even though I chose a quiet time on the home network, I thought some phone or device started a backup or something. But if you say latency would not have been affected, then I don't know. I wasn't touching anything during the test.0) something weird happened on "Cake, simple queue configuration, fasttrack disabled." - did you reset the qdisc? A typical "hit" from some other flow on the link affects throughput, not latency....
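On RouterOS, one quick way to reset the qdisc state between runs (a sketch, assuming the simple queue is named "cake" as in the earlier exports) is simply to toggle the queue:
/queue simple disable [find name="cake"]
/queue simple enable [find name="cake"]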
Would there be a benefit in putting fq_codel on the physical interface instead of default "only-hardware-queue" and run that along with cake in a simple queue? The configuration certainly allows it.(ideally mikrotik) to obsolete the default on interface pfifo AND sfq in favor of fq_codel
What would be the real-world implications of differentiating between dscp types vs not? If I understand correctly, it would improve certain latency-sensitive traffic even further.If you don't want to differentiate between dscp types, use cake besteffort (which saves on cpu)
/queue simple
add max-limit=20M/5M name=cake-50 queue=cake-ingress/cake-egress target=lte1
/queue type
add cake-diffserv=besteffort cake-flowmode=dual-dsthost cake-nat=yes kind=cake name=cake-ingress
add cake-flowmode=dual-srchost cake-nat=yes kind=cake name=cake-egress
/ip firewall filter
add action=accept chain=forward comment="queue-cake - upload" \
connection-state=established,related out-interface=lte1 \
src-address=192.168.0.0/24
add action=accept chain=forward comment="queue-cake - download" \
connection-state=established,related dst-address=\
192.168.0.0/24 in-interface=lte1
add action=fasttrack-connection chain=forward comment="defconf: fasttrack" \
connection-state=established,related hw-offload=yes
Agree, but as someone who deals with Austrian A1 professionally, I would not recommend waiting for them either. I think that once every other ISP in the world has solved the bufferbloat issue, they might follow... The same goes for almost all other Central European big providers, unfortunately. IPv6 adoption is the other painful topic...So while your, single, coherent complaint might seem like a drop in the bucket, a futile waste of time, I've been at this for 12 years now, and there are now billions and billions of machines that are behaving better for all of us, sticking at it, and sticking it to the man.
"There is no try, only, do" - Yoda.
Hey Hi, started using cake now. It's working quite well in the simple queue on my WAN interface, except when I'm downloading a file over tcp/443, websites slow down (image load times etc.). This doesn't happen when watching video or browsing on other devices, though.
My guess is cake can't separate the http traffic load on the same device.. is there any setting that can fix this issue?
edit: setting my QT's PCQ to both src and dst address did the separation for different http sources on the same device.
For cake qos, I read the manual on the internet but still don't know if I should use wash or the ack filter, and which diffserv to use.. can someone help me with this?
edit2: I have now set the DSCPs in my mangle rules, all of them respectively to their type, and read about diffserv, wash, and the ack filter. It is even better now.
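For reference, DSCP marking in mangle looks roughly like this; the ports and the EF codepoint are only an illustration for a VoIP flow, not a recommendation for any particular traffic mix:
/ip firewall mangle
add action=change-dscp chain=postrouting comment="example: mark SIP/RTP as EF" dst-port=5060,10000-20000 new-dscp=46 passthrough=yes protocol=udp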
besteffort does not attempt any differentiation between diffserv classes. It is equivalent to fq_codel in this mode, except it uses an 8-way set associative method to (nearly) guarantee each flow its own queue, and the default triple-isolate mode also tries to share fairly between hosts.HI,
could anyone please simply explain the difference between besteffort and diffserv4?
I submitted this very request in late September. (SUP-92043) MikroTik responded with ROS v7.7alpha90 for testing. Based on my test involving demotion of MS Delivery Optimization connections to LE, Cake was responding to LE (codepoint 000001) packets correctly. This was before v7.6 went stable, and based on the change notes, I suspect v7.6 also has it right.Hi Dave (dtaht),
> A modern version of cake has support for the new diffserv LE codepoint. I'd dearly like support for that in mikrotik given how problematic CS1 proved to be, and it's a teeny patch
+1! Would be great if you could submit a request at https://help.mikrotik.com so it is formalized.
Thanks,
Dan
Yay! Thx!
/queue type
add cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-memlimit=32.0MiB cake-rtt=60ms cake-overhead-scheme=ethernet cake-nat=no kind=cake name=cake_rx
add cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-memlimit=32.0MiB cake-rtt=60ms cake-overhead-scheme=ethernet cake-nat=yes kind=cake cake-ack-filter=filter name=cake_tx
/queue tree
add comment="qosconf: download queue with cake" bucket-size=0.05 max-limit=500M name=cake_download packet-mark=no-mark parent=bridge1 queue=cake_rx
add comment="qosconf: upload queue with cake" bucket-size=0.03 max-limit=50M name=cake_upload packet-mark=no-mark parent=pppoe-out1 queue=cake_tx
DOCSIS/cable: "overhead 18 mpu 88"
Note: The real per-packet/per-slot overhead on a DOCSIS link is considerably higher, but the DOCSIS standard mandates that user access rates are shaped as if they had 18 bytes of per-packet overhead, so for us that is the relevant value.
/queue type
add cake-diffserv=besteffort cake-memlimit=254.0MiB cake-mpu=64 cake-nat=yes cake-overhead=18 cake-overhead-scheme=docsis kind=cake name=cake-default
add cake-diffserv=besteffort cake-memlimit=64.0MiB cake-mpu=64 cake-overhead=18 cake-overhead-scheme=docsis kind=cake name=cake-up
add cake-diffserv=besteffort cake-memlimit=64.0MiB cake-mpu=64 cake-overhead=18 cake-overhead-scheme=docsis kind=cake name=cake-down
/queue simple
add bucket-size=0/0 max-limit=650M/28M name=cake queue=cake-down/cake-up target=ether1-wan total-queue=cake-default
And that's the small price you pay. Overall lower latency will help real applications (streaming, voip, games, even SSL webpages) ... more than any slightly higher test result from speedtest.net etc.The problem is max download speed never exceeds 450Mbps; with cake disabled, it goes up to 600Mbps (but Download Active would easily add 100+ms)
/queue type
add cake-memlimit=64.0MiB cake-rtt=60ms kind=cake name=cake_rx
add cake-memlimit=64.0MiB cake-rtt=60ms kind=cake name=cake_tx
/queue tree
add bucket-size=0 max-limit=1550M name=cake_download packet-mark=no-mark parent=sfp-lan queue=cake_rx
add bucket-size=0 max-limit=1550M name=cake_upload packet-mark=no-mark parent=pppoe-out1 queue=cake_tx
/interface ethernet
set [ find default-name=ether1 ] disabled=yes
set [ find default-name=ether2 ] loop-protect=off mtu=1508 name=ether2-wan
set [ find default-name=ether3 ] disabled=yes loop-protect=off
set [ find default-name=ether4 ] disabled=yes
set [ find default-name=ether5 ] disabled=yes
set [ find default-name=ether6 ] disabled=yes
set [ find default-name=ether7 ] disabled=yes
set [ find default-name=ether8 ] disabled=yes
set [ find default-name=sfp-sfpplus1 ] loop-protect=off name=sfp-lan
/interface vlan
add interface=ether2-wan loop-protect=off mtu=1508 name=vlan1 vlan-id=35
/interface pppoe-client
add add-default-route=yes disabled=no interface=vlan1 keepalive-timeout=disabled name=pppoe-out1 user=xxxxxxxx@bellnet.ca
/queue type
add cake-memlimit=64.0MiB cake-rtt=60ms kind=cake name=cake_rx
add cake-memlimit=64.0MiB cake-rtt=60ms kind=cake name=cake_tx
/queue tree
add bucket-size=0 max-limit=1550M name=cake_download packet-mark=no-mark parent=sfp-lan queue=cake_rx
add bucket-size=0 max-limit=1550M name=cake_upload packet-mark=no-mark parent=pppoe-out1 queue=cake_tx
Plan from Bell is 1Gbps, physical cable connection is 1Gbps to one nokia fiber box.Just curious, @kenyloveg, but do you have 1gb up and down, or do you have 1.5gb up and down?
It looks like the max-limit is set to 1.5gb, so wouldn't this prevent the queue from ever being filled if you have 1gb up and down? Or is Bell 1Gbps really 1.5Gbps? (config quoted above)