CRS 3xx - L3 ASIC performance testing

Posted: Mon Oct 12, 2020 8:40 pm
by StubArea51
Did some work on testing the L3 performance last week in 7.1beta2 and published it today.

https://stubarea51.net/2020/10/12/mikro ... e-testing/

Re: CRS 3xx - L3 ASIC performance testing

Posted: Mon Oct 12, 2020 8:49 pm
by Dude2048
Thanx, higher than I expected. Nicely done.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Tue Oct 13, 2020 10:11 am
by raimondsp
Regarding small MTU tests (tests #1 and #2), I suppose that the bottleneck is on the packet generator or receiver side, not the CRS317. As you see, PPS (packets per second) value is almost the same in all three cases, and the transfer speed depends purely on packet size. That is a typical case for CPU, where each packet causes an interrupt, which, in turn, adds performance overhead. ASIC doesn't care much about the packet count.
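To put numbers on that, here is a rough sketch in Python. The 1 Mpps cap and the packet sizes are assumed values for illustration only (they are not taken from the published test table), and Ethernet framing overhead is ignored:

```python
def throughput_gbps(pps: float, packet_bytes: int) -> float:
    """Return throughput in Gbit/s for a given packet rate and packet size."""
    return pps * packet_bytes * 8 / 1e9

# Assumed, illustrative value: a packet generator capped at about 1 Mpps.
PPS_CAP = 1_000_000

# With a fixed packet rate, throughput grows linearly with packet size.
for size in (64, 128, 512):
    print(f"{size:>4} B at {PPS_CAP:,} pps -> {throughput_gbps(PPS_CAP, size):.3f} Gbit/s")
```

If the packet rate stays flat while the packet size changes, the limit is most likely per-packet processing on the generator or receiver rather than the switch chip.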

Re: CRS 3xx - L3 ASIC performance testing

Posted: Tue Oct 13, 2020 3:18 pm
by StubArea51
Thanks for the feedback... I'll check the hypervisor and see if it's creating a bottleneck somewhere.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Tue Oct 13, 2020 11:14 pm
by MrYan
That is a typical case for CPU, where each packet causes an interrupt, which, in turn, adds performance overhead. ASIC doesn't care much about the packet count.
This. In fact you did well to get 1 Mpps from a Linux box (Proxmox/KVM) without any tuning. CloudFlare had to put a lot of effort into tuning to get that number - https://blog.cloudflare.com/how-to-rece ... n-packets/

Re: CRS 3xx - L3 ASIC performance testing

Posted: Sun Oct 18, 2020 12:05 am
by morf
Regarding small MTU tests (tests #1 and #2), I suppose that the bottleneck is on the packet generator or receiver side, not the CRS317. As you see, PPS (packets per second) value is almost the same in all three cases, and the transfer speed depends purely on packet size. That is a typical case for CPU, where each packet causes an interrupt, which, in turn, adds performance overhead. ASIC doesn't care much about the packet count.
+++

Re: CRS 3xx - L3 ASIC performance testing

Posted: Sun Nov 08, 2020 5:18 pm
by Maggiore81
Hello to all.
Can a CRS running RouterOS be used as a BGP router that forwards packets at near wire speed?

Re: CRS 3xx - L3 ASIC performance testing

Posted: Sun Nov 08, 2020 5:49 pm
by morf
I think it's possible, but only if the number of routes is small enough to fit in memory.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Mon Nov 09, 2020 8:41 am
by raimondsp
CRS with RouterOS can be used as a BGP router unless the number of routes exceeds the hardware memory capabilities.

Refer to "List of supported devices and their limits" table on the link below:
https://wiki.mikrotik.com/wiki/Manual:C ... Offloading

Re: CRS 3xx - L3 ASIC performance testing

Posted: Mon Nov 09, 2020 10:26 am
by Maggiore81
On the link it says:
Depending on the complexity of routes in routing table, max HW accelerated route count could change (see table below for min-max supported route count for each hardware). Whole-byte IP prefixes (/8, /16, /24, etc.) occupy less HW space than others (e.g., /22).
If HW route limit is reached new routes will fall back to CPU, except cases when newly added route overlaps with already existing routes processed by hardware. In this case destinations that were processed in hardware will continue to be processed in hardware. The user should choose the device with HW capability large-enough to store all the routes


Yes. I have seen that doc.
Is there a way to raise that limit?

Re: CRS 3xx - L3 ASIC performance testing

Posted: Mon Nov 09, 2020 3:46 pm
by mozerd
@IPANetEngineer
Very Nice L3 Forwarding test ... hopefully in 2021 the production stable version of ROS7 will be completed.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Mon Nov 09, 2020 4:36 pm
by raimondsp
Unfortunately, it is a hardware limitation. There is not enough internal memory in the switch chip to offload the full BGP table. However, where your setup allows it, you can limit the incoming BGP route prefixes via
/routing/filter/
Also, we are working on an option to filter out the prefixes for offloading, i.e., to offload routes with potentially the most traffic while the rest gets processed by the CPU.

If the router needs to handle the full BGP table, I suggest looking at CCR devices rather than CRS.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Wed Nov 11, 2020 3:28 pm
by Maggiore81
Very, very interesting.
Using RouterOS, we could use BGP to carry some internal routes (fewer than 1000).
We could route them at L3 in hardware...
Is fastpath involved here? Or can we use some firewall filters?
We won't need conntrack or anything similar.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Wed Nov 11, 2020 3:29 pm
by Maggiore81
Did some work on testing the L3 performance last week in 7.1beta2 and published it today.

https://stubarea51.net/2020/10/12/mikro ... e-testing/

Hello.
The notes are missing from your article.
I mean that the table of maximum connection counts refers to notes, but they do not appear on the page.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Wed Nov 11, 2020 3:39 pm
by raimondsp
Very, very interesting.
Using RouterOS, we could use BGP to carry some internal routes (fewer than 1000).
We could route them at L3 in hardware...
Is fastpath involved here? Or can we use some firewall filters?
We won't need conntrack or anything similar.

There are two distinct L3HW modes in RouterOS v7.1beta2:
  • l3hw=yes (a.k.a. full routing or l3-switching) - the entire routing table gets offloaded to the hardware; traffic gets routed entirely by HW; nothing goes through the CPU, and therefore, the ROS stateful firewall does not work.
  • l3hw=fw - Firewall-compatible routing. Initially, packets go through CPU/Firewall, then Fasttrack connections get offloaded to the hardware. Consider this as a hardware-accelerated L4 stateful firewall. Unfortunately, the number of hardware connections is strictly limited by the capacity of the internal hardware memory.

Please note that we are talking about a stateful firewall here. Stateless firewall still can be set in l3hw=yes mode via switch ACL rules:
/interface/ethernet/switch/rule/

Re: CRS 3xx - L3 ASIC performance testing

Posted: Wed Nov 11, 2020 3:50 pm
by Maggiore81
Hello
Perfect.
But the question is:
a) l3hw=yes (a.k.a. full routing or l3-switching) - the entire routing table gets offloaded to the hardware; traffic gets routed entirely by HW; nothing goes through the CPU, and therefore, the ROS stateful firewall does not work.
Will fastpath be enabled in RouterOS then?
If we set some rules on the INPUT chain just to protect the router, do we lose the hardware feature?

b) l3hw=fw - Firewall-compatible routing. Initially, packets go through CPU/Firewall, then Fasttrack connections get offloaded to the hardware. Consider this as a hardware-accelerated L4 stateful firewall. Unfortunately, the number of hardware connections is strictly limited by the capacity of the internal hardware memory.

Is there a table? I have seen one at the link in the first post, but it is not clear what the number means... 3750 connections, really? That is very low...

thank you

Re: CRS 3xx - L3 ASIC performance testing

Posted: Wed Nov 11, 2020 4:38 pm
by raimondsp
Will fastpath be enabled in RouterOS then?
No, ROS firewall (/ip/firewall) does not work simply because packets never enter CPU.

If we set some rules on the INPUT chain just to protect the router, do we lose the hardware feature?
The traffic to the router itself (packet destination IP = router IP; INPUT chain) is unaffected by the l3hw. The firewall stays fully functional here. The same applies to outgoing traffic (OUTPUT chain).
Regarding routed traffic (FORWARD chain, or PRE/POSTROUTING chains for forwarded packets), in the case of l3hw=yes, setting those rules does nothing because the firewall (/ip/firewall) does not get triggered. You need to set l3hw=no or l3hw=fw to make the stateful firewall work. However, a stateless firewall is still an option via switch ACL rules. For example, you can allow/block specific IP addresses/prefixes or TCP/UDP ports. More info here: https://wiki.mikrotik.com/wiki/Manual:C ... _.28ACL.29
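The rules above can be summarized as a small decision function. This is only a sketch in Python of the behaviour described in this post; the function name and the string values are made up for illustration and are not RouterOS identifiers:

```python
def stateful_firewall_applies(chain: str, l3hw: str) -> bool:
    """Return True if /ip/firewall can see packets of this chain in the given l3hw mode."""
    chain = chain.lower()
    if chain in ("input", "output"):
        # Traffic to or from the router itself always passes through the CPU.
        return True
    if chain == "forward":
        # Routed traffic bypasses the CPU only in l3hw=yes mode.
        # In l3hw=fw mode the initial packets of a connection still hit the firewall.
        return l3hw != "yes"
    raise ValueError(f"unknown chain: {chain}")

for mode in ("no", "fw", "yes"):
    print(mode, {c: stateful_firewall_applies(c, mode) for c in ("input", "forward", "output")})
```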

Is there a table? I have seen in the link at the first post, but it is not clear what the number means... 3750 connections, really? it is very low...
Yes, unfortunately, the number of hardware connections is limited. Actually, it is 4500 if used without MPLS. MikroTik's smart offloading algorithm picks the heaviest (traffic-wise) connections for offloading at any given time. Other (slower) connections get processed by the CPU, so the total number of connections can be much greater than the hardware limit. For instance, we tested a CRS317 with 10k connections, and it worked fine.
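As a rough illustration of the idea (not the actual RouterOS algorithm, whose internals are not described here), picking the heaviest connections for a fixed number of hardware slots could look like this in Python. The function name and the per-connection byte rates are hypothetical:

```python
import heapq

HW_CONNECTION_LIMIT = 4500  # figure quoted above for use without MPLS

def split_connections(rates, limit=HW_CONNECTION_LIMIT):
    """Split {conn_id: bytes_per_second} into (offloaded, cpu) sets by traffic volume."""
    # Keep the `limit` connections with the highest rate in hardware.
    offloaded = set(heapq.nlargest(limit, rates, key=rates.get))
    # Everything else stays on the CPU path.
    cpu = set(rates) - offloaded
    return offloaded, cpu

# Hypothetical load: 10k connections where connection i carries i bytes/s.
rates = {i: i for i in range(10_000)}
hw, cpu = split_connections(rates)
print(len(hw), "offloaded,", len(cpu), "on CPU")
```

With 10k connections, the 4500 heaviest end up in hardware and the remaining 5500 lighter ones are left to the CPU.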

Please take into account that the CRS (Cloud Router Switch) series is more of a "switch" than a "router". Consider the ability to run an L4 hardware-accelerated firewall more like a bonus feature rather than a common use-case. For heavy routing, please look into the CCR series.

Currently, Mikrotik engineers are working on a "hybrid l3hw mode" which allows running both l3hw=yes + l3hw=fw on the same device. For example, it will allow hardware inter-VLAN routing (with an unlimited number of connections) while running Firewall/NAT on the upstream port(-s).

Re: CRS 3xx - L3 ASIC performance testing

Posted: Wed Nov 11, 2020 5:58 pm
by Maggiore81
Thank you for your explanations.
The idea was to use a CRS to route L3 between interfaces at FAAAAST speed via BGP.
The issue is: how can I protect the router itself then?
I have never tried the switch rules...

Re: CRS 3xx - L3 ASIC performance testing

Posted: Wed Nov 11, 2020 7:04 pm
by mkx
@raimondsp: can you kindly compare different modes of operation of l3hw to HW-offloaded L2? I can imagine many parallelisms, but as I don't have any experience with CRS3xx L3 offloading I can't say if those parallelisms are real or imaginary.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Thu Nov 12, 2020 9:39 am
by raimondsp
Thank you for your explanations.
The idea was to use a CRS to route L3 between interfaces at FAAAAST speed via BGP.
The issue is: how can I protect the router itself then?
I have never tried the switch rules...

I'm sorry for the misleading explanation. INPUT/OUTPUT chains are unaffected by l3hw because the hardware redirects those packets to/from the CPU. The firewall (that is running on the CPU) stays fully functional in such cases. Hence, enabling l3hw does not affect your ability to protect the router itself.

What I really meant (but originally failed to explain) is that, in the case of l3hw=yes, you cannot enable the firewall on forwarded traffic. For example, to protect a server behind a router.

I edited my original post to avoid confusion.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Thu Nov 12, 2020 9:41 am
by Maggiore81
Hello and good morning.
I was not sure about your first claim about the input/output. Thank you very much for clarifying it.
So the CRS can be a fully functional BGP router with hardware forwarding. I won't see the traffic passing through, which is not an issue, but I can still protect the router itself.
At the moment I use a CCR1036 + 10G switch with fasttrack. It could easily be replaced with a CRS317 that has all the 10G ports on it.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Thu Nov 12, 2020 10:31 am
by raimondsp
@raimondsp: can you kindly compare different modes of operation of l3hw to HW-offloaded L2? I can imagine many parallelisms, but as I don't have any experience with CRS3xx L3 offloading, I can't say if those parallelisms are real or imaginary.

Basically:
  • L2 HW offloading = bridging on the hardware level.
  • L3 HW offloading = routing on the hardware level.


L3HW does not affect L2HW since the L2-forwarded (bridged) traffic is not subject to routing.


Let's look at a basic example: two VLANs configured on a CRS317 bridge:
  • VLAN100 on interfaces sfp-sfpplus1 - sfp-sfpplus4
  • VLAN200 on interfaces sfp-sfpplus5 - sfp-sfpplus8.
For simplicity, let's name the interfaces SFP1-SFP8.

The devices connected to SFP1-SFP4 can communicate with each other on the L2 level because they belong to the same LAN. The same applies to SFP5-SFP8. All CRS3xx devices provide L2 hardware offloading; therefore, expect near wire-speed performance.

Now imagine that there is a server connected to SFP1 that needs to be accessed by all devices, including those in VLAN200*. Hosts that belong to different VLANs cannot communicate on the L2 level; therefore, packet forwarding must be escalated to L3, i.e., routing. If the switch does not support L3HW (in the case of CRS317: RouterOS v6, any version before v7.1beta1, or l3hw=no), the routing is performed by the CPU, which dramatically decreases the network speed. While SFP2-SFP4 still benefit from wire-speed communication to SFP1 due to L2HW, SFP5-SFP8 are bottlenecked by CPU performance. Enabling L3HW allows inter-VLAN routing to be almost as fast as hardware bridging. Now all of SFP1-SFP8 enjoy wire-speed communication.

* Actually, in such a basic example, you can avoid routing by making SFP1 a member of both VLAN100 and VLAN200, but that is not always a solution.
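
The two-VLAN example above can also be written down as a tiny model. This is just a sketch in Python to show which path a packet takes; the port names SFP1-SFP8 follow the shorthand used above, and the function and return strings are made up for illustration:

```python
# VLAN membership from the example: SFP1-SFP4 in VLAN100, SFP5-SFP8 in VLAN200.
VLAN_OF_PORT = {f"SFP{i}": 100 for i in range(1, 5)}
VLAN_OF_PORT.update({f"SFP{i}": 200 for i in range(5, 9)})

def forwarding_path(src: str, dst: str, l3hw: bool) -> str:
    """Return where a packet between two ports is forwarded."""
    if VLAN_OF_PORT[src] == VLAN_OF_PORT[dst]:
        # Same VLAN: bridged by the switch chip regardless of the L3HW setting.
        return "L2 hardware (bridged)"
    # Different VLANs: the packet must be routed, in hardware only if L3HW is on.
    return "L3 hardware (routed)" if l3hw else "CPU (routed)"

print(forwarding_path("SFP2", "SFP1", l3hw=False))
print(forwarding_path("SFP5", "SFP1", l3hw=False))
print(forwarding_path("SFP5", "SFP1", l3hw=True))
```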

Re: CRS 3xx - L3 ASIC performance testing

Posted: Thu Nov 12, 2020 10:35 am
by raimondsp
Hello and good morning.
I was not sure about your first claim about the input/output. Thank you very much for clarifying it.
So the CRS can be a fully functional BGP router with hardware forwarding. I won't see the traffic passing through, which is not an issue, but I can still protect the router itself.
At the moment I use a CCR1036 + 10G switch with fasttrack. It could easily be replaced with a CRS317 that has all the 10G ports on it.

Hi there,

I think CRS317 is a perfect solution for your case.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Thu Nov 12, 2020 7:21 pm
by mkx
@raimondsp: can you kindly compare different modes of operation of l3hw to HW-offloaded L2? I can imagine many parallelisms, but as I don't have any experience with CRS3xx L3 offloading, I can't say if those parallelisms are real or imaginary.

Basically:

I was thinking more along these lines: if the bridge is not offloaded at all, then the device can enforce both the firewall (if use-ip-firewall=yes) and bridge filters. If the bridge is offloaded to HW, then neither can be used and only ACLs (if supported by the switch chip) can affect traffic between two interfaces. So I imagine that the first case (no offload) can be compared to l3hw=no and the second case (full HW offload) can be compared to l3hw=yes. I guess there is no bridge mode of operation that could be compared to l3hw=fw because a bridge can either be fully offloaded or not at all. Or am I mistaken?

Re: CRS 3xx - L3 ASIC performance testing

Posted: Tue Nov 17, 2020 9:48 am
by raimondsp
Unlike l3hw option, use-ip-firewall=yes controls only the packets that enter the CPU. use-ip-firewall does not disable L2 hardware offloading. Actually, it is impossible (and does not make sense) to disable L2HW on the switch chip.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Sat Nov 28, 2020 3:59 pm
by pubudeux
From the wiki, I understand that L3 hardware offloading is currently only in the CRS317.

Are there plans (and is it possible) for L3 hardware offloading to be enabled on other CRS3XX devices?

I have a CRS328, and thanks for providing these test results.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Sat Nov 28, 2020 4:27 pm
by mbovenka
From the wiki, I understand that L3 hardware offloading is currently only in the CRS317.

No, in the newer betas more devices are supported:

https://wiki.mikrotik.com/wiki/Manual:C ... Offloading, look for 'List of supported devices and their limits'

The CRS328 isn't among them (yet), though.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Sat Nov 28, 2020 7:28 pm
by pubudeux
My question really is: should I expect to eventually get support for L3 hardware switching on my CRS328, or should I plan to find a different solution?

I have time as a homelab user, so if the only thing standing in between that is a software release I can wait, but I know 328 has a different chip than the one on all the currently supported switches for the beta.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Sat Nov 28, 2020 7:36 pm
by mkx
Recently one of the MT support guys wrote that they're currently running a feasibility study for supporting L3 switching on the CRS328. He explicitly said that nothing is determined yet ... so it may end up with no L3 switching on this device ... and even if it does happen, it may take a while before it gets implemented.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Mon Nov 30, 2020 5:42 am
by pubudeux
Thanks for that. I've tried searching all over but have not seen anyone from MT mention the CRS328 yet re: hardware L3. In the meantime I am exploring a software solution, maybe VyOS to run on a VM.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Mon Nov 30, 2020 8:54 am
by raimondsp
Recently one of the MT support guys wrote that they're currently running a feasibility study for supporting L3 switching on the CRS328. He explicitly said that nothing is determined yet ... so it may end up with no L3 switching on this device ... and even if it does happen, it may take a while before it gets implemented.
^ this

Re: CRS 3xx - L3 ASIC performance testing

Posted: Fri Sep 17, 2021 10:44 am
by Maggiore81
Just to clarify all doubts.

CRS317 with the latest RouterOS 7

sfpplus 1 + 2 is an 802.3ad bond (WAN)
sfpplus 15 + 16 is an 802.3ad bond (to the backbone, towards users)
sfpplus 10 is a remote network
sfpplus 11 is a remote network

On all ports we have a /29 and we run plain BGP for IPv4 and IPv6, with no filters on FORWARDED traffic.
Some users on the backbone side (15 + 16) need source NAT; this is very little traffic, only management traffic.

Is the situation described above applicable?
We currently have a 1072, and all the traffic !local is in slow-path no-track. We terminate an L2TP VPN on the router with little traffic (management). During evening peak hours we lose packets.
Can we solve this with the 317's hardware forwarding?

Re: CRS 3xx - L3 ASIC performance testing

Posted: Fri Sep 17, 2021 12:12 pm
by raimondsp
Just to clarify all doubts.

CRS317 with the latest RouterOS 7

sfpplus 1 + 2 is an 802.3ad bond (WAN)
sfpplus 15 + 16 is an 802.3ad bond (to the backbone, towards users)
sfpplus 10 is a remote network
sfpplus 11 is a remote network

On all ports we have a /29 and we run plain BGP for IPv4 and IPv6, with no filters on FORWARDED traffic.
Some users on the backbone side (15 + 16) need source NAT; this is very little traffic, only management traffic.

Is the situation described above applicable?
We currently have a 1072, and all the traffic !local is in slow-path no-track. We terminate an L2TP VPN on the router with little traffic (management). During evening peak hours we lose packets.
Can we solve this with the 317's hardware forwarding?
Hi,

Below are answers to your case. Please don't hesitate to ask follow-up questions if needed.
  • CRS317 switch chip supports hardware bonding, i.e., 802.3ad BOND can be offloaded to the hardware.
  • CRS317 supports up to 160-240k IPv4 route prefixes. If that's enough for your BGP setup, the entire routing table can be offloaded to the hardware.
  • IPv6 HW offloading is not supported yet. It is in development, though. But I wouldn't expect it sooner than by the end of the year. Until then, IPv6 routing is performed by the CPU, but CRS317's CPU is not sufficient to perform routing on a decent level.
  • CRS317 is capable of doing NAT for FastTrack connections on the hardware level. The maximum number of offloaded FastTrack connections is 4608 (4.5K). To create a FastTrack connection, you need to redirect the initial packet to the CPU/Firewall first. In your case, redirecting the entire port(s) to the CPU is not an option because you will hit the FastTrack limit very quickly. But if it is possible to filter out the connections requiring NAT by IP, you can create switch ACL rules to do that on the hardware level. The following example redirects traffic that arrives on ports 15 and 16 and is destined for the 10.1.0.0/16 subnet to the CPU. Packets redirected to the CPU go through the Firewall and, therefore, may be subject to FastTrack/NAT, followed by HW offloading of the respective connections.
    /interface/ethernet/switch/rule add ports=sfp-sfpplus15,sfp-sfpplus16 dst-address=10.1.0.0/16 redirect-to-cpu=yes
    
  • Another option is to keep both CCR1072 (since you already have one) + CRS317, and forward packets that require NAT to CCR1072 while doing the rest of the routing on CRS317 hardware.
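
To make the matching logic of the switch rule in the NAT item above easier to follow, here is a minimal sketch in Python. It only models which packets the rule would select (ingress port plus destination prefix); it is not how the switch chip is implemented, and the function name is made up for illustration:

```python
import ipaddress

# Values taken from the example rule above.
RULE_PORTS = {"sfp-sfpplus15", "sfp-sfpplus16"}
RULE_DST = ipaddress.ip_network("10.1.0.0/16")

def redirected_to_cpu(in_port: str, dst_ip: str) -> bool:
    """Return True if a packet matches the rule and is sent to the CPU/Firewall."""
    return in_port in RULE_PORTS and ipaddress.ip_address(dst_ip) in RULE_DST

print(redirected_to_cpu("sfp-sfpplus15", "10.1.2.3"))   # matches port and prefix
print(redirected_to_cpu("sfp-sfpplus15", "10.2.0.1"))   # destination outside 10.1.0.0/16
print(redirected_to_cpu("sfp-sfpplus10", "10.1.2.3"))   # arrives on a different port
```

Only the first packet is redirected; the other two stay on the hardware routing path.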

Re: CRS 3xx - L3 ASIC performance testing

Posted: Fri Sep 17, 2021 12:27 pm
by Maggiore81
Super.
We need a few thousand internal routes (3000-5000), so everything could be in hardware.
For management traffic we need just 500+ entries in conntrack, so the limit is not an issue.
We don't use VLANs, just plain IP on the various interfaces.
Can it still do L3 forwarding?

We just have some IP firewall raw filters. Do these interfere with L3 forwarding?

We really only need some rules on the INPUT chain.

Are there any issues with an L2TP VPN for remote management traffic? Will the CPU be able to handle that?

thank you

Re: CRS 3xx - L3 ASIC performance testing

Posted: Fri Sep 17, 2021 1:14 pm
by raimondsp
INPUT and OUTPUT chains work fine with l3hw offloading. The reason why the FORWARD chain does not work in L3HW Full Routing mode (l3-hw-offloading=yes on both the switch and ports) is that the forwarded packets never enter the CPU, and, therefore, do not trigger the Firewall.

I'm not sure about L2TP VPN performance, but since it is for management traffic only, I guess the CRS317 CPU should handle that. Anyway, even if L2TP traffic loads the CPU to 100%, it does not affect the offloaded routes, where the hardware still performs routing at near-wire speed.

Re: CRS 3xx - L3 ASIC performance testing

Posted: Tue Apr 05, 2022 10:51 am
by RainbowDash
IPv6 HW offloading is not supported yet. It is in development, though. But I wouldn't expect it sooner than by the end of the year.
Any updates on when IPv6 HW offloading might be available?