It appears that ROSv7 on ARM/ARM64 locks MPLS decapsulation to cpu0, resulting in a major bottleneck for MPLS/VPLS networks using ARM64 routers such as the CCR2004 and CCR2116.
For example, on our network, we have CCR2004s as our core and site routers. When a device connected to a site router uploads traffic, throughput caps around 500-700 Mbps, cpu0 of the core router hits 95%+, and significant packet loss occurs. Download traffic is mostly unaffected by this issue.
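For anyone who wants to check whether they are seeing the same thing, the commands below are a rough sketch of how the per-core load can be watched on the core router while a site-side client runs an upload test. They are standard RouterOS tools, but the output will look different on each device, and what counts as "pinned" is a judgment call.

```routeros
# Show the load of each CPU core; under an upload test, look for cpu0
# sitting near 100% while the other cores stay mostly idle.
/system resource cpu print

# Break the load down per core and per process class.
# cpu=all lists every core separately instead of the total.
/tool profile cpu=all
```

If cpu0 is saturated during uploads but the load spreads across cores during downloads, that matches the behavior described above.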
We have spoken to 3 other network operators who use VPLS and who are experiencing this. It is reproducible with both the CCR2116 and CCR2004.
We recently interacted with an ISP that expressed frustration with the asymmetrical VPLS performance issues with the CCR2116/CCR2004 routers. They ended up switching to Arista for their core router, leaving their tower sites as CCR2116s. Interestingly, they are now able to pull nearly line rate (9800/9800 Mbps) over VPLS from the tower site CCR2116s, whereas before they were limited to ~500 Mbps upload.
Another ISP acquaintance tested and found that the single-core bottleneck occurs primarily during decapsulation, not during encapsulation.
As a temporary workaround, we have overlaid VXLAN, which is correctly multi-threaded on ROSv7 for ARM/ARM64. However, fixing the MPLS/VPLS issue on ROSv7 for ARM64 would be ideal, given that all of MikroTik's flagship routers are ARM64.
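For reference, a VXLAN overlay of this kind can be sketched roughly as follows. This is not our exact configuration: the interface name, VNI, bridge name, and remote address are placeholders chosen for illustration (192.0.2.2 is a documentation address), and MTU and other settings will need adjusting for a real network.

```routeros
# Create the VXLAN interface (name and VNI are placeholder values).
/interface vxlan add name=vxlan1 vni=100 port=8472

# Point it at the remote router's tunnel endpoint (placeholder address).
/interface vxlan vteps add interface=vxlan1 remote-ip=192.0.2.2

# Attach it to the bridge that previously carried the VPLS interface
# (assumes a bridge named bridge1 already exists).
/interface bridge port add bridge=bridge1 interface=vxlan1
```

The same steps are mirrored on the far-end router, with remote-ip pointing back at this one.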
If you have experienced this issue, please chime in.
Issue reported to MikroTik support on 12/06/2023 as SUP-136817.