I ran into exactly the same issue these days! (Although only after losing almost half a day looking in all the wrong places...)
However, I did manage to "squeeze" almost 500 Mbit/s out of a hEX router (RB750Gr3 running the latest 6.41.2), by using two ISPs that each provide 300 Mbit/s download, load-balancing between them, and using NAT.
Profiling the system with
/tool profile cpu=all freeze-frame-interval=6
shows that roughly 30-40% of the CPU time is spent in `network` and 40-50% in `firewall`.
To reach this bandwidth I experimented a lot with the firewall rules, and I currently use the following pattern:
- for the `forward` chain in `filter` I created a new chain `forward-new`, to which I jump for all packets matching `connection-state=new`; all my actual forwarding rules go there;
- for the `prerouting` chain in `mangle` I created a new chain `prerouting-mark`, to which I jump for all packets matching `connection-state=new` (and likewise `postrouting-mark` for `postrouting`); all my connection-marking rules go there, always with `connection-mark=no-mark` as a precondition and `passthrough=yes`;
- in the `prerouting` chain in `mangle` the `action=mark-routing` rules all have `passthrough=no`.
With this setup the typical packet passes through only a few rules, especially in the `mangle` table.
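To make the "only a few rules" point concrete, here is a toy Python model (not RouterOS code, and not a description of how RouterOS evaluates rules internally). It counts how many `mangle` `prerouting` rules a packet of an already established connection is checked against, once with the jump to `prerouting-mark` and once with the marking rules placed directly in `prerouting`. The `Rule` class, the `walk` function and the simplified matchers are assumptions made for the illustration; the rule order follows the listing at the end of this post.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Packet = Dict[str, str]  # toy packet: "state", "in" (interface), "mark" (connection mark)


@dataclass
class Rule:
    matches: Callable[[Packet], bool]
    action: str                   # "jump", "stop" (accept or passthrough=no), "continue" (passthrough=yes)
    target: Optional[str] = None  # chain name, only used by "jump"


def walk(chains: Dict[str, List[Rule]], name: str, pkt: Packet) -> Tuple[int, bool]:
    """Return (rules evaluated, whether processing stopped) for one packet."""
    evaluated = 0
    for rule in chains[name]:
        evaluated += 1
        if not rule.matches(pkt):
            continue
        if rule.action == "jump":
            sub, stopped = walk(chains, rule.target, pkt)
            evaluated += sub
            if stopped:
                return evaluated, True
        elif rule.action == "stop":
            return evaluated, True
    return evaluated, False


LAN = {"ether2-vlan1", "ether2-vlan3"}
unmarked = lambda p: p["mark"] == "no-mark"

# Connection-marking rules (only matching is modelled, not the marking itself;
# the address-list matchers of the real rules are left out).
marking = [
    Rule(lambda p: unmarked(p) and p["in"] == "pppoe-out1", "continue"),
    Rule(lambda p: unmarked(p) and p["in"] == "pppoe-out2", "continue"),
    Rule(lambda p: unmarked(p) and p["in"] in LAN, "continue"),  # the random=50 rule
    Rule(lambda p: unmarked(p) and p["in"] in LAN, "continue"),
]
# Rules that stay in prerouting itself.
routing = [
    Rule(lambda p: p["in"] == "pppoe-out1", "stop"),
    Rule(lambda p: p["in"] == "pppoe-out2", "stop"),
    Rule(lambda p: p["mark"] == "con-x12" and p["in"] in LAN, "stop"),
    Rule(lambda p: p["mark"] == "con-x11" and p["in"] in LAN, "stop"),
]

with_jump = {
    "prerouting": [Rule(lambda p: p["state"] == "new", "jump", "prerouting-mark")] + routing,
    "prerouting-mark": marking,
}
flat = {"prerouting": marking + routing}

lan_pkt = {"state": "established", "in": "ether2-vlan1", "mark": "con-x11"}
wan_pkt = {"state": "established", "in": "pppoe-out1", "mark": "con-x11"}

for label, pkt in (("LAN", lan_pkt), ("WAN", wan_pkt)):
    print(label, walk(with_jump, "prerouting", pkt)[0], walk(flat, "prerouting", pkt)[0])
```

In this model an established packet never enters `prerouting-mark`, so it skips all four marking rules. The model only counts rule checks; it says nothing about how expensive each check is on the actual router.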
I have also made the following observations:
- although the first rule of the `forward` chain in `filter` is an `action=accept` for `connection-state=established,related`, performance drops by almost 50% if I don't use the separate `forward-new` chain;
- yesterday I managed to get almost 600 Mbit/s (i.e. the maximum allowed by the two ISPs combined), but for some reason I can't reproduce that anymore...
- regarding the limit I reached today (i.e. 500 Mbit/s): since the actual traffic passing through the router is twice as much (i.e. 500 between client and router and another 500 between router and ISP), I wonder whether I have hit a hardware limitation of the hEX router?
- I have yet to find an "optimum" and "simple" way to express load-balancing rules in `mangle`...
- for some reason using routing rules that have `action=lookup-only-in-table` also helps with performance...
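As a rough illustration of how the two LAN marking rules in the listing at the end of this post split new connections, here is a small Python simulation (a sketch, not RouterOS code). It assumes that `random=50` makes the first rule match with 50% probability and that the second rule, which has no `random` matcher, marks whatever is still `no-mark`. The function names `pick_connection_mark` and `simulate` are made up for the example.

```python
import random
from collections import Counter


def pick_connection_mark(rng: random.Random) -> str:
    """Mimic the two LAN marking rules for one new connection."""
    mark = "no-mark"
    # First rule: random=50, so it matches roughly half of the unmarked connections.
    if mark == "no-mark" and rng.random() < 0.50:
        mark = "con-x11"
    # Second rule: no random matcher, so it catches everything still unmarked.
    if mark == "no-mark":
        mark = "con-x12"
    return mark


def simulate(connections: int, seed: int = 1) -> Counter:
    """Count how many simulated connections end up with each mark."""
    rng = random.Random(seed)
    return Counter(pick_connection_mark(rng) for _ in range(connections))


counts = simulate(100_000)
for mark in ("con-x11", "con-x12"):
    print(mark, f"{counts[mark] / 100_000:.1%}")
```

Because the mark is chosen once per connection, a single connection stays on one WAN and cannot exceed that ISP's rate; only several parallel connections can approach the combined rate.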
For reference, below are my rules (disabled rules are prefixed with `#`):
/interface list member add interface=ether2-vlan1 list=lan
/interface list member add interface=ether2-vlan3 list=lan
/interface list member add interface=pppoe-out1 list=wan
/interface list member add interface=pppoe-out2 list=wan
/ip firewall address-list add address=172.30.163.0/24 list=local
/ip firewall address-list add address=172.30.214.0/24 list=local
# /ip firewall filter add action=accept chain=forward comment="forward accept all" disabled=yes
# /ip firewall filter add action=fasttrack-connection chain=forward connection-state=established,related disabled=yes
/ip firewall filter add action=accept chain=forward connection-state=established,related
/ip firewall filter add action=jump chain=forward connection-state=new jump-target=forward-new
/ip firewall filter add action=accept chain=forward-new in-interface=ether2-vlan1 out-interface-list=lan src-address=172.30.214.0/24
/ip firewall filter add action=accept chain=forward-new in-interface=ether2-vlan1 out-interface-list=wan src-address=172.30.214.0/24
# /ip firewall filter add action=accept chain=forward-new in-interface=ether2-vlan1 disabled=yes
/ip firewall filter add action=accept chain=forward-new in-interface=ether2-vlan3 out-interface-list=lan src-address=172.30.163.0/24
/ip firewall filter add action=accept chain=forward-new in-interface=ether2-vlan3 out-interface-list=wan src-address=172.30.163.0/24
# /ip firewall filter add action=accept chain=forward-new in-interface=ether2-vlan3 disabled=yes
/ip firewall filter add action=drop chain=forward connection-state=invalid
/ip firewall filter add action=drop chain=forward in-interface=ether1
/ip firewall filter add action=drop chain=forward in-interface=pppoe-out1
/ip firewall filter add action=drop chain=forward in-interface=pppoe-out2
/ip firewall filter add action=drop chain=forward
# filter rules for input and output are not listed
/ip firewall mangle add action=jump chain=prerouting connection-state=new jump-target=prerouting-mark
/ip firewall mangle add action=jump chain=postrouting connection-state=new jump-target=postrouting-mark
# whatever enters or exits a WAN interface and is not yet marked is "pinned" to that interface
/ip firewall mangle add action=mark-connection chain=prerouting-mark connection-mark=no-mark in-interface=pppoe-out1 new-connection-mark=con-x11 passthrough=yes
/ip firewall mangle add action=mark-connection chain=prerouting-mark connection-mark=no-mark in-interface=pppoe-out2 new-connection-mark=con-x12 passthrough=yes
/ip firewall mangle add action=mark-connection chain=postrouting-mark connection-mark=no-mark new-connection-mark=con-x11 out-interface=pppoe-out1 passthrough=yes
/ip firewall mangle add action=mark-connection chain=postrouting-mark connection-mark=no-mark new-connection-mark=con-x12 out-interface=pppoe-out2 passthrough=yes
# randomly assign new connections to one of the two WANs
/ip firewall mangle add action=mark-connection chain=prerouting-mark connection-mark=no-mark dst-address-list=!local dst-address-type=!local in-interface-list=lan new-connection-mark=con-x11 passthrough=yes random=50 src-address-list=local
/ip firewall mangle add action=mark-connection chain=prerouting-mark connection-mark=no-mark dst-address-list=!local dst-address-type=!local in-interface-list=lan new-connection-mark=con-x12 passthrough=yes src-address-list=local
# traffic entering via WAN is always accepted (i.e. no need to mark it)
/ip firewall mangle add action=accept chain=prerouting in-interface=pppoe-out1
/ip firewall mangle add action=accept chain=prerouting in-interface=pppoe-out2
# here we mark forwarded packets (i.e. generated by the clients)
/ip firewall mangle add action=mark-routing chain=prerouting connection-mark=con-x12 in-interface-list=lan new-routing-mark=gw-x12 passthrough=no
/ip firewall mangle add action=mark-routing chain=prerouting connection-mark=con-x11 in-interface-list=lan new-routing-mark=gw-x11 passthrough=no
# here we mark output packets (i.e. generated from the router itself); note no load-balancing is done for the router generated traffic
/ip firewall mangle add action=mark-routing chain=output connection-mark=con-x11 new-routing-mark=gw-x11 passthrough=no
/ip firewall mangle add action=mark-routing chain=output connection-mark=con-x12 new-routing-mark=gw-x12 passthrough=no
/ip route add check-gateway=ping distance=7 gateway=pppoe-out1
/ip route add check-gateway=ping distance=7 gateway=pppoe-out1 routing-mark=gw-x11
/ip route add check-gateway=ping distance=7 gateway=pppoe-out2
/ip route add check-gateway=ping distance=7 gateway=pppoe-out2 routing-mark=gw-x12
# note the duplicated routes for the LANs
/ip route add distance=1 dst-address=172.30.163.0/24 gateway=ether2-vlan3 routing-mark=gw-x11
/ip route add distance=1 dst-address=172.30.163.0/24 gateway=ether2-vlan3 routing-mark=gw-x12
/ip route add distance=1 dst-address=172.30.214.0/24 gateway=ether2-vlan1 routing-mark=gw-x11
/ip route add distance=1 dst-address=172.30.214.0/24 gateway=ether2-vlan1 routing-mark=gw-x12
/ip route rule add action=lookup-only-in-table routing-mark=gw-x11 table=gw-x11
/ip route rule add action=lookup-only-in-table routing-mark=gw-x12 table=gw-x12
I hope it helps, and I hope a more "optimal" solution is found!
P.S.: For some reason the Mikrotik forum doesn't "like" traffic load-balanced... :)