vchrizz
just joined
Topic Author
Posts: 23
Joined: Sun Jul 10, 2016 11:07 am
Location: Austria, Vienna

CRS326 24G - 4 NICs bonded but only 2 GBit speed

Mon Aug 15, 2022 2:43 am

Hi,

I've got a Proxmox cluster of 3 Dell servers with 4 ports each. With the CRS326 I hope to use hardware offloading and bonding to get as much bandwidth as possible for a shared Ceph file system across the 3 servers.

So I connected all 4 ports of every server to the CRS326-24G-2S+ switch and configured an LACP (802.3ad) bonded interface with the "layer 3 and 4" transmit hash policy.
Then I ran an iperf3 test from server3 (=10.34.0.203) or server2 (=10.34.0.202) to server1 (=10.34.0.201) with "iperf3 -P4 -c 10.34.0.201" and got a total of "only" 1.87 Gbits/sec, which I don't think is "ok":
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   555 MBytes   465 Mbits/sec  4376             sender
[  5]   0.00-10.00  sec   553 MBytes   464 Mbits/sec                  receiver
[  7]   0.00-10.00  sec   555 MBytes   465 Mbits/sec  4895             sender
[  7]   0.00-10.00  sec   553 MBytes   464 Mbits/sec                  receiver
[  9]   0.00-10.00  sec   565 MBytes   474 Mbits/sec  4050             sender
[  9]   0.00-10.00  sec   564 MBytes   473 Mbits/sec                  receiver
[ 11]   0.00-10.00  sec   565 MBytes   474 Mbits/sec  4238             sender
[ 11]   0.00-10.00  sec   564 MBytes   473 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  2.19 GBytes  1.88 Gbits/sec  17559             sender
[SUM]   0.00-10.00  sec  2.18 GBytes  1.87 Gbits/sec                  receiver
The same happens when I test from server2 to server3 or similar; I never get more than 2 Gbit/s.

Leaving the server configuration untouched and connecting the four Ethernet cables from server1 directly to server2, the exact same iperf3 test shows that about 3.5 Gbit/s is possible.
That implies that the servers are able to use all four ports of the bond. I understand that there is some overhead, so I will not reach the full 4 Gbit/s, but 3.5 Gbit/s is "ok".

Could something be wrong in my switch configuration that leads to this 2 Gbit/s "limit"?

The servers also have the 4 ports configured in an LACP 802.3ad bond with hash policy layer3+4:
iface bond0 inet manual
        bond-slaves eno1 eno2 eno3 eno4
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
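
One way to see which bond members actually carry traffic during a test is to compare the per-interface byte counters that Linux exposes under /sys/class/net before and after an iperf3 run. The following is a minimal sketch, not something from the original post: the helper names read_counters and delta are made up for illustration, and the interface names are simply the ones from the bond-slaves line above.

```python
from pathlib import Path


def read_counters(slaves, stat="tx_bytes", sysfs_root="/sys/class/net"):
    """Return {interface: counter} read from the Linux sysfs statistics files.

    stat is "tx_bytes" on the sending server or "rx_bytes" on the receiver.
    sysfs_root is a parameter only so the function can be tried on a fake tree.
    """
    counters = {}
    for name in slaves:
        path = Path(sysfs_root) / name / "statistics" / stat
        counters[name] = int(path.read_text().strip())
    return counters


def delta(before, after):
    """Bytes counted per interface between two snapshots."""
    return {name: after[name] - before[name] for name in before}
```

A possible use: take one snapshot with read_counters(["eno1", "eno2", "eno3", "eno4"]), run iperf3 for ten seconds, take a second snapshot and pass both to delta. Members whose counter barely moved were not selected by the transmit hash. Reading rx_bytes on the receiving server shows how the switch distributed the flows, which is the direction that matters here.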

The switch configuration is here (I only removed the private SNMP configuration and obfuscated the software ID and serial number in the export):
# aug/15/2022 00:53:34 by RouterOS 7.4.1
# software id = ****-****
#
# model = CRS326-24G-2S+
# serial number = ************
/interface bridge
add admin-mac=DC:2C:6E:BA:B1:49 auto-mac=no comment=defconf ingress-filtering=no name=bridge protocol-mode=none vlan-filtering=yes
/interface ethernet
set [ find default-name=ether1 ] name=ether1.zi1pve1idrac
set [ find default-name=ether2 ] name=ether2.zi1pve2idrac
set [ find default-name=ether3 ] name=ether3.zi1pve3idrac
set [ find default-name=ether5 ] name=ether5.zi1pve1port1
set [ find default-name=ether6 ] name=ether6.zi1pve1port2
set [ find default-name=ether7 ] name=ether7.zi1pve1port3
set [ find default-name=ether8 ] name=ether8.zi1pve1port4
set [ find default-name=ether9 ] name=ether9.zi1pve2port1
set [ find default-name=ether10 ] name=ether10.zi1pve2port2
set [ find default-name=ether11 ] name=ether11.zi1pve2port3
set [ find default-name=ether12 ] name=ether12.zi1pve2port4
set [ find default-name=ether13 ] name=ether13.zi1pve3port1
set [ find default-name=ether14 ] name=ether14.zi1pve3port2
set [ find default-name=ether15 ] name=ether15.zi1pve3port3
set [ find default-name=ether16 ] name=ether16.zi1pve3port4
set [ find default-name=ether17 ] name=ether17.Link-to-unmgt-sw
set [ find default-name=ether18 ] name=ether18.RaspberryPi
set [ find default-name=ether24 ] advertise=10M-half,10M-full,100M-half,100M-full,1000M-half,1000M-full,10000M-full,2500M-full,5000M-full
set [ find default-name=sfp-sfpplus1 ] advertise=10M-half,10M-full,100M-half,100M-full,1000M-half,1000M-full,10000M-full,2500M-full,5000M-full
/interface vlan
add interface=bridge name=vlan1100.MGMT vlan-id=1100
/interface bonding
add mode=802.3ad name=bonding1.zi1pve1 slaves=ether5.zi1pve1port1,ether6.zi1pve1port2,ether7.zi1pve1port3,ether8.zi1pve1port4 transmit-hash-policy=layer-3-and-4
add mode=802.3ad name=bonding2.zi1pve2 slaves=ether9.zi1pve2port1,ether10.zi1pve2port2,ether11.zi1pve2port3,ether12.zi1pve2port4 transmit-hash-policy=layer-3-and-4
add mode=802.3ad name=bonding3.zi1pve3 slaves=ether13.zi1pve3port1,ether14.zi1pve3port2,ether15.zi1pve3port3,ether16.zi1pve3port4 transmit-hash-policy=layer-3-and-4
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/port
set 0 name=serial0
/interface bridge port
add bridge=bridge comment=defconf ingress-filtering=no interface=ether1.zi1pve1idrac pvid=1100
add bridge=bridge comment=defconf ingress-filtering=no interface=ether2.zi1pve2idrac pvid=1100
add bridge=bridge comment=defconf ingress-filtering=no interface=ether3.zi1pve3idrac pvid=1100
add bridge=bridge comment=defconf ingress-filtering=no interface=ether4 pvid=1100
add bridge=bridge comment=defconf ingress-filtering=no interface=ether17.Link-to-unmgt-sw
add bridge=bridge comment=defconf ingress-filtering=no interface=ether18.RaspberryPi pvid=1100
add bridge=bridge comment=defconf disabled=yes ingress-filtering=no interface=ether19
add bridge=bridge comment=defconf disabled=yes ingress-filtering=no interface=ether20
add bridge=bridge comment=defconf disabled=yes ingress-filtering=no interface=ether21
add bridge=bridge comment=defconf disabled=yes ingress-filtering=no interface=ether22
add bridge=bridge comment=defconf disabled=yes ingress-filtering=no interface=ether23
add bridge=bridge comment=defconf disabled=yes ingress-filtering=no interface=ether24
add bridge=bridge comment=defconf ingress-filtering=no interface=sfp-sfpplus1
add bridge=bridge comment=defconf disabled=yes ingress-filtering=no interface=sfp-sfpplus2
add bridge=bridge comment="bonded ports to zi1pve1" interface=bonding1.zi1pve1 pvid=1100
add bridge=bridge comment="bonded ports to zi1pve2" interface=bonding2.zi1pve2 pvid=1100
add bridge=bridge comment="bonded ports to zi1pve3" interface=bonding3.zi1pve3 pvid=1100
/ip settings
set max-neighbor-entries=8192
/ipv6 settings
set disable-ipv6=yes
/interface bridge vlan
add bridge=bridge tagged=sfp-sfpplus1,ether17.Link-to-unmgt-sw,bridge,bonding1.zi1pve1,bonding2.zi1pve2,bonding3.zi1pve3 untagged=\
    ether1.zi1pve1idrac,ether2.zi1pve2idrac,ether3.zi1pve3idrac,ether18.RaspberryPi vlan-ids=1100
add bridge=bridge tagged=sfp-sfpplus1,bonding1.zi1pve1,bonding2.zi1pve2,bonding3.zi1pve3 vlan-ids=1500
add bridge=bridge tagged=sfp-sfpplus1,bonding1.zi1pve1,bonding2.zi1pve2,bonding3.zi1pve3,ether17.Link-to-unmgt-sw vlan-ids=1660
/ip address
add address=192.168.88.1/24 comment=defconf interface=bridge network=192.168.88.0
add address=10.34.0.92/24 interface=vlan1100.MGMT network=10.34.0.0
/ip dns
set servers=9.9.9.9,1.1.1.1
/ip route
add disabled=no dst-address=0.0.0.0/0 gateway=10.34.0.100
/ip service
set telnet disabled=yes
set ftp disabled=yes
set www disabled=yes
set ssh port=10
set api disabled=yes
set api-ssl disabled=yes
/system clock
set time-zone-name=Europe/Vienna
/system identity
set name=zi1-switch2
/system ntp client
set enabled=yes
/system ntp client servers
add address=ptbtime1.ptb.de
add address=ptbtime2.ptb.de
/system routerboard settings
set boot-os=router-os
I have watched some YouTube videos that mention a specific configuration for getting line speed through the switch, and I have read the MikroTik wiki about hardware offloading and bonding. I hope I am not missing something important in the configuration.
Thank you for any input on why I do not get more than 2 Gbit/s through the switch.
 
vchrizz
just joined
Topic Author
Posts: 23
Joined: Sun Jul 10, 2016 11:07 am
Location: Austria, Vienna

Re: CRS326 24G - 4 NICs bonded but only 2 GBit speed

Sun Aug 21, 2022 2:51 am

Could anyone please confirm or disprove this?
I'm not sure whether this is the usual behaviour of this device because of some limitation, or whether it is capable of more and I just have a configuration "issue".
Or maybe I am just misunderstanding something obvious here?
Thank you
 
chechito
Forum Guru
Posts: 3165
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia

Re: CRS326 24G - 4 NICs bonded but only 2 GBit speed

Sun Aug 21, 2022 3:32 am

I have a CRS328 with a bond of 6 x 1 Gbit interfaces and peak-hour traffic close to 5 Gbit/s, so it is possible.

Keep in mind that you need source/destination "variety": traffic belonging to the same connection, between the same source and the same destination, will always choose the same slave of the bond. This avoids packets arriving out of order.
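
To illustrate why one connection always stays on one bond member, here is a simplified sketch loosely modelled on the layer3+4 formula given in the Linux kernel bonding documentation. It is an assumption for illustration only: the hash used by the CRS326 switch chip is not described in this thread and may well differ, and the source ports below are invented.

```python
import ipaddress


def layer34_member(src_ip, dst_ip, src_port, dst_port, members=4):
    """Pick a bond member index for a flow (simplified layer3+4 style hash)."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    # Start from the two 16-bit ports, then mix in both IP addresses.
    h = ((src_port & 0xFFFF) << 16) | (dst_port & 0xFFFF)
    h ^= src ^ dst
    # Fold the upper bits down so they influence the low bits.
    h ^= h >> 16
    h ^= h >> 8
    return h % members


# Four parallel streams to the same server and port; only the
# (hypothetical) source ports differ.
flows = [("10.34.0.203", "10.34.0.201", port, 5201)
         for port in (40000, 40001, 40002, 40003)]
used = {layer34_member(*flow) for flow in flows}
print("members used:", sorted(used))
```

Every packet of a given flow has the same addresses and ports, so it always maps to the same index. Different flows may or may not land on different members, which is why several parallel streams can still share a single 1 Gbit link.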
 
vchrizz
just joined
Topic Author
Posts: 23
Joined: Sun Jul 10, 2016 11:07 am
Location: Austria, Vienna

Re: CRS326 24G - 4 NICs bonded but only 2 GBit speed

Tue Sep 27, 2022 9:09 pm

After much searching on this topic, I think this was a misunderstanding on my side.

As described here: https://help.mikrotik.com/docs/display/ ... dbalancing
Configuration
The following configuration is relevant to SW1 and SW2:

/interface bonding
add mode=802.3ad name=bond1 slaves=ether1,ether2
/interface bridge
add name=bridge1
/interface bridge port
add bridge=bridge1 interface=bond1
add bridge=bridge1 interface=sfp-sfpplus1

https://help.mikrotik.com/docs/display/ ... -Problem.2
Problem
After initial tests, you immediately notice that your network throughput never exceeds the 1Gbps limit even though the CPU load on the servers is low as well as on the network nodes (switches in this case), but the throughput is still limited to only 1Gbps. The reason behind this is because LACP (802.3ad) uses transmit hash policy in order to determine if traffic can be balanced over multiple LAG members, in this case, a LAG interface does not create a 2Gbps interface, but rather an interface that can balance traffic over multiple slave interface whenever it is possible. For each packet a transmit hash is generated, this determines through which LAG member will the packet be sent, this is needed in order to avoid packets being out of order, there is an option to select the transmit hash policy, usually, there is an option to choose between Layer2 (MAC), Layer3 (IP) and Layer4 (Port), in RouterOS, this can be selected by using the transmit-hash-policy parameter. In this case, the transmit hash is the same since you are sending packets to the same destination MAC address, as well as the same IP address and Iperf uses the same port as well, this generates the same transmit hash for all packets and load balancing between LAG members is not possible. Note that not always packets will get balanced over LAG members even though the destination is different, this is because the standardized transmit hash policy can generate the same transmit hash for different destinations, for example, 192.168.0.1/192.168.0.2 will get balanced, but 192.168.0.2/192.168.0.4 will NOT get balanced in case layer2-and-3 transmit hash policy is used and the destination MAC address is the same.

https://help.mikrotik.com/docs/display/ ... Solution.2
Solution
Choose the proper transmit hash policy and test your network's throughput properly. The simplest way to test such setups is to use multiple destinations, for example, instead of sending data to just one server, rather send data to multiple servers, this will generate a different transmit hash for each packet and will make load balancing across LAG members possible.

Indeed, I was sending data from two servers to one server to test performance. Doing it the other way round, sending data from one server to two servers, gave me the expected performance.
I have three Proxmox Virtual Environment (PVE) servers called pve1, pve2 and pve3.

Example1:
pve1 -> pve3 - iperf3 -c 10.34.0.203 -p 5201 -P10
pve2 -> pve3 - iperf3 -c 10.34.0.203 -p 5202 -P10
gives: at most about 1.8 Gbps

Example2:
pve3 -> pve1 - iperf3 -c 10.34.0.201 -p 5201 -P10
pve3 -> pve2 - iperf3 -c 10.34.0.202 -p 5202 -P10
gives: at most about 3.7 Gbps

I understand so far that it has to do with the transmit-hash-policy, which I have set to layer3+4.
What I don't fully understand is why Example1 reaches only about "half" of the bond's bandwidth.

Multiple ports are used in both examples, which I assumed would matter with transmit-hash-policy layer3+4 (the 4 being the transport layer, which works with ports).
Shouldn't Example1 behave the same as Example2?

Can someone explain or point out where this is described in more detail? :)
Thank you!
Last edited by vchrizz on Tue Sep 27, 2022 11:18 pm, edited 1 time in total.
 
mkx
Forum Guru
Posts: 13095
Joined: Thu Mar 03, 2016 10:23 pm

Re: CRS326 24G - 4 NICs bonded but only 2 GBit speed

Tue Sep 27, 2022 9:25 pm

Tx hash policies take both src and dst into account. With L3+L4 this means dst-address, dst-port, src-address and src-port. Running 10 parallel streams gives you 10 different src-ports, which lets the Tx hash policy come up with more distinct hashes. As to why things are not symmetrical: the Tx side does the hashing, and it may use a different policy and/or algorithm than the other side. Using 10 parallel streams doesn't guarantee that all bond members are utilized (but it does increase the chances).
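
The point that more streams raise the chance of using every member without guaranteeing it can be put into rough numbers. The sketch below assumes an idealized hash that places each flow on a member uniformly and independently at random. Real hash functions, including whatever the switch uses, are not random in that sense, so treat the figures as intuition only.

```python
from math import comb


def expected_members_used(flows, members=4):
    """Expected number of distinct members hit by `flows` random flows."""
    return members * (1 - (1 - 1 / members) ** flows)


def prob_all_members_used(flows, members=4):
    """Probability that every member gets at least one flow (inclusion-exclusion)."""
    return sum((-1) ** k * comb(members, k) * (1 - k / members) ** flows
               for k in range(members + 1))


for n in (4, 10, 20):
    print(n, "flows:",
          round(expected_members_used(n), 2), "members expected,",
          round(prob_all_members_used(n), 3), "chance of using all 4")
```

Under this model, 4 flows hit about 2.7 of 4 members on average and use all four in fewer than one in ten cases, while 10 flows hit about 3.8 on average. Since each member tops out at 1 Gbit/s, the number of distinct members chosen on the transmitting side sets the ceiling for the total.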