No labels are applied without explicit nulls, thanks to penultimate hop popping.
Quote: Without explicit nulls, are any labels actually applied when there are so few hops in the link?
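To make the penultimate-hop-popping point concrete, here is a minimal Python sketch of which links along a path actually carry a label. It assumes a simplified single-label LDP LSP with no VPN or VPLS service label. The function names are hypothetical. Label 0 is the IPv4 Explicit NULL value reserved by RFC 3032.

```python
IPV4_EXPLICIT_NULL = 0  # reserved label value for IPv4 explicit null (RFC 3032)


def labels_on_links(path, explicit_null):
    """Return (link, top label) for each link of a simplified single-label LSP.

    Assumption: transit links carry an LDP-assigned label (shown as "ldp").
    On the last link the egress router receives either explicit null
    (label 0) or, with penultimate hop popping, an unlabeled packet (None).
    """
    links = list(zip(path, path[1:]))
    carried = []
    for i, link in enumerate(links):
        if i < len(links) - 1:
            carried.append((link, "ldp"))
        elif explicit_null:
            carried.append((link, IPV4_EXPLICIT_NULL))
        else:
            carried.append((link, None))
    return carried


def labeled_link_count(path, explicit_null):
    """Count the links on which the packet actually carries a label."""
    return sum(1 for _, label in labels_on_links(path, explicit_null) if label is not None)


# Directly connected neighbours: with PHP there is no label on the wire at all.
print(labeled_link_count(["R1", "R2"], explicit_null=False))
print(labeled_link_count(["R1", "R2"], explicit_null=True))
# The R1 -> R2 -> R3 -> R4 chain described in this thread.
print(labeled_link_count(["R1", "R2", "R3", "R4"], explicit_null=False))
```

With explicit null enabled, every link carries at least label 0. Turning it off can therefore hide a label-imposition problem on short paths.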
MTU was the problem before I raised the vSwitch MTU to 7500. MTU is now correct (and tested).
Quote: Perhaps some fragmentation is occurring? You could try smaller packets while bandwidth testing, or do a proper packet capture to see.
Did you change the vSwitch MTU in VMware? Did it change anything for you?
Quote: I have exactly the same issue here. I have labbed it up and asked MT support for help, and they say to check the MTU and the NIC driver in VMware.
Of course, I will!
Quote: This is causing major issues for me in progressing a project I have on the go. Please post back if you find a fix.
Which of R1, R2, R3, R4 are CHR routers? Just R4, then?
Quote: Also, for info, I have R1 -> R2 -> R3 -> R4. Changing the use of explicit null gets throughput up to 900 Mbps on R1 <-> R3, but it still stays low (around 600 bps) on R1 <-> R4. That is when it imposes an MPLS label, so I'm not sure it's resolved until CHR is capable of imposing an MPLS label without reducing bandwidth to almost zero!
I had to put the customer into production without explicit nulls. That will hinder some other projects at that customer, but it allowed us to meet the deadlines.
Quote: Perhaps you're masking the actual problem by turning the 'explicit null' setting off. As far as I can tell, the actual problem is the imposition of MPLS labels.
Any help or suggestions gratefully appreciated.
I am using VMXNET3, and have also tried E1000E with no change in the result.
Quote: Which vNIC are you guys using in VMware? VMXNET3 or something else?
Just to be sure, what version of CHR are you using?
I have tried versions 6.36, 6.39, and 6.39.2, all with the same result. Still no response from MikroTik support...
I tried the new bugfix release (v6.38.7) and there's still the same problem.
I am running version 6.5 at one site and version 6.0 at another (I upgraded to 6.5 to make sure the VMware version wasn't causing the issue). MikroTik support's response is quoted further down.
Quote: More testing:
Bandwidth test from CHR4 to CHR1 (loopback), with explicit-nulls OFF (on all CHRx).
Packet size 1200 bytes, direction both, TX speeds (local & remote) 5M
UDP: 5M
TCP with 2 sessions: less than 1 kbps, CPU at 50%, half of it shown as "unclassified" in the profile tool (I opened a case for this one)
Bandwidth test from BTEST4 to CHR1 (loopback), with explicit-nulls OFF (on all CHRx).
Packet size 1200 bytes, direction both, TX speeds (local & remote) 100M
UDP: 99.9 Mbps/99.3 Mbps
TCP with 20 sessions: 740.9 kbps/1793.1 kbps (CPU maxed out on both ends)
Bandwidth test from BTEST4 to the CCR beyond CHR1, with explicit-nulls OFF (on all CHRx).
Packet size 1200 bytes, direction both, TX speeds (local & remote) 10M
UDP: 7.9 Mbps/6.9 Mbps
TCP with 20 sessions: 7.9 Mbps/6.9 Mbps (CPU maxed out on BTEST4)
Further testing:
/tool fetch keep-result=no url="http://proof.ovh.net/files/1Gio.dat"
Run from CHR1, CHR2, CHR3: several MBytes/sec
Run from CHR4: between 40 and 80 KBytes/sec
If I disable LDP on CHR1, the same command run on CHR4 gets several MBytes/sec instantly, which corroborates our previous diagnosis: pushing MPLS labels on a CHR kills performance.
And running a TCP btest on a CHR kills the CPU.
Maybe there is some special advanced setting that we need to apply in VMware?
What VMware version are you running?
[admin@CHR_BTEST_4] /tool traffic-generator> quick mbps=2000
SEQ ID TX-PACKET TX-RATE RX-PACKET RX-RATE RX-OOO RX-BAD-CSUM LOST-PACKET LOST-RATE LAT-MIN LAT-AVG LAT-MAX JITTER
......
TOT 3 2 586 149 1999.9... 2 585 538 1999.4... 0 611 472.5kbps 44us 396us 4.32ms 4.27ms
TOT 4 2 586 156 1999.9... 2 584 995 1999.0... 0 1 161 897.8kbps 49.1us 416us 4.22ms 4.17ms
TOT TOT 5 172 305 3.9Gbps 5 170 533 3.9Gbps 0 1 772 1370.3... 44us 406us 4.32ms 4.27ms
[admin@CHR_BTEST_4] /tool traffic-generator> /tool fetch keep-result=no url="http://proof.ovh.net/files/1Gio.dat"
status: downloading
downloaded: 1427KiB
total: 1048576KiB
duration: 20s
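The traffic-generator counters can be cross-checked with a few lines of Python. TX-PACKET minus RX-PACKET reproduces the LOST-PACKET column, and the loss works out to a few hundredths of a percent. The counters below are copied from the `quick mbps=2000` output, with the space thousands separators removed. The helper names are illustrative only.

```python
# TX/RX packet counters from the "quick mbps=2000" output,
# with the thousands separators (spaces) removed.
STREAMS = {
    "3": (2586149, 2585538),
    "4": (2586156, 2584995),
    "TOT": (5172305, 5170533),
}


def lost_packets(tx, rx):
    """Packets sent but never received."""
    return tx - rx


def loss_percent(tx, rx):
    """Lost packets as a percentage of packets sent."""
    return 100.0 * (tx - rx) / tx


for name, (tx, rx) in STREAMS.items():
    print(f"{name}: lost {lost_packets(tx, rx)} ({loss_percent(tx, rx):.3f}%)")
```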
I am still running on ESXi 5.5. Upgrading was my best guess and last-resort option (it needs the DC technician on site and planned downtime).
"I have tested your exact setup, interface MTU 1500, MPLS MTU 1590 and VPLS MTU 1500
Default vSwitch settings with MTU set to 9000.
We are using:
Supermicro SYS-5018D-FN8T
and ESXi-6.5.0-4564106-standard
I was able to push 900 Mbps over a VPLS tunnel as well as with simple label switching, so there are no problems with MPLS on CHRs or with the virtual interface drivers included in CHR. The problem is in your hardware and ESXi combination or vSwitch settings. ESXi is known to be unstable/buggy."
I'm happy to accept that the problem is in my hardware/software, but I need to know what to change in order to fix it!
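As a sanity check on the MTU figures in the support reply, here is a Python sketch of the byte arithmetic for one full-size VPLS frame. It assumes a transport label plus a pseudowire label, an untagged inner Ethernet header, and an optional control word. It also assumes the MPLS MTU counts the label stack and must fit inside the vSwitch MTU. The function names are hypothetical.

```python
MPLS_LABEL_BYTES = 4    # one label stack entry (RFC 3032)
ETH_HEADER_BYTES = 14   # inner Ethernet header, assumed untagged
CONTROL_WORD_BYTES = 4  # optional pseudowire control word


def vpls_labeled_size(vpls_mtu, labels=2, control_word=True):
    """Bytes the MPLS layer must carry for one full-size VPLS frame.

    Assumes a transport label plus a pseudowire label, an untagged inner
    Ethernet header and (optionally) a control word.
    """
    size = vpls_mtu + ETH_HEADER_BYTES + labels * MPLS_LABEL_BYTES
    if control_word:
        size += CONTROL_WORD_BYTES
    return size


def path_fits(vpls_mtu, mpls_mtu, vswitch_mtu, **kwargs):
    """True if a full VPLS frame fits the MPLS MTU and the MPLS MTU fits the vSwitch."""
    return vpls_labeled_size(vpls_mtu, **kwargs) <= mpls_mtu <= vswitch_mtu


# Figures from MikroTik support's test: VPLS MTU 1500, MPLS MTU 1590, vSwitch 9000.
print(vpls_labeled_size(1500))
print(path_fits(1500, 1590, 9000))
```

Under these assumptions a 1500-byte VPLS frame needs 1526 bytes at the MPLS layer. That leaves headroom under 1590, and a vSwitch MTU of 7500 or 9000 is not the limiting factor.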
Thanks for all your help with this. I'll give it a go and see if it makes any difference.
Quote: BTW, if you want to try the TSO/LRO settings:
https://kb.vmware.com/selfservice/micro ... 0512464428
https://kb.vmware.com/selfservice/searc ... Id=1027511
Don't forget to reboot the host afterwards.
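For anyone applying the host-side TSO/LRO changes from the ESXi shell, a small Python helper can print the matching esxcli commands. The option paths are the ones that appear in the esxcli output posted in this thread. The `esxcli system settings advanced set -o <path> -i <value>` form is the standard esxcli syntax, but check it against the KB articles for your ESXi build before running anything. The helper name is hypothetical.

```python
# Host-wide offload options; these are the paths queried with
# "esxcli system settings advanced list -o ..." on ESX-BGP2 in this thread.
OFFLOAD_OPTIONS = [
    "/Net/UseHwTSO",
    "/Net/UseHwTSO6",
    "/Net/Vmxnet2HwLRO",
    "/Net/Vmxnet3HwLRO",
    "/Net/TcpipDefLROEnabled",
    "/Net/Vmxnet2SwLRO",
    "/Net/Vmxnet3SwLRO",
]


def disable_offload_commands(options=OFFLOAD_OPTIONS):
    """Build (but do not run) one esxcli command per option, setting it to 0."""
    return [f"esxcli system settings advanced set -o {opt} -i 0" for opt in options]


if __name__ == "__main__":
    for cmd in disable_offload_commands():
        print(cmd)
```

The helper only prints the commands and never executes them, so nothing changes on the host until you paste them in yourself.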
Okay, I have now tried this and rebooted the host and all the CHRs, with no difference at all. I think this leaves either A) an advanced setting somewhere in VMware that I don't know about, or B) a setting in the UCS setup that VMware is running on. Are you running VMware on a Cisco chassis? If not, perhaps I could rule this out and log a support case with VMware.
Nope, no UCS. We are using standalone servers with local storage, no vCenter or DVS or anything tricky.
Apparently not. MPLS seems to work UNLESS labels are applied! I will keep trying to resolve it and post if I find anything.
Quote: OK, I can reproduce it with KVM on totally different hardware.
I'm puzzled!
Does nobody use CHR to push MPLS labels?
YAY!!!! Thanks MikroTik (and of course Tibobo)!
Quote: We reproduced the issue; currently it looks like the problem is related to packet size. Stay tuned for updates.
[root@ESX-BGP2:~] vmware -v
VMware ESXi 6.5.0 build-5310538
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/UseHwTSO
Path: /Net/UseHwTSO
Type: integer
Int Value: 0
Default Int Value: 1
Min Value: 0
Max Value: 1
String Value:
Default String Value:
Valid Characters:
Description: When non-zero, use pNIC HW TSO offload if available
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/UseHwTSO6
Path: /Net/UseHwTSO6
Type: integer
Int Value: 0
Default Int Value: 1
Min Value: 0
Max Value: 1
String Value:
Default String Value:
Valid Characters:
Description: When non-zero, use pNIC HW IPv6 TSO offload if available
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/Vmxnet2HwLRO
Path: /Net/Vmxnet2HwLRO
Type: integer
Int Value: 0
Default Int Value: 1
Min Value: 0
Max Value: 1
String Value:
Default String Value:
Valid Characters:
Description: Whether to perform HW LRO on pkts going to a LPD capable vmxnet2
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/Vmxnet3HwLRO
Path: /Net/Vmxnet3HwLRO
Type: integer
Int Value: 0
Default Int Value: 1
Min Value: 0
Max Value: 1
String Value:
Default String Value:
Valid Characters:
Description: Whether to enable HW LRO on pkts going to a LPD capable vmxnet3
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/TcpipDefLROEnabled
Path: /Net/TcpipDefLROEnabled
Type: integer
Int Value: 0
Default Int Value: 1
Min Value: 0
Max Value: 1
String Value:
Default String Value:
Valid Characters:
Description: LRO enabled for TCP/IP
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/Vmxnet2SwLRO
Path: /Net/Vmxnet2SwLRO
Type: integer
Int Value: 0
Default Int Value: 1
Min Value: 0
Max Value: 1
String Value:
Default String Value:
Valid Characters:
Description: Whether to perform SW LRO on pkts going to a LPD capable vmxnet2
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/Vmxnet3SwLRO
Path: /Net/Vmxnet3SwLRO
Type: integer
Int Value: 0
Default Int Value: 1
Min Value: 0
Max Value: 1
String Value:
Default String Value:
Valid Characters:
Description: Whether to perform SW LRO on pkts going to a LPD capable vmxnet3
[root@ESX-BGP2:~]
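The esxcli output above is long, but only three fields per option matter. Below is a small Python parser for that output. It is a sketch that assumes the `Field: value` layout shown here and ignores the prompt and description lines. It lists which options differ from their defaults. On this host every TSO/LRO option is 0 against a default of 1, so the host-side offloads were already disabled. The sample text is an excerpt of the output above, and the function names are hypothetical.

```python
SAMPLE = """\
Path: /Net/UseHwTSO
Type: integer
Int Value: 0
Default Int Value: 1
Path: /Net/Vmxnet3HwLRO
Type: integer
Int Value: 0
Default Int Value: 1
"""


def parse_advanced_list(text):
    """Map each option Path to (Int Value, Default Int Value).

    Reads only those three fields from concatenated
    `esxcli system settings advanced list -o ...` output; other lines are ignored.
    """
    settings = {}
    path = None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Path":
            path = value
            settings[path] = [None, None]
        elif key == "Int Value" and path:
            settings[path][0] = int(value)
        elif key == "Default Int Value" and path:
            settings[path][1] = int(value)
    return {p: tuple(v) for p, v in settings.items()}


def changed_from_default(settings):
    """Options whose current value differs from the default, sorted by path."""
    return sorted(p for p, (cur, default) in settings.items() if cur != default)


print(changed_from_default(parse_advanced_list(SAMPLE)))
```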
[admin@CHR_BTEST_4] > /tool fetch keep-result=no url="ftp://100.65.2.1/1Gio.dat" user=admin password=""
status: downloading
downloaded: 192KiB
duration: 13s
[admin@CHR_BTEST_4] > /tool fetch keep-result=no url="ftp://100.65.0.3/1Gio.dat" user=admin password=""
status: finished
downloaded: 1048576KiB
duration: 13s
[admin@CHR_BTEST_4] >
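Converting the fetch counters into bit rates makes the gap explicit. This Python sketch assumes that fetch's KiB are 1024 bytes. It also assumes that the printed duration (whole seconds) is accurate enough for an average. For transfers that were still downloading, the result is the average rate so far. The helper name is hypothetical.

```python
def fetch_mbps(downloaded_kib, duration_s):
    """Average rate of a /tool fetch transfer in Mbit/s (1 KiB = 1024 bytes)."""
    return downloaded_kib * 1024 * 8 / duration_s / 1_000_000


# (downloaded KiB, duration s) as printed by /tool fetch in this thread.
transfers = {
    "HTTP from CHR_BTEST_4, still downloading": (1427, 20),
    "FTP from 100.65.2.1, still downloading": (192, 13),
    "FTP from 100.65.0.3, finished": (1048576, 13),
}

for label, (kib, secs) in transfers.items():
    print(f"{label}: {fetch_mbps(kib, secs):.3f} Mbit/s")
```

That is roughly 0.58 Mbit/s and 0.12 Mbit/s for the two slow transfers, against about 660 Mbit/s for the completed one.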
OK, that explains it. TSO, GSO and GRO also need to be disabled on the guests, so you will have to wait for a new CHR build.
Excellent! Could you give us an idea of when this may be released? Or could we get an advance copy for testing? My VMware version for testing is 6.5.0 build 5318154.
Quote: At first it will be in an RC version; then it is possible that the change will also be pushed to bugfix.
Any news? Did I miss it in the changelogs?
I followed up with Maris, and below is the response:
Thanks!
Same here, I've been waiting for a LONG TIME! Hopefully there will be an update soon...
Quote: Any updates on this, MikroTik? I've been holding off on deploying CHR for MPLS because of this and would love to see it fixed.
Not that I have read in the release notes, and not that MikroTik have told me. That being said, I haven't actually re-run the tests for quite some time; perhaps I'll get a chance soon to try again and see if anything has changed.
Quote: Any updates on this?
I moved my lab to Hyper-V Core 2012 R2, and can confirm that MPLS runs fine on that.
Thanks for the update. So is MikroTik working on a solution for this, or is this something VMware needs to change? What can we do to make it work, without moving an entire VMware infrastructure to Hyper-V, that is?!
Hyper-V works because it does not assemble packets into 64k buffers. This assembly only happens for traffic whose source and destination are both virtual guests. If the destination is a physical router outside the VM environment, there should be no problem with MPLS.
As far as I can tell, the problem has been reported and is on the TODO list, but I cannot say exactly when it will be resolved.
One of the best-working hypervisors, with the fewest problems, is Hyper-V. If this MPLS problem is a really big issue for you, you might try switching to Hyper-V.
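Here is a toy Python model of the mechanism described in that reply: same-flow packets get coalesced into buffers of up to 64k, and each buffer then has a label pushed onto it. It is not CHR's or ESXi's actual code path. It uses a hypothetical burst of 1500-byte IP packets, assumes the "64k" limit is exactly 64 KiB, and ignores the header stripping real GRO performs. It shows why a coalesced buffer must be re-segmented (GSO) before it can leave as labeled frames, while unaggregated packets fit as-is.

```python
GRO_LIMIT_BYTES = 64 * 1024  # the "64k buffers" from the staff reply (assumed exact size)
MPLS_LABEL_BYTES = 4


def coalesce(packet_sizes, limit=GRO_LIMIT_BYTES):
    """Greedily merge consecutive same-flow packets into receive buffers.

    Toy model: sizes simply add up; real GRO also strips duplicate headers.
    """
    buffers, current = [], 0
    for size in packet_sizes:
        if current and current + size > limit:
            buffers.append(current)
            current = 0
        current += size
    if current:
        buffers.append(current)
    return buffers


def too_big_after_push(buffers, mpls_mtu, labels=1):
    """Buffers that exceed the MPLS MTU once labels are pushed, i.e. ones
    the router would have to re-segment before transmitting."""
    return [b for b in buffers if b + labels * MPLS_LABEL_BYTES > mpls_mtu]


packets = [1500] * 100          # hypothetical burst of full-size IP packets
merged = coalesce(packets)      # aggregated between virtual guests
unmerged = packets              # no aggregation (the Hyper-V case)
print(merged)
print(len(too_big_after_push(merged, mpls_mtu=1590)))
print(len(too_big_after_push(unmerged, mpls_mtu=1590)))
```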
I can confirm the issue is still present in the 6.43.13 build.
Hi all,
The latest stable release, 6.45.9, includes the following note:
*) system - correctly handle Generic Receive Offloading (GRO) for MPLS traffic;
Does anyone know if this fixes the issue covered in this thread? I don't have time to lab anything up for the moment to test.
Regards,
Philip