tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

MPLS - massive throughput difference on CHR when using explicit nulls

Fri Jun 09, 2017 8:23 pm

Hi,

I've set up MPLS between two routers: one is a CCR1009, the other a CHR (with a PU license, just for the record).
Both are running the latest bugfix release: 6.37.5.
The link between both routers has 200M bandwidth and <1ms latency.
MTU is not an issue on this link (link MTU is 1590, MPLS MTU set to 1508 on both sides, ESXi vSwitch MTU set to 7500 on jumbo-enabled ports, etc.).
OSPF is up and running, LDP runs smoothly too.
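For reference, the MTU numbers above follow from the MPLS label size. Each label stack entry is 4 bytes (RFC 3032), so an MPLS MTU of 1508 carries a full 1500-byte IP packet with up to two labels, and the 1590-byte link MTU sits comfortably above that. A minimal Python sketch of the arithmetic; the helper names are illustrative only, not RouterOS settings.

```python
LABEL_SIZE = 4  # bytes per MPLS label stack entry (RFC 3032)


def required_mpls_mtu(ip_mtu: int, labels: int) -> int:
    """Smallest MPLS MTU that carries an ip_mtu-sized packet with `labels` labels."""
    return ip_mtu + labels * LABEL_SIZE


def labeled_packet_fits(ip_len: int, labels: int, mpls_mtu: int) -> bool:
    """True if an IP packet of ip_len bytes still fits after pushing `labels` labels."""
    return ip_len + labels * LABEL_SIZE <= mpls_mtu


if __name__ == "__main__":
    print(required_mpls_mtu(1500, 2))          # 1508
    print(labeled_packet_fits(1500, 2, 1508))  # True
    print(labeled_packet_fits(1500, 3, 1508))  # False: 1512 > 1508
```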

Let's consider the following topology:
A --- CHR -- 200M link -- CCR1009 --- B

When I run LDP without explicit nulls I can fill the 200M link in both directions.
When I run LDP with explicit nulls I also get 200M from B to A, but the throughput from A to B decreases to a few Mbps (fluctuating around 3 to 5 Mbps depending on the number of concurrent sessions).
Obviously, I've taken care of using the same explicit-nulls setting on both routers.

My interpretation is that the throughput goes down when the CHR has to add the MPLS label (which only happens with explicit nulls; otherwise it only has to route the packet, not label it).
CPU on the CHR is really low (less than 2%), memory is amply available (90% free), no single core does any significant amount of work, esxi host nics are ok too, etc...
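With explicit null enabled, the router imposing the label adds a real 4-byte label stack entry carrying label 0 (IPv4 Explicit NULL), whereas with penultimate hop popping the packet leaves as plain IP. A minimal Python sketch of what that entry looks like on the wire, using the RFC 3032 field layout; the function names are hypothetical and nothing here models RouterOS internals.

```python
import struct

EXPLICIT_NULL_IPV4 = 0  # reserved label values from RFC 3032
EXPLICIT_NULL_IPV6 = 2
IMPLICIT_NULL = 3       # signalled via LDP, never sent on the wire


def encode_label_entry(label: int, tc: int = 0, bottom: bool = True, ttl: int = 64) -> bytes:
    """Pack one 32-bit MPLS label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc < 8 or not 0 <= ttl < 256:
        raise ValueError("TC must fit in 3 bits and TTL in 8 bits")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def decode_label_entry(data: bytes) -> tuple:
    """Unpack the first 4 bytes into (label, tc, bottom_of_stack, ttl)."""
    (word,) = struct.unpack("!I", data[:4])
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 0x1), word & 0xFF


if __name__ == "__main__":
    entry = encode_label_entry(EXPLICIT_NULL_IPV4, ttl=64)
    print(entry.hex())                # 00000140
    print(decode_label_entry(entry))  # (0, 0, True, 64)
```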

Have you already seen this problem ?
Were you able to fix it and if so, how ?

I did some research on the forum, the internet, and the changelogs, but I couldn't find anything similar.

Any idea would be welcome !

Have a nice week-end !
 
sten
Forum Veteran
Posts: 923
Joined: Tue Jun 01, 2004 12:10 pm

Mon Jun 19, 2017 7:11 pm

Without explicit nulls, are any labels actually applied when it's so few hops in the link? Perhaps some fragmentation is occurring? You could try smaller packets while bandwidth testing or do a proper packet capture to see.
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Mon Jun 19, 2017 9:17 pm

Without explicit nulls, are any labels actually applied when it's so few hops in the link?
No labels applied without explicit nulls thanks to penultimate hop label popping.
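The label an egress router advertises decides what its upstream neighbour does. A simplified Python model of that rule (IPv4 only, a single label; the function name is hypothetical): with implicit null (label 3) the penultimate hop pops, so across a single hop nothing is pushed at all, while with explicit null (label 0) a label is always imposed and the egress router removes it itself.

```python
EXPLICIT_NULL_IPV4 = 0  # RFC 3032 reserved label values
IMPLICIT_NULL = 3       # advertised via LDP to request PHP; never seen on the wire


def outgoing_label_op(advertised_label: int, ingress: bool) -> str:
    """Label operation toward the downstream LSR, given the label it advertised.

    Simplified model: IPv4 only, single label, no VPN/VPLS labels.
    """
    if advertised_label == IMPLICIT_NULL:
        # PHP: the packet leaves unlabeled (ingress never pushes; transit pops)
        return "send unlabeled" if ingress else "pop"
    verb = "push" if ingress else "swap to"
    return f"{verb} {advertised_label}"


if __name__ == "__main__":
    # Two-router case from the first post: the CHR is both ingress and penultimate hop.
    print(outgoing_label_op(IMPLICIT_NULL, ingress=True))        # send unlabeled
    print(outgoing_label_op(EXPLICIT_NULL_IPV4, ingress=True))   # push 0
    print(outgoing_label_op(EXPLICIT_NULL_IPV4, ingress=False))  # swap to 0
```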
Perhaps some fragmentation is occurring? You could try smaller packets while bandwidth testing or do a proper packet capture to see.
MTU was the problem before I upgraded the vSwitch MTU to 7500. MTU is now correct (and tested).
Before I changed that MTU, full packets just didn't go through.

Now they pass but slowly in one direction (when the CHR adds the MPLS label) and fast in the other one (when the CHR pops the label).
That's what puzzles me.

I'll try to see if I can setup a test environment and do proper packet capture.

Thanks for your help !
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Fri Jun 23, 2017 6:19 am

I have exactly the same issue here. I have labbed it up and asked MT support for help, and they say to check the MTU and the NIC driver in VMware. This is causing major issues for me in progressing a project I have on the go. Please post back if you find a solution?

Also, for info, I have R1 -> R2 -> R3 -> R4. Changing the explicit-null setting gets throughput up to 900Mbps on R1<->R3, but it still stays very low (just under 600bps) on R1<->R4. That is the case where an MPLS label is imposed, so I'm not sure it's resolved until CHR can impose an MPLS label without reducing bandwidth to almost zero!!

Perhaps you're masking the actual problem by turning the 'explicit null' setting off; as far as I can tell, the actual problem is the imposing of MPLS labels.

any help or suggestions gratefully appreciated :-)
 
StubArea51
Trainer
Posts: 1742
Joined: Fri Aug 10, 2012 6:46 am
Location: stubarea51.net

Fri Jun 23, 2017 3:09 pm

Which VNIC are you guys using in VMWARE. VMXNET3 or something else?
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Fri Jun 23, 2017 6:09 pm

VMXNET3.
It works like a charm for everything else.
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Sun Jun 25, 2017 11:16 am

I have exactly the same issue here. I have labbed it up and asked MT support for help, and they say to check the MTU and the NIC driver in VMware.
Did you change the vSwitch MTU in VMware? Did it change anything for you?
This is causing major issues for me in progressing a project I have on the go. Please post back if you find a solution?
Of course, I will!
Also, for info, I have R1 -> R2 -> R3 -> R4. Changing the explicit-null setting gets throughput up to 900Mbps on R1<->R3, but it still stays very low (just under 600bps) on R1<->R4. That is the case where an MPLS label is imposed, so I'm not sure it's resolved until CHR can impose an MPLS label without reducing bandwidth to almost zero!!
Which of R1, R2, R3, R4 are CHR routers? Just R4, then?
Are they all part of the MPLS?
Perhaps you're masking the actual problem by turning the 'explicit null' setting off; as far as I can tell, the actual problem is the imposing of MPLS labels.

any help or suggestions gratefully appreciated :-)
I had to put the customer into production (without explicit nulls, which will hinder some other projects at that customer, but allowed us to meet the deadline).
So I'll have to build another setup to run further tests, but won't have any time soon to do that.
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Mon Jun 26, 2017 2:18 am

Okay so:
1. I have set, checked, and verified, then had someone else verify the following:
- MTU value on the physical host using Cisco UCSM - set to 9000 bytes
- MTU value on ESXi host - set to 9000 bytes
- MTU value on vSwitch in VMWare - set to 9000 bytes
2. All routers are CHR, all CHR routers are on the same physical host for testing. All routers have the same config and are all part of the MPLS.
3. setup is as follows
- R1 <- vlan10 -> R2
- R2 <- vlan20 -> R3
- R3 <- vlan30 -> R4
4. if i create a VPLS tunnel over the top of the MPLS setup from R1 to R4 it works flawlessly!

Happy to provide any more info needed :-)
Cheers
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Mon Jun 26, 2017 2:20 am

Which VNIC are you guys using in VMWARE. VMXNET3 or something else?
I am using VMXNET3 - have tried also E1000E with no change in the result.
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Wed Jun 28, 2017 9:24 pm

Which VNIC are you guys using in VMWARE. VMXNET3 or something else?
I am using VMXNET3 - have tried also E1000E with no change in the result.
Just to be sure, what version of CHR are you using ?
I tried with the new bugfix (v6.38.7) and there's still the same problem.
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Thu Jun 29, 2017 1:25 am

Which VNIC are you guys using in VMWARE. VMXNET3 or something else?
I am using VMXNET3 - have tried also E1000E with no change in the result.
Just to be sure, what version of CHR are you using ?
I tried with the new bugfix (v6.38.7) and there's still the same problem.
I have tried with versions 6.36, 6.39, and 6.39.2 - all the same result. still no response from Mikrotik support...
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Thu Jun 29, 2017 6:18 pm

I think this will make me crazy....

I just tried on v6.39.2 with the following brand-new from scratch setup.
CHR1 <-> CHR2 <-> CHR3 <-> CHR4 <-nompls-> BTEST4 (also CHR)

All links but CHR4 to BTEST4 are MPLS enabled.
All CHRx are configured with explicit-null and loop-detect, using a loopback (bridge) address as transport and LSR ID address.

When I run a bandwidth test from CHR2 to BTEST4 and do a "torch" on the CHR2 to CHR3 interface :
- outgoing packets have a MAC proto of 8847 (MPLS)
- incoming packets have a MAC proto of 800 (IP)

But CHR2 is configured with explicit-nulls, so I should get incoming packets with MAC proto of 8847 too, or did I miss something ?
Packet sniffing shows the same results as torch, so I don't think this is a torch bug.
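The MAC protocol column in torch is simply the EtherType, written in hex: 0x8847 is MPLS unicast and 0x0800 is IPv4. A small Python sketch for classifying frames from a capture in the same notation, assuming untagged Ethernet II frames (an 802.1Q tag would shift the EtherType by 4 bytes); the function names are hypothetical.

```python
import struct

# Common EtherType values, printed in hex like the torch "MAC proto" column
ETHERTYPES = {
    0x0800: "IPv4",
    0x86DD: "IPv6",
    0x8100: "802.1Q VLAN tag",
    0x8847: "MPLS unicast",
    0x8848: "MPLS multicast",
}


def ethertype_of(frame: bytes) -> int:
    """EtherType of an untagged Ethernet II frame (bytes 12-13, big-endian)."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    (etype,) = struct.unpack("!H", frame[12:14])
    return etype


def describe(frame: bytes) -> str:
    etype = ethertype_of(frame)
    return f"{etype:x} ({ETHERTYPES.get(etype, 'unknown')})"


if __name__ == "__main__":
    macs = bytes(12)  # zeroed destination + source MAC addresses
    print(describe(macs + b"\x88\x47" + bytes(4)))   # 8847 (MPLS unicast)
    print(describe(macs + b"\x08\x00" + bytes(20)))  # 800 (IPv4)
```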

Looking at the CHR3-CHR4 interface, I see outgoing and incoming 8847 packets.
So the CHR3 router is popping the label even though I ask for explicit nulls on CHR2.

If I uncheck "Use explicit nulls" on CHR2 I get the exact same behavior.
Shouldn't I see two distinct behaviors depending on whether explicit nulls is checked or not?

I'll try upgrading CHR2 and CHR3 to the RC, just to see if it changes anything, and keep you posted.
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Thu Jun 29, 2017 7:02 pm

Upgrading to RC or going back to bugfix doesn't change anything.
The problem doesn't seem to be version related.

The key factor seems to be the protocol:
UDP works well whether you have explicit-nulls or not,
TCP seems to be affected.
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Thu Jun 29, 2017 8:01 pm

More testing :
Bandwidth test from CHR4 to CHR1 (loopback), with explicit-nulls OFF (on all CHRx).
packet size 1200 bytes, direction both, tx speeds (local & remote) 5M
UDP : 5M
TCP with 2 sessions : less than 1kbps, CPU 50% showing half as unclassified in the profile tool (I opened a case for this one)

Bandwidth test from BTEST4 to CHR1 (loopback), with explicit-nulls OFF (on all CHRx).
packet size 1200 bytes, direction both, tx speeds (local & remote) 100M
UDP : 99.9 Mbps/99.3 Mbps
TCP with 20 sessions : 740.9 kbps/1793.1 kbps (CPU max out on both ends)

Bandwidth test from BTEST4 to CCR beyond CHR1, with explicit-nulls OFF (on all CHRx).
packet size 1200 bytes, direction both, tx speeds (local & remote) 10M
UDP : 7.9 Mbps/6.9 Mbps
TCP with 20 sessions : 7.9 Mbps/6.9 Mbps (CPU max out on BTEST4)

Further testing :
/tool fetch keep-result=no url="http://proof.ovh.net/files/1Gio.dat"
run from CHR1 , CHR2, CHR3 : several MBytes/sec
run from CHR4 : between 40 and 80 KBytes/sec

If I disable LDP on CHR1, the same command run on CHR4 gets several MBytes/sec instantly, which corroborates our previous diagnosis: pushing MPLS labels on a CHR kills the performance.
And doing TCP btest on a CHR kills the CPU.

Maybe there is some special advanced setting that we need to give to vmware ?
What vmware version are you running ?
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Fri Jun 30, 2017 2:21 am

What vmware version are you running ?
I am running version 6.5 at one site and version 6.0 at another (I upgraded to 6.5 to make sure it wasn't the VMware version causing the issue). Also, this is the response from MikroTik support:

"I have tested your exact setup, interface MTU 1500, MPLS MTU 1590 and VPLS MTU 1500
Default vswitch settings with MTU set to 9000.
We are using:
Supermicro SYS-5018D-FN8T
and ESXi-6.5.0-4564106-standard
I was able to push 900Mbps over VPLS tunnel as well as simple label switching, so there are no problems with MPLS on CHRs or virtual interface drivers included in CHR. Problem is on your hardware and ESXi combination or vswitch settings. ESXi is known to be unstable/buggy."


I'm happy to accept that the problem is on my hardware/software, but I need to know what to change in order to fix it!!!
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Fri Jun 30, 2017 2:25 am

I tried disabling VMware TSO and LRO and rebooted the ESXi host, as suggested in some VMware docs.
No change.

I set up a traffic generator to test the throughput through CHR1-CHR2-CHR3-CHR4.
I get pretty good speeds going through this whole chain in UDP, but my TCP fetch is still ultra slow.
[admin@CHR_BTEST_4] /tool traffic-generator>  quick mbps=2000
SEQ    ID      TX-PACKET   TX-RATE     RX-PACKET   RX-RATE        RX-OOO   RX-BAD-CSUM   LOST-PACKET LOST-RATE LAT-MIN LAT-AVG LAT-MAX JITTER 
......
TOT    3       2 586 149 1999.9...     2 585 538 1999.4...                           0           611 472.5kbps 44us    396us   4.32ms  4.27ms 
TOT    4       2 586 156 1999.9...     2 584 995 1999.0...                           0         1 161 897.8kbps 49.1us  416us   4.22ms  4.17ms 
TOT    TOT     5 172 305   3.9Gbps     5 170 533   3.9Gbps                           0         1 772 1370.3... 44us    406us   4.32ms  4.27ms 

[admin@CHR_BTEST_4] /tool traffic-generator> /tool fetch keep-result=no url="http://proof.ovh.net/files/1Gio.dat"
      status: downloading
  downloaded: 1427KiB
       total: 1048576KiB
    duration: 20s
I'm out of ideas...
I guess I will just have to start sniffing on all involved interfaces at the same time to try to pinpoint where the packet(s) get lost...
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Fri Jun 30, 2017 2:32 am

I'm happy to accept that the problem is on my hardware/software, but I need to know what to change in order to fix it!!!
I am still running on ESXi 5.5. Upgrading was my best guess and last resort option (DC guy needed and downtime to be planned).
I agree that UDP works nice and VPLS is likely to be the same.
But plain TCP over MPLS just doesn't work properly.
Guess I'll have to upgrade then.
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Fri Jun 30, 2017 2:56 am

BTW if you want to try TSO/LRO settings :
https://kb.vmware.com/selfservice/micro ... 0512464428
https://kb.vmware.com/selfservice/searc ... Id=1027511
Don't forget to reboot the host afterwards.
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Fri Jun 30, 2017 3:30 am

BTW if you want to try TSO/LRO settings :
https://kb.vmware.com/selfservice/micro ... 0512464428
https://kb.vmware.com/selfservice/searc ... Id=1027511
Don't forget to reboot the host afterwards.
Thanks for all your help with this - will give it a go and see if it makes any difference.
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Tue Jul 04, 2017 2:22 am

BTW if you want to try TSO/LRO settings :
https://kb.vmware.com/selfservice/micro ... 0512464428
https://kb.vmware.com/selfservice/searc ... Id=1027511
Don't forget to reboot the host afterwards.
Okay, I have now tried this and rebooted the host and all the CHRs, with no difference at all. I think this leaves either A) an advanced setting somewhere in VMware that I don't know about, or B) a setting in the UCS setup that VMware is running on. Are you running VMware on a Cisco chassis? If you're not, perhaps I could rule this out and log a support case with VMware?
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Tue Jul 04, 2017 3:05 am

I built a test environment last week: 4 brand-new CHRs with minimal config, interconnected through separate vSwitches with no physical interfaces (so no NIC driver involved), no VLANs, and MTU tested up to 65500. I avoided btest, which is buggy with TCP on CHR (I already filed a support case for that), and tested with an FTP fetch directly on the CHR instead.

Long story short: with MPLS disabled I download the 1GiB file in 10 to 12 secs (running through the 3 routers); with MPLS enabled it drops to a few hundred KiB in the same timespan.
Interestingly enough, UDP isn't affected and runs properly with or without MPLS. Only TCP is affected.

I filed a support case today with the whole setups and screenshots, etc...
It took me a day to pinpoint the exact problem, reproduce it reliably and file a proper bug report.
I hope it doesn't get thrown in the trash like yours.

In the meantime I'll do some testing with KVM, just to see if I can reproduce the problem, and will let you know.
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Tue Jul 04, 2017 3:08 am

BTW if you want to try TSO/LRO settings :
https://kb.vmware.com/selfservice/micro ... 0512464428
https://kb.vmware.com/selfservice/searc ... Id=1027511
Don't forget to reboot the host afterwards.
Okay, I have now tried this and rebooted the host and all the CHRs, with no difference at all. I think this leaves either A) an advanced setting somewhere in VMware that I don't know about, or B) a setting in the UCS setup that VMware is running on. Are you running VMware on a Cisco chassis? If you're not, perhaps I could rule this out and log a support case with VMware?
Nope, no UCS, we are using standalone servers, with local storage, no vCenter or dvs or anything tricky.
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Tue Jul 04, 2017 3:42 am

OK, I can reproduce it with KVM on totally different hardware.
I'm puzzled !
Does nobody use CHR to push MPLS labels ?
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Tue Jul 04, 2017 5:57 am

OK, I can reproduce it with KVM on totally different hardware.
I'm puzzled !
Does nobody use CHR to push MPLS labels ?
apparently not.. MPLS seems to work UNLESS labels are applied! will keep trying to resolve and post if i find anything.
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Tue Jul 04, 2017 9:41 am

Some more testing this morning, on KVM since it was easier to do without side effects :
I changed the NIC types from virtio to e1000 just to make sure the problem isn't with virtio.

I get exactly the same behavior.
So the problem doesn't seem to be in the virtio driver.
 
mrz
MikroTik Support
Posts: 7198
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Tue Jul 04, 2017 3:13 pm

We reproduced the issue, currently looks like problem is related to packet size, stay tuned for updates.
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Wed Jul 05, 2017 3:47 am

We reproduced the issue, currently looks like problem is related to packet size, stay tuned for updates.
YAY!!!! thanks Mikrotik (and of course Tibobo) !
 
mrz
MikroTik Support
Posts: 7198
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Thu Jul 06, 2017 9:35 am

The problem appears because hosts reassemble packets into large buffers (up to 65000 bytes) to reduce CPU load; this causes problems because MTUs are not respected.
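A rough illustration of why this only bites when a label is pushed: a normal full-size segment still fits after a 4-byte label is added, but a buffer coalesced from many segments is far larger than any MTU on the path. This Python sketch only does the size arithmetic under assumed values (40 bytes of IPv4+TCP headers without options, a 1460-byte MSS, and the 1508 MPLS MTU from the first post); it does not model what the hypervisor or RouterOS actually does with the oversized buffer.

```python
LABEL_SIZE = 4       # bytes per MPLS label
IP_TCP_HEADERS = 40  # assumed: IPv4 + TCP headers, no options


def coalesced_len(segments: int, mss: int = 1460) -> int:
    """IP length of one packet built by merging `segments` full TCP segments (GRO/LRO-style)."""
    return IP_TCP_HEADERS + segments * mss


def fits_after_push(ip_len: int, mpls_mtu: int, labels: int = 1) -> bool:
    """True if the packet still fits in the MPLS MTU after pushing `labels` labels."""
    return ip_len + labels * LABEL_SIZE <= mpls_mtu


if __name__ == "__main__":
    single = coalesced_len(1)   # 1500: an ordinary full-size segment
    merged = coalesced_len(44)  # 64280: close to the ~65000-byte buffers mentioned above
    print(single, fits_after_push(single, 1508))  # 1500 True
    print(merged, fits_after_push(merged, 1508))  # 64280 False
```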

On KVM please try to disable TSO and GSO.

On ESXi try to disable TSO and LRO:
https://kb.vmware.com/selfservice/micro ... Id=2055140

Problems are not observed on hyper-v.
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Thu Jul 06, 2017 11:12 am

I already did that when running on ESXi 5.5 U2 (see post #18).
I think tazdan tried it too, but I don't know on which version.

Is this behavior also observed when using vSwitches without physical NICs?

Anyway, we recently upgraded to ESXi 6.5 build 5310538.
I will give it a try and keep you posted.
 
mrz
MikroTik Support
Posts: 7198
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Thu Jul 06, 2017 11:43 am

TSO, GSO and GRO need to be disabled also on guests, so you will have to wait for new CHR build.
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Thu Jul 06, 2017 12:09 pm

On ESXi host :
login as: root
Using keyboard-interactive authentication.
Password:
The time and date of this login have been sent to the system logs.

VMware offers supported, powerful system administration tools.  Please
see www.vmware.com/go/sysadmintools for details.

The ESXi Shell can be disabled by an administrative user. See the
vSphere Security documentation for more information.
[root@ESX-BGP2:~] vmware -v
VMware ESXi 6.5.0 build-5310538
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/UseHwTSO
   Path: /Net/UseHwTSO
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: When non-zero, use pNIC HW TSO offload if available
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/UseHwTSO6
   Path: /Net/UseHwTSO6
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: When non-zero, use pNIC HW IPv6 TSO offload if available
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/Vmxnet2HwLRO
   Path: /Net/Vmxnet2HwLRO
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Whether to perform HW LRO on pkts going to a LPD capable vmxnet2
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/Vmxnet3HwLRO
   Path: /Net/Vmxnet3HwLRO
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Whether to enable HW LRO on pkts going to a LPD capable vmxnet3
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/TcpipDefLROEnabled
   Path: /Net/TcpipDefLROEnabled
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: LRO enabled for TCP/IP
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/Vmxnet2SwLRO
   Path: /Net/Vmxnet2SwLRO
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Whether to perform SW LRO on pkts going to a LPD capable vmxnet2
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/Vmxnet3SwLRO
   Path: /Net/Vmxnet3SwLRO
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Whether to perform SW LRO on pkts going to a LPD capable vmxnet3
[root@ESX-BGP2:~]
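Checking seven options by eye is error-prone. A small Python sketch that parses output in the format shown above (the `esxcli system settings advanced list -o <option>` results concatenated) and reports any option not confirmed at 0. The option list is copied from this post; the function names are hypothetical, and the parser assumes each block contains a `Path:` line followed by an `Int Value:` line.

```python
# Offload options queried in the post above (one esxcli call each)
OFFLOAD_OPTIONS = (
    "/Net/UseHwTSO",
    "/Net/UseHwTSO6",
    "/Net/Vmxnet2HwLRO",
    "/Net/Vmxnet3HwLRO",
    "/Net/TcpipDefLROEnabled",
    "/Net/Vmxnet2SwLRO",
    "/Net/Vmxnet3SwLRO",
)


def parse_advanced_list(text: str) -> dict:
    """Map each 'Path:' in esxcli advanced-list output to its 'Int Value:'."""
    values = {}
    path = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Path:"):
            path = line.split(":", 1)[1].strip()
        elif line.startswith("Int Value:") and path is not None:
            # 'Default Int Value:' is skipped because it does not start with 'Int Value:'
            values[path] = int(line.split(":", 1)[1].strip())
            path = None
    return values


def not_disabled(values: dict, options=OFFLOAD_OPTIONS) -> list:
    """Options that are missing from the output or not set to 0."""
    return [opt for opt in options if values.get(opt) != 0]


if __name__ == "__main__":
    sample = """
       Path: /Net/UseHwTSO
       Type: integer
       Int Value: 0
       Default Int Value: 1
       Path: /Net/Vmxnet3HwLRO
       Type: integer
       Int Value: 0
       Default Int Value: 1
    """
    values = parse_advanced_list(sample)
    print(values)                # {'/Net/UseHwTSO': 0, '/Net/Vmxnet3HwLRO': 0}
    print(not_disabled(values))  # the five options absent from this short sample
```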
On CHR_BTEST4 :
[admin@CHR_BTEST_4] > /tool fetch keep-result=no url="ftp://100.65.2.1/1Gio.dat" user=admin password=""
      status: downloading
  downloaded: 192KiB
    duration: 13s

[admin@CHR_BTEST_4] > /tool fetch keep-result=no url="ftp://100.65.0.3/1Gio.dat" user=admin password=""
      status: finished
  downloaded: 1048576KiB
    duration: 13s

[admin@CHR_BTEST_4] > 
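Converting the two fetch results above into bit rates makes the gap concrete. A Python sketch, assuming `downloaded` is in KiB and `duration` in whole seconds as printed (so the figures are approximate); the function name is hypothetical.

```python
def fetch_throughput_mbps(downloaded_kib: float, duration_s: float) -> float:
    """Average rate in Mbit/s from /tool fetch 'downloaded' (KiB) and 'duration' (s)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return downloaded_kib * 1024 * 8 / duration_s / 1e6


if __name__ == "__main__":
    print(round(fetch_throughput_mbps(192, 13), 3))      # 0.121  (first fetch)
    print(round(fetch_throughput_mbps(1048576, 13), 1))  # 660.8  (second fetch)
```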
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Thu Jul 06, 2017 12:13 pm

TSO, GSO and GRO need to be disabled also on guests, so you will have to wait for new CHR build.
OK, that explains it.
Do you know if it will make it into bugfix?
And if not, will it be in the next current?
 
mrz
MikroTik Support
Posts: 7198
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Thu Jul 06, 2017 4:10 pm

At first it will be in RC version, then it is possible that change will be pushed to bugfix also.
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Fri Jul 07, 2017 2:20 am

At first it will be in RC version, then it is possible that change will be pushed to bugfix also.
Excellent - could you give us an idea of when this may be released? or are we able to get an advance copy perhaps for testing? and my vmware Version for testing is 6.5.0 Build 5318154.

cheers
Dan.
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Wed Sep 27, 2017 1:17 am

At first it will be in RC version, then it is possible that change will be pushed to bugfix also.
Any news ? Did I miss that in the changelogs ?

Thanks !
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Wed Sep 27, 2017 5:34 am

At first it will be in RC version, then it is possible that change will be pushed to bugfix also.
Any news ? Did I miss that in the changelogs ?

Thanks !
I followed up with Maris, and below is the response :-)

-----Original Message-----
From: Maris (MikroTik Support) [mailto:support@mikrotik.com]
Sent: Thursday, 14 September 2017 6:29 PM
To: Dan French
Subject: Re: [Ticket#2017060822000545] CHR MPLS and MTU

Hello,

Currently it is not yet fixed, unfortunately I cannot tell when exactly it will happen.

Best regards,
Maris
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Wed Sep 27, 2017 9:18 am

Thanks Tazdan.
I can't understand what could take so long.
It really looks like a simple switch to change.
For now CHR is basically unusable for MPLS …
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Thu Sep 28, 2017 2:36 am

My thoughts exactly :-) - hopefully the wait won't be too much longer
 
StubArea51
Trainer
Posts: 1742
Joined: Fri Aug 10, 2012 6:46 am
Location: stubarea51.net

Sat Sep 30, 2017 2:00 pm

Great troubleshooting work guys! I'm anxious to see the results of this fix as we have been planning to use CHR for a number of MPLS applications.
 
StubArea51
Trainer
Posts: 1742
Joined: Fri Aug 10, 2012 6:46 am
Location: stubarea51.net

Sun Nov 05, 2017 5:31 pm

Any update on this, MikroTik?
 
tibobo
newbie
Topic Author
Posts: 41
Joined: Tue Sep 27, 2016 8:54 am

Tue Jan 09, 2018 12:33 pm

Mikrotik ? Please ?
 
StubArea51
Trainer
Posts: 1742
Joined: Fri Aug 10, 2012 6:46 am
Location: stubarea51.net

Thu Feb 01, 2018 7:36 pm

Any updates on this MikroTik? I've been holding off on deploying CHR for MPLS because of this and would love to see this fixed.
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Fri Feb 02, 2018 3:08 am

Any updates on this MikroTik? I've been holding off on deploying CHR for MPLS because of this and would love to see this fixed.
Same here - been waiting for a LONG TIME!! hopefully an update soon...
 
nickdwhite
just joined
Posts: 11
Joined: Thu Jun 22, 2006 11:41 pm

Wed Mar 21, 2018 11:58 pm

Any updates on this?
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Mon Mar 26, 2018 3:31 am

Not that I have read in the release notes, and not that MikroTik have told me about :( That being said, I haven't actually re-run the tests for quite some time. Perhaps I'll get a chance soon to try again and see if anything has changed.
 
SamWCL
Frequent Visitor
Posts: 75
Joined: Mon Apr 20, 2009 1:18 pm
Location: Nelson, NZ

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Mon Mar 26, 2018 7:44 am

BUMP
 
nickdwhite
just joined
Posts: 11
Joined: Thu Jun 22, 2006 11:41 pm

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Fri Apr 06, 2018 5:35 pm

I've been running an 8-router MikroTik lab (on ESXi) for several weeks, and can confirm this is still broken in 6.41.3. I have not tried the latest release candidate (6.42rc52) though; I might give that a test tonight.
 
nickdwhite
just joined
Posts: 11
Joined: Thu Jun 22, 2006 11:41 pm

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Thu Apr 19, 2018 3:56 pm

I moved my lab to Hyper-V Core 2012 R2, and can confirm that MPLS runs fine on that.
 
StubArea51
Trainer
Posts: 1742
Joined: Fri Aug 10, 2012 6:46 am
Location: stubarea51.net

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Sat Apr 21, 2018 3:28 am

That's great info... we did a bunch of CHR testing on different hypervisors for BGP and presented the results in Berlin at MUM Europe 2018. Hyper-V beat both ESXi and Proxmox (KVM) by a significant margin.
 
mrz
MikroTik Support
Posts: 7198
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Mon Apr 23, 2018 10:13 am

Hyper-V works because it does not assemble packets into 64k buffers. This assembly only happens for traffic whose source and destination are both virtual guests. If the destination is a physical router outside the VM environment, there should be no problem with MPLS.
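To put rough numbers on that explanation: a merged 64k buffer is far larger than any link MTU, so it has to be cut back into MTU-sized packets before a label can be pushed and the result put on the wire. The sketch below only does that arithmetic. It assumes IPv4 and TCP headers without options, a 1500-byte IP MTU and a single 4-byte MPLS label, and it says nothing about how ESXi or RouterOS actually handle the buffer. The 1508-byte MPLS MTU it is compared against is the value from the original post.

```python
"""Back-of-envelope numbers for a GRO/LRO-merged TCP buffer on an MPLS path.

Assumptions: IPv4 and TCP headers without options (20 bytes each) and a
single 4-byte MPLS label stack entry. Illustrative only.
"""

IPV4_HEADER = 20
TCP_HEADER = 20
MPLS_LSE = 4           # one MPLS label stack entry
MAX_IP_PACKET = 65535  # largest IPv4 total length, roughly the "64k buffer"


def mss_for(ip_mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


def segments_needed(tcp_payload: int, ip_mtu: int) -> int:
    """Number of MTU-sized packets a merged payload must be split back into."""
    mss = mss_for(ip_mtu)
    return -(-tcp_payload // mss)  # ceiling division


def labelled_size(ip_packet_len: int, labels: int = 1) -> int:
    """Size of an IP packet after pushing `labels` MPLS entries in front of it."""
    return ip_packet_len + labels * MPLS_LSE


if __name__ == "__main__":
    merged_payload = MAX_IP_PACKET - IPV4_HEADER - TCP_HEADER
    print("merged packet with one label:", labelled_size(MAX_IP_PACKET))
    print("segments at 1500-byte MTU:", segments_needed(merged_payload, 1500))
    print("one full-size segment with a label:", labelled_size(1500))
```

A re-segmented 1500-byte packet plus one label (1504 bytes) fits within a 1508-byte MPLS MTU. The merged 65539-byte packet obviously does not, which is why the resegmentation step matters.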
 
tazdan
just joined
Posts: 16
Joined: Tue Nov 19, 2013 2:09 am

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Tue Apr 24, 2018 1:29 am

Thanks for the update. So is MikroTik working on a solution for this, or is this something VMware needs to change? What can we do to make it work, without changing an entire VMware infrastructure to Hyper-V, that is!
 
nickdwhite
just joined
Posts: 11
Joined: Thu Jun 22, 2006 11:41 pm

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Thu Apr 26, 2018 7:15 pm

Are there multiple issues here? Are you saying that the speed issue should not exist if you simply have a single CHR per host on ESXi? I had 4 VM routers on 4 physical hosts (ESXi), all daisy-chained, and was still seeing the speed issue running from R1 to R4.

R1 <-> R2 <-> R3 <-> R4
(each of these is a separate Supermicro server with a single CHR VM installed)
 
aussiewan
newbie
Posts: 26
Joined: Wed Sep 07, 2011 5:28 am

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Mon Jul 23, 2018 8:04 am

Are there any updates on this issue? In particular, have there been any improvements since RouterOS 6.42, which has a heap of hypervisor integration improvements?
 
aussiewan
newbie
Posts: 26
Joined: Wed Sep 07, 2011 5:28 am

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Wed Jul 25, 2018 2:19 am

For those following, I emailed support and received the following response:
As far as I can tell problem is reported and in TODO list, but when exactly it will be resolved I cannot tell.
One of the best working hypervisors with least amount of problems is hyper-v, if this MPLS problem is really big issue for you then you might try to switch to hyper-v.
 
StubArea51
Trainer
Posts: 1742
Joined: Fri Aug 10, 2012 6:46 am
Location: stubarea51.net

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Tue Aug 07, 2018 8:12 pm

Thanks for the update!
 
jkat
just joined
Posts: 1
Joined: Sun May 06, 2018 11:23 pm

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Mon Jan 07, 2019 1:15 pm

Any update on this ?
 
konstantinJFK
newbie
Posts: 25
Joined: Wed Mar 08, 2017 3:44 pm
Location: Milan, Italy

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Fri May 10, 2019 12:36 am

I can confirm the issue is still present in build 6.43.13.

The workaround of disabling large receive offload (LRO) for the whole host DOES NOT WORK!

https://kb.vmware.com/s/article/2055140


What to do?
 
jbaird
newbie
Posts: 48
Joined: Tue May 10, 2011 6:11 am

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Fri Nov 01, 2019 4:00 pm

Does anyone know if this has been fixed? I had planned on rolling ESXi for a pair of VPLS aggregation routers, but it sounds like I may need to consider Hyper-V instead?
 
IPAsupport
Frequent Visitor
Posts: 62
Joined: Fri Sep 20, 2019 4:02 pm

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Tue Nov 05, 2019 3:45 pm

I'd use Hyper-V... I haven't seen any notification that this has been fixed.

I've heard other people mention that Proxmox with Open vSwitch works, but I haven't tested or confirmed that.
 
aussiewan
newbie
Posts: 26
Joined: Wed Sep 07, 2011 5:28 am

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Tue May 12, 2020 6:19 am

Hi all,

The latest stable release, 6.45.9, includes the following note:
*) system - correctly handle Generic Receive Offloading (GRO) for MPLS traffic;

Does anyone know if this fixes the issue covered in this thread? I don't have time to lab anything up for the moment to test.

Regards,
Philip
 
nz_monkey
Forum Guru
Posts: 2190
Joined: Mon Jan 14, 2008 1:53 pm
Location: Over the Rainbow

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Tue May 12, 2020 8:31 am

I am curious, but am also in the same predicament with time :(


Kevin/Derek @ IPArchitechs, have you guys had a chance to test this in your lab yet?
 
IPAsupport
Frequent Visitor
Posts: 62
Joined: Fri Sep 20, 2019 4:02 pm

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Wed Sep 02, 2020 3:19 am

We haven't tested that yet, but as soon as we do, we'll share the results.
 
mducharme
Trainer
Posts: 1777
Joined: Tue Jul 19, 2016 6:45 pm
Location: Vancouver, BC, Canada

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Mon Apr 05, 2021 2:45 am

I am testing this - I am seeing promising results but still some weird behaviour.

When running a TCP btest on a hardware router (1100AHx2) across an MPLS network to a CHR, I'm seeing full rates for both send and receive.
When I run the btest on the CHR against the same 1100AHx2, I get full rates for receive but only 2-3 Mbps for send. That is really bizarre to me: CHR-to-hardware traffic runs at full rate when the hardware router initiates the test (its receive direction), but not when the CHR initiates it (its send direction), even though the traffic flows the same way in both cases. It seems the behaviour changes depending on which side initiated the btest.

VPLS seems fine from hardware to CHR, I can get full rates across the VPLS tunnel regardless of which side initiates the btest.

Things get even stranger with two CHRs routed to each other through a CCR, e.g. CHR1 <--> CCR <--> CHR2. When running a TCP btest on CHR2 against CHR1, I see 1 Gbps receive and ~3 Mbps send. When I run btest on CHR1 against CHR2, I also see 1 Gbps receive and ~3 Mbps send. Whether the traffic goes from CHR1 to CHR2 or vice versa doesn't seem to matter; the only thing that matters is that sending is always slow from the CHR that initiated the btest, while receiving is always fast.

Also, I tried connecting two CHRs via a VPLS tunnel. It doesn't seem to pass any traffic other than neighbor discovery: after adding IPs to both sides of the tunnel, pinging the far side gets no response and ARP does not complete (a dynamic entry with the far side's MAC appears in the ARP table, but without the C flag). So CHR-to-CHR VPLS does not seem to work at all.

I have tried changing the use-explicit-null setting on both CHRs and it does not change this behavior.
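For reference, here is roughly what use-explicit-null changes on the wire. Per RFC 3032, an MPLS label stack entry is 32 bits: a 20-bit label, a 3-bit traffic class, a 1-bit bottom-of-stack flag and an 8-bit TTL. Label 0 is the IPv4 explicit null, and label 3 is implicit null, which is only signalled and never placed in a packet. On a one-hop LSP with implicit null, the ingress router forwards the plain IP packet. With explicit null, it pushes a label 0 entry, adding 4 bytes. The function names below are hypothetical (not a RouterOS API). Taking the TTL as a parameter defaulting to 64 is also a simplifying assumption, since routers normally copy it from the IP header.

```python
import struct

IPV4_EXPLICIT_NULL = 0  # RFC 3032 reserved label
IMPLICIT_NULL = 3       # signalled only, never carried in a packet


def encode_lse(label: int, tc: int = 0, bottom: bool = True, ttl: int = 64) -> bytes:
    """Pack one 32-bit MPLS label stack entry in network byte order."""
    if not 0 <= label < (1 << 20):
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc < 8:
        raise ValueError("traffic class must fit in 3 bits")
    if not 0 <= ttl < 256:
        raise ValueError("TTL must fit in 8 bits")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def impose(ip_packet: bytes, advertised_label: int, ttl: int = 64) -> bytes:
    """Hypothetical ingress step for a one-hop LSP.

    If the egress router advertised implicit null, nothing is pushed
    (penultimate-hop popping); otherwise the advertised label is pushed
    as the only, bottom-of-stack, entry.
    """
    if advertised_label == IMPLICIT_NULL:
        return ip_packet
    return encode_lse(advertised_label, bottom=True, ttl=ttl) + ip_packet


if __name__ == "__main__":
    packet = bytes(1500)  # stand-in for a 1500-byte IPv4 packet
    print(len(impose(packet, IMPLICIT_NULL)))       # 1500: sent unlabelled
    print(len(impose(packet, IPV4_EXPLICIT_NULL)))  # 1504: label 0 pushed
    print(encode_lse(IPV4_EXPLICIT_NULL).hex())     # 00000140
```

This is why the thread keeps tying the slowdown to explicit null. With implicit null on a single hop, the CHR never has to push a label at all.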

Update: I figured out the reason for the different send/receive behaviour. I have advertise filters on MPLS so that only the loopbacks are advertised for tunnel purposes. Btest doesn't allow specifying the source interface, so depending on which direction it is initiated in, the traffic may or may not have labels applied. When a label is pushed, it is slow, so the same problem described before still seems to exist.
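One plausible reading of that update, assuming the btest target was the far router's loopback: traffic only gets a label when the route it matches is a FEC with an LDP binding. Traffic towards the target loopback is therefore labelled. The replies go back to the initiator's source address, which btest picks from the outgoing interface, and that address is filtered out of LDP, so the replies travel as plain IP. The sketch below is a deliberately simplified model of that decision. The addresses are hypothetical, and real LDP also depends on which neighbour advertised the binding and on the next hop.

```python
import ipaddress

# Hypothetical routing table on CHR2 in a CHR1 <-> CCR <-> CHR2 lab.
ROUTES = [
    ipaddress.ip_network("10.255.255.1/32"),  # CHR1 loopback (learned via OSPF)
    ipaddress.ip_network("10.0.1.0/30"),      # CHR1 <-> CCR link (learned via OSPF)
]
# Only loopbacks pass the LDP advertise filter, so only they have a label binding.
BOUND_FECS = {ipaddress.ip_network("10.255.255.1/32")}


def longest_match(dst, routes):
    """Return the most specific route containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


def gets_label(dst, routes=ROUTES, bound=BOUND_FECS) -> bool:
    """Simplified model: traffic is labelled only if its matching route has a binding."""
    route = longest_match(dst, routes)
    return route is not None and route in bound


if __name__ == "__main__":
    # CHR2 sending towards CHR1's loopback, e.g. a btest CHR2 started against it.
    print(gets_label("10.255.255.1"))  # True
    # CHR2 replying to a btest CHR1 started from its interface address.
    print(gets_label("10.0.1.1"))      # False
```

In this model, the labelled direction is always the initiator's send direction. That matches the "slow send, fast receive" pattern described above.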

So it looks like it is *not* fixed, but maybe I still have to disable LRO and TSO in the ESXi host. I will try that next.
 
bajodel
Long time Member
Posts: 552
Joined: Sun Nov 24, 2013 8:30 am
Location: Italy

Re: MPLS - massive throughput difference on CHR when using explicit nulls

Sun Jan 16, 2022 1:05 pm

Any update on this? I'm labbing MPLS on CHRs (with the 1 Mbps upload limit) and I cannot figure it out. Thanks