serjrd
just joined
Topic Author
Posts: 15
Joined: Mon Nov 30, 2015 3:24 pm

PPPoE + Shaping for 10000+ clients on CCR-1072

Wed Dec 09, 2015 2:05 pm

Hi,

I want to try a CCR1072 as a PPPoE server with shaping for ~10k simultaneous connections.
There are roughly 10 different data rates that a client might get (e.g. 2Mb/s, 20Mb/s, 40Mb/s, 60Mb/s, 100Mb/s).

After reading the Wiki I'm still confused about what kind of shaping to implement for a large number of clients.

The most obvious way is to rely on the default behavior, where RADIUS provides the desired speed and the CCR creates a dynamic simple queue for each connection.

But the documentation says that PCQ is better suited for a large number of clients, and in that case I don't really see a non-hacky way to assign the right speed to each client.

Could someone share wisdom?
Thanks!

P.S.:
Is there a good way to test a large number of PPPoE connections in a lab?
 
pukkita
Trainer
Posts: 3051
Joined: Wed Dec 04, 2013 11:09 am
Location: Spain

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Wed Dec 09, 2015 2:49 pm

serjrd wrote:
But the documentation says that PCQ is better suited for a large number of clients. But in this case I don't really see a non-hacky way to assign the right speed to a client.

That was in the RouterOS v5 era. Simple queues are now optimized for multi-core devices.
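
For the RADIUS-driven approach, RouterOS reads the per-session limit from the Mikrotik-Rate-Limit attribute, whose basic form is rx-rate/tx-rate from the router's point of view (rx is the client's upload, tx is the client's download). A minimal sketch of how a billing system might map a handful of plans to that string. The plan names and upload speeds below are made up for illustration; only the download speeds come from the first post.

```python
# Hypothetical plan table. Download speeds are the examples from the first
# post; plan names and upload speeds are assumptions for illustration only.
PLANS_MBPS = {
    "plan-2": (1, 2),      # (upload, download) in Mbit/s
    "plan-20": (5, 20),
    "plan-40": (10, 40),
    "plan-60": (10, 60),
    "plan-100": (20, 100),
}


def mikrotik_rate_limit(upload_mbps: int, download_mbps: int) -> str:
    """Return the basic 'rx-rate/tx-rate' value for Mikrotik-Rate-Limit.

    Rates are from the router's point of view, so rx is the client's
    upload and tx is the client's download.
    """
    return f"{upload_mbps}M/{download_mbps}M"


def reply_attributes(plan: str) -> dict:
    """Build the reply attribute a RADIUS server could send for a plan."""
    upload, download = PLANS_MBPS[plan]
    return {"Mikrotik-Rate-Limit": mikrotik_rate_limit(upload, download)}


if __name__ == "__main__":
    for name in PLANS_MBPS:
        print(name, reply_attributes(name))
```

With ~10 plans the table stays tiny, and the router still builds one dynamic simple queue per session from the returned value.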
 
serjrd
just joined
Topic Author
Posts: 15
Joined: Mon Nov 30, 2015 3:24 pm

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Wed Dec 09, 2015 2:59 pm

Thanks, pukkita

And is there a way to dynamically change data rate limit for a connection that got a dynamic queue based on RADIUS attribute?
 
StubArea51
Trainer
Posts: 1742
Joined: Fri Aug 10, 2012 6:46 am
Location: stubarea51.net

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Wed Dec 09, 2015 8:20 pm

We are still working on the scripting to allow us to test bandwidth at a scale over 10,000 connections, but virtual machines are a must if you want to test with those numbers and you'll need a decent amount of RAM and 10 gig cards if you want to stress the 1072.

Here is a quick look at what we have done so far with PPPoE:

http://www.stubarea51.net/2015/10/23/mi ... nd-queues/
 
serjrd
just joined
Topic Author
Posts: 15
Joined: Mon Nov 30, 2015 3:24 pm

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Thu Dec 10, 2015 11:01 am

Thanks IPANetEngineer,

I did read your blog regarding mikrotik stress-test. Probably your 80G bw post was the reason we decided to buy a 1072 in the first place. So thanks for your efforts, I hope we will enjoy replacing our Cisco ASR1004 with MT1072 once all the testing is over.

Btw, why exactly do you need virtual machines to do the testing? Isn't it just additional load for the CPU?
 
chechito
Forum Guru
Posts: 3164
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Thu Dec 10, 2015 7:12 pm

serjrd wrote:
I hope we will enjoy replacing our Cisco ASR1004 with MT1072 once all the testing is over.

I'm curious, what's the motivation to replace the ASR1004 with a CCR1072?
 
pukkita
Trainer
Posts: 3051
Joined: Wed Dec 04, 2013 11:09 am
Location: Spain

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Thu Dec 10, 2015 8:01 pm

I'd bet one of them is power consumption... :D
 
serjrd
just joined
Topic Author
Posts: 15
Joined: Mon Nov 30, 2015 3:24 pm

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Thu Dec 10, 2015 8:40 pm

chechito wrote:
I'm curious, what's the motivation to replace the ASR1004 with a CCR1072?

Oh, it's quite simple. The ASR was quite pricey when we purchased it, and now we're about to hit its 10G limit, which means another pricey upgrade.

If the ccr1072 experiment works out, I'd be happy to sell our ASR and live happily ever after :)
 
doush
Long time Member
Posts: 665
Joined: Thu Jun 04, 2009 3:11 pm

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Fri Dec 11, 2015 12:45 pm

You shouldn't put more than 1000 users on a single CCR. You need 10+ CCRs (1036 or 1072, it won't matter) to terminate 10k+ users.

RouterOS is simply not multi-threaded enough for such a scenario.
 
serjrd
just joined
Topic Author
Posts: 15
Joined: Mon Nov 30, 2015 3:24 pm

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Fri Dec 11, 2015 12:58 pm

doush wrote:
You shouldn't put more than 1000 users on a single CCR. You need 10+ CCRs (1036 or 1072, it won't matter) to terminate 10k+ users.
RouterOS is simply not multi-threaded enough for such a scenario.

Really? Is that your personal experience?
Because before actually buying a CCR I contacted official MikroTik support and asked whether it can handle such loads. They replied that one or two CCR1072s should be able to handle ~15k simultaneous PPPoE connections with decent traffic.

I'd love to hear an official comment.

But anyway, we're planning to test it live some time next week and see how many actual, live sessions it can handle.

Btw, is there anything specific we should do to optimize for such loads? Right now we're pretty much planning to just have our RADIUS provide MikroTik-specific fields to do the shaping.
 
StubArea51
Trainer
Posts: 1742
Joined: Fri Aug 10, 2012 6:46 am
Location: stubarea51.net

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Fri Dec 11, 2015 4:50 pm

chechito wrote:
I'm curious, what's the motivation to replace the ASR1004 with a CCR1072?

We have worked with several telcos that use ASRs as their BRAS, and I have to say that for a router that can cost upwards of $100,000 with licensing and modules, I was not very impressed with the amount of PPPoE traffic it could handle vs. a CCR.

We started to see CPU spiking at 15k connections on an ASR1002 and it was supposed to be rated at more than 25k for the configuration we had according to Cisco TAC and the release notes.
 
StubArea51
Trainer
Posts: 1742
Joined: Fri Aug 10, 2012 6:46 am
Location: stubarea51.net

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Fri Dec 11, 2015 4:55 pm

serjrd wrote:
Btw, why exactly do you need virtual machines to do the testing? Isn't it just additional load for the CPU?

Glad to hear you found the information helpful :-)

Virtual machines allow us to build just about any network topology quickly without having to physically rework the lab. Also, some processes are more efficient with x86 than on a CCR, so we use MikroTik VMs in addition to CentOS to build whatever environment is needed for the test.

Also, we do a lot of work integrating MikroTik with Cisco, Juniper, etc. in data centers and service providers, so it's helpful to be able to spin up a Cisco or Juniper VM to test the design for a network integration.

Currently we have 4 VMWare ESXi 6.x hosts in our lab that can generate up to 80 Gbps of traffic collectively.
 
doush
Long time Member
Posts: 665
Joined: Thu Jun 04, 2009 3:11 pm

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Sat Dec 12, 2015 11:33 am

serjrd wrote:
Really? Is that your personal experience?

[admin@PPPOE SERVER 1] /ppp active> pr count-only 
1387
[admin@PPPOE SERVER 1] /queue simple> pr count-only 
1377
[admin@PPPOE SERVER 1] /ip firewall filter> pr count-only 
11
[admin@PPPOE SERVER 1] /interface> mon ether1
                        name:     ether1
       rx-packets-per-second:     50 755
          rx-bits-per-second:  417.0Mbps
    fp-rx-packets-per-second:     50 755
       fp-rx-bits-per-second:  417.0Mbps
         rx-drops-per-second:          0
        rx-errors-per-second:          0
       tx-packets-per-second:     47 972
          tx-bits-per-second:  190.7Mbps
    fp-tx-packets-per-second:          0
       fp-tx-bits-per-second:       0bps
         tx-drops-per-second:          0
        tx-errors-per-second:          0
[admin@PPPOE SERVER 1] /system resource cpu> pr terse 
 0 cpu=cpu0 load=13% irq=13% disk=0% 
 1 cpu=cpu1 load=22% irq=22% disk=0% 
 2 cpu=cpu2 load=7% irq=7% disk=0% 
 3 cpu=cpu3 load=15% irq=15% disk=0% 
 4 cpu=cpu4 load=7% irq=7% disk=0% 
 5 cpu=cpu5 load=11% irq=5% disk=0% 
 6 cpu=cpu6 load=18% irq=17% disk=0% 
 7 cpu=cpu7 load=5% irq=5% disk=0% 
 8 cpu=cpu8 load=1% irq=1% disk=0% 
 9 cpu=cpu9 load=12% irq=12% disk=0% 
10 cpu=cpu10 load=17% irq=15% disk=0% 
11 cpu=cpu11 load=6% irq=6% disk=0% 
12 cpu=cpu12 load=0% irq=0% disk=0% 
13 cpu=cpu13 load=15% irq=15% disk=0% 
14 cpu=cpu14 load=11% irq=11% disk=0% 
15 cpu=cpu15 load=14% irq=14% disk=0% 
16 cpu=cpu16 load=16% irq=12% disk=0% 
17 cpu=cpu17 load=42% irq=21% disk=0% 
18 cpu=cpu18 load=7% irq=4% disk=0% 
19 cpu=cpu19 load=48% irq=11% disk=0% 
20 cpu=cpu20 load=17% irq=17% disk=0% 
21 cpu=cpu21 load=12% irq=12% disk=0% 
22 cpu=cpu22 load=11% irq=11% disk=0% 
23 cpu=cpu23 load=42% irq=31% disk=0% 
24 cpu=cpu24 load=12% irq=12% disk=0% 
25 cpu=cpu25 load=19% irq=18% disk=0% 
26 cpu=cpu26 load=11% irq=8% disk=0% 
27 cpu=cpu27 load=8% irq=8% disk=0% 
28 cpu=cpu28 load=0% irq=0% disk=0% 
29 cpu=cpu29 load=12% irq=12% disk=0% 
30 cpu=cpu30 load=15% irq=15% disk=0% 
31 cpu=cpu31 load=7% irq=7% disk=0% 
32 cpu=cpu32 load=5% irq=5% disk=0% 
33 cpu=cpu33 load=15% irq=15% disk=0% 
34 cpu=cpu34 load=33% irq=33% disk=0% 
35 cpu=cpu35 load=19% irq=19% disk=0% 
See how badly it distributes the load. Watching it live at peak hours, when bandwidth gets a bit higher, you can easily see some CPUs reach 100%.
Nevertheless, it does its job up to maybe 1500 users.
I hope v7 comes with better multi-threading support.
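
To put a number on "badly distributed" instead of eyeballing the list, the terse CPU output can be summarized with a few lines of Python. This is only a sketch: it assumes the output was saved as plain text in the format shown above, and the 40% "busy" threshold is an arbitrary choice, not a RouterOS value.

```python
import re
from statistics import mean

# Matches lines such as " 17 cpu=cpu17 load=42% irq=21% disk=0% "
CPU_LINE = re.compile(r"cpu=(\S+)\s+load=(\d+)%\s+irq=(\d+)%")


def parse_cpu_terse(text: str) -> dict:
    """Map core name -> (load %, irq %) from 'print terse' style lines."""
    cores = {}
    for line in text.splitlines():
        match = CPU_LINE.search(line)
        if match:
            cores[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return cores


def summarize(cores: dict, busy_threshold: int = 40) -> dict:
    """Return min/max/mean load and the cores at or above the threshold."""
    loads = [load for load, _irq in cores.values()]
    return {
        "cores": len(loads),
        "min": min(loads),
        "max": max(loads),
        "mean": mean(loads),
        "busy": sorted(n for n, (load, _irq) in cores.items() if load >= busy_threshold),
    }


# Four lines copied from the output above, used as sample input.
SAMPLE = """\
16 cpu=cpu16 load=16% irq=12% disk=0%
17 cpu=cpu17 load=42% irq=21% disk=0%
18 cpu=cpu18 load=7% irq=4% disk=0%
19 cpu=cpu19 load=48% irq=11% disk=0%
"""

if __name__ == "__main__":
    print(summarize(parse_cpu_terse(SAMPLE)))
```

A large gap between "max" and "mean" is the uneven spread described above; sampling several times during peak hours would show whether the same cores are always the hot ones.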
 
serjrd
just joined
Topic Author
Posts: 15
Joined: Mon Nov 30, 2015 3:24 pm

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Tue Dec 22, 2015 1:44 pm

Ok, so several hours ago, during the night, we set up our CCR1072 as a PPPoE server and are now monitoring the load.

So far we have diverted around 3.5k PPPoE users to it. Here are the stats.

Active sessions:
/ppp active print count-only 
3548
Queues:
/queue simple print count-only 
1713
Filters:
/ip firewall filter print count-only 
3424
Current traffic:
       rx-packets-per-second:       200 793
          rx-bits-per-second:     1618.0...
    fp-rx-packets-per-second:       200 793
       fp-rx-bits-per-second:     1618.0...
         rx-drops-per-second:             0
        rx-errors-per-second:             0
       tx-packets-per-second:       151 573
          tx-bits-per-second:     658.1Mbps
    fp-tx-packets-per-second:             0
       fp-tx-bits-per-second:          0bps
         tx-drops-per-second:             0
        tx-errors-per-second:             0
CPU load:
 0 cpu=cpu0 load=4% irq=4% disk=0% 
 1 cpu=cpu1 load=17% irq=17% disk=0% 
 2 cpu=cpu2 load=40% irq=40% disk=0% 
 3 cpu=cpu3 load=31% irq=31% disk=0% 
 4 cpu=cpu4 load=23% irq=23% disk=0% 
 5 cpu=cpu5 load=42% irq=42% disk=0% 
 6 cpu=cpu6 load=30% irq=16% disk=0% 
 7 cpu=cpu7 load=53% irq=53% disk=0% 
 8 cpu=cpu8 load=60% irq=60% disk=0% 
 9 cpu=cpu9 load=43% irq=38% disk=0% 
10 cpu=cpu10 load=41% irq=41% disk=0% 
11 cpu=cpu11 load=28% irq=28% disk=0% 
12 cpu=cpu12 load=55% irq=52% disk=0% 
13 cpu=cpu13 load=30% irq=29% disk=0% 
14 cpu=cpu14 load=49% irq=49% disk=0% 
15 cpu=cpu15 load=31% irq=31% disk=0% 
16 cpu=cpu16 load=16% irq=16% disk=0% 
17 cpu=cpu17 load=25% irq=24% disk=0% 
18 cpu=cpu18 load=42% irq=42% disk=0% 
19 cpu=cpu19 load=54% irq=54% disk=0% 
20 cpu=cpu20 load=48% irq=38% disk=0% 
21 cpu=cpu21 load=50% irq=48% disk=0% 
22 cpu=cpu22 load=8% irq=8% disk=0% 
23 cpu=cpu23 load=52% irq=36% disk=0% 
24 cpu=cpu24 load=28% irq=28% disk=0% 
25 cpu=cpu25 load=19% irq=19% disk=0% 
26 cpu=cpu26 load=26% irq=26% disk=0% 
27 cpu=cpu27 load=22% irq=21% disk=0% 
28 cpu=cpu28 load=21% irq=21% disk=0% 
29 cpu=cpu29 load=16% irq=16% disk=0% 
30 cpu=cpu30 load=23% irq=23% disk=0% 
31 cpu=cpu31 load=41% irq=41% disk=0% 
32 cpu=cpu32 load=46% irq=39% disk=0% 
33 cpu=cpu33 load=21% irq=21% disk=0% 
34 cpu=cpu34 load=39% irq=37% disk=0% 
35 cpu=cpu35 load=54% irq=53% disk=0% 
36 cpu=cpu36 load=18% irq=18% disk=0% 
37 cpu=cpu37 load=24% irq=24% disk=0% 
38 cpu=cpu38 load=50% irq=48% disk=0% 
39 cpu=cpu39 load=38% irq=38% disk=0% 
40 cpu=cpu40 load=14% irq=14% disk=0% 
41 cpu=cpu41 load=56% irq=48% disk=0% 
42 cpu=cpu42 load=53% irq=53% disk=0% 
43 cpu=cpu43 load=45% irq=44% disk=0% 
44 cpu=cpu44 load=14% irq=14% disk=0% 
45 cpu=cpu45 load=49% irq=47% disk=0% 
46 cpu=cpu46 load=22% irq=21% disk=0% 
47 cpu=cpu47 load=21% irq=21% disk=0% 
48 cpu=cpu48 load=30% irq=22% disk=0% 
49 cpu=cpu49 load=32% irq=32% disk=0% 
50 cpu=cpu50 load=36% irq=36% disk=0% 
51 cpu=cpu51 load=29% irq=15% disk=0% 
52 cpu=cpu52 load=56% irq=54% disk=0% 
53 cpu=cpu53 load=18% irq=18% disk=0% 
54 cpu=cpu54 load=14% irq=14% disk=0% 
55 cpu=cpu55 load=32% irq=32% disk=0% 
56 cpu=cpu56 load=34% irq=26% disk=0% 
57 cpu=cpu57 load=37% irq=37% disk=0% 
58 cpu=cpu58 load=23% irq=23% disk=0% 
59 cpu=cpu59 load=32% irq=32% disk=0% 
60 cpu=cpu60 load=17% irq=17% disk=0% 
61 cpu=cpu61 load=43% irq=43% disk=0% 
62 cpu=cpu62 load=39% irq=37% disk=0% 
63 cpu=cpu63 load=31% irq=31% disk=0% 
64 cpu=cpu64 load=19% irq=19% disk=0% 
65 cpu=cpu65 load=21% irq=21% disk=0% 
66 cpu=cpu66 load=19% irq=19% disk=0% 
67 cpu=cpu67 load=12% irq=12% disk=0% 
68 cpu=cpu68 load=37% irq=37% disk=0% 
69 cpu=cpu69 load=12% irq=12% disk=0% 
70 cpu=cpu70 load=15% irq=15% disk=0% 
71 cpu=cpu71 load=45% irq=26% disk=0% 
We're still waiting for the peak hours to see how it handles the load. But so far it does the job.
 
marlowbg
newbie
Posts: 33
Joined: Wed Oct 06, 2010 4:23 pm

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Tue Dec 22, 2015 3:38 pm

Hi guys,

I'm also testing CCR1072 for PPPoE.

I'm using around 10 IP firewall mangle rules to mark specific traffic and allow a higher speed for it, and of course dynamic simple queues per user for Internet speed limits.

So far, with approximately 200 PPPoE sessions connected and around 150-160 Mbps of traffic, I have already started seeing some of the CPU cores reach 70-80%.

I have a few questions:

1. Right now I'm using only 2 x 10Gig ports (one for WAN and one for LAN). Will it be better optimised if I use more 10Gig ports?

2. I'm doing NAT on the CCR1072 as well. Is this the main CPU eater?

3. What else can be optimised?
 
serjrd
just joined
Topic Author
Posts: 15
Joined: Mon Nov 30, 2015 3:24 pm

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Wed Dec 23, 2015 10:47 pm

All right, so here's a quick summary of our testing:

At about 3900 online PPPoE sessions and ~1900 shaping queues we started seeing very high CPU load on all 72 cores (at some point we saw ALL of them busy in the 90-100% range).

We also ran speed tests: one from an account with a speed limit and the other from an account with no shaping restrictions.
The unrestricted account worked well and could reach ~100Mbit/s. The user that had a speed limit of 60Mbit/s could reach only ~50Mbit/s. So we had to conclude that the service quality had actually degraded.

Tonight we decreased the number of sessions. Our CCR1072 is now serving about 3000 PPPoE sessions with shaping rules for about 1500 users. At these numbers the speed tests give proper results, so this looks like the actual limit of what this device is capable of in our configuration.
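
A rough back-of-the-envelope version of these numbers, as a Python sketch. It assumes capacity scales linearly with session count and that the traffic mix stays the same, which is a simplification and not something the test proved.

```python
import math


def routers_needed(target_sessions: int, sessions_per_router: int) -> int:
    """Routers required if each one handles a fixed number of sessions."""
    return math.ceil(target_sessions / sessions_per_router)


def shortfall_percent(limit_mbps: float, measured_mbps: float) -> float:
    """How far a measured speed falls below the configured limit, in %."""
    return (limit_mbps - measured_mbps) / limit_mbps * 100


if __name__ == "__main__":
    # ~3000 sessions per CCR1072 is the figure observed in this test.
    print(routers_needed(10_000, 3_000))
    # ~1000 sessions per CCR is the estimate given earlier in the thread.
    print(routers_needed(10_000, 1_000))
    # 60 Mbit/s limit, ~50 Mbit/s measured at ~3900 sessions.
    print(round(shortfall_percent(60, 50), 1))
```

Under those assumptions the 10k target needs somewhere between 4 and 10 boxes, depending on whose per-router figure you trust.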
 
marlowbg
newbie
Posts: 33
Joined: Wed Oct 06, 2010 4:23 pm

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Wed Dec 23, 2015 11:09 pm

Hi,

3000 users online is fine, but how much traffic do they consume?

Right now I'm running 200 users, and 1 core reaches 80%, 3-4 cores are at 10% and the rest are at 0 or 1%.

Do you think that 1 core reaching 100% is actually a problem, and that service degrades for some customers?
 
serjrd
just joined
Topic Author
Posts: 15
Joined: Mon Nov 30, 2015 3:24 pm

Re: PPPoE + Shaping for 10000+ clients on CCR-1072

Thu Dec 24, 2015 7:43 am

marlowbg wrote:
3000 users online is fine, but how much traffic do they consume?
Do you think that 1 core reaching 100% is actually a problem, and that service degrades for some customers?

Hi marlowbg,

At the peak moment they were consuming 2.43Gb/s (in) + 1.2Gb/s (out), and 356Kpps (in) + 239Kpps (out).
At that time we saw ~50 cores with load in the 30-60% range and ~22 cores within 60-90%.
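
For comparing with other setups, those figures can be turned into an average packet size and a per-session rate. A small Python sketch; the session count at that exact moment isn't stated, so the 3000 used below is an assumption.

```python
def avg_packet_bytes(bits_per_second: float, packets_per_second: float) -> float:
    """Average packet size in bytes from a bit rate and a packet rate."""
    return bits_per_second / packets_per_second / 8


def per_session_mbps(total_bits_per_second: float, sessions: int) -> float:
    """Average throughput per session in Mbit/s."""
    return total_bits_per_second / sessions / 1e6


# Peak figures quoted above.
IN_BPS, OUT_BPS = 2.43e9, 1.2e9
IN_PPS, OUT_PPS = 356e3, 239e3

# Assumption: roughly 3000 sessions were online at the peak moment.
ASSUMED_SESSIONS = 3000

if __name__ == "__main__":
    print(f"avg packet in:  {avg_packet_bytes(IN_BPS, IN_PPS):.0f} bytes")
    print(f"avg packet out: {avg_packet_bytes(OUT_BPS, OUT_PPS):.0f} bytes")
    total = IN_BPS + OUT_BPS
    print(f"per session:    {per_session_mbps(total, ASSUMED_SESSIONS):.2f} Mbit/s")
```

Packet rate matters at least as much as bit rate for CPU load, so the average packet size is worth tracking alongside the Gb/s numbers.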

As I said, it looks like our current configuration can serve ~3000 sessions without service degradation.

In your case, you should actually run some tests to find out. But I'm pretty sure that 100% load is a sign of problems.
I think that some of the tasks you perform on your CCR are very badly optimized core-wise. It may be NAT, but you need to do some testing to find out. We do not NAT our users on the CCR. We only terminate PPPoE sessions and do traffic shaping for some of them.