BGP on CCR1036-8G-2S+EM
Posted: Wed Oct 12, 2016 10:46 pm
by vxx
Hi folks,
I'm new here and I'm looking for advice on the CCR1036-8G-2S+EM.
I currently run a bunch of Juniper MX240/480, Brocade MLXe and a few older Cisco routers in a BGP DFZ configuration; we currently take 5 full tables from Level3, Telia, Cogent and HE. Some of my boxes are getting old, and the traffic we move doesn't generate much revenue, although we push a lot of bandwidth. I am not keen to pay the previously mentioned vendors big money to move packets between our Tier-1 providers and our customers. The network is just a side of our business and is mostly there to support our customers rather than to earn big money.
Anyway, a DFZ full-feed session is now around 610k prefixes with our >= /24 filtering, and I would like to know how long it takes a CCR1036-8G-2S+EM to bring a full-feed BGP session up. Each router pushes between 15 and 20 Gbps during prime time and about 8 Gbps on average. Each unit has between 2 and 4 full tables. Can a CCR1036-8G-2S+EM handle 4 full-table sessions?
Thank you for your insights.
Alex
Re: BGP on CCR1036-8G-2S+EM
Posted: Wed Oct 12, 2016 11:12 pm
by ZeroByte
I think one of the big issues with DFZ BGP and the CCR platform is that the BGP process is not multi-threaded, so it tends to grab one core and peg that core at 100% utilization forever.
Meaning that BGP itself is slow, while the forwarding of packets is still fine and dandy (there are still 35 more cores available for that, right?).
Also, depending on what sort of bgp-fu you're using, there are a few 'quirks' with ROS's BGP you may wish to be aware of:
- iBGP for IPv6 is currently ~broken because recursive next hop resolution fails for link-local next hop.
- iBGP does not pass along a default prefix learned from EBGP - you must actually originate it again to your iBGP peers, and the if-installed behavior ALSO has some quirks, so I'd recommend against this.
- Community list filters treat the Internet community (0:0) as an always-match
- Community filters don't give you the ability to selectively strip communities - you can wipe a list and re-populate it with your own stuff, but you can't snipe out one or two specific communities.
- ROS's BGP tends to be hamfisted in its "clean room" approach - meaning that it tends to fully bounce peering sessions in situations you might not expect it to do so.
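To make the community point in the list above concrete, here is a minimal Python sketch of the two operations. It is only an illustration of the semantics, not RouterOS filter syntax; the function names and community values are made up for the example.

```python
# Illustration only: plain Python, not RouterOS configuration.
# The community values below are hypothetical.

def strip_selected(communities, unwanted):
    """Selective strip: drop only the listed communities, keep the rest.
    This is the operation described above as not available in ROS filters."""
    return [c for c in communities if c not in unwanted]

def wipe_and_repopulate(communities, replacement):
    """Wipe the whole list and set your own values.
    This is the operation described above as available."""
    return list(replacement)

route_communities = ["3356:100", "65000:666", "3356:2001"]
print(strip_selected(route_communities, {"65000:666"}))
print(wipe_and_repopulate(route_communities, ["65010:1"]))
```

If your policy needs the first function, that is where you would hit the limitation; if the second one is enough, you should be fine.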
None of this is to say that Mikrotik routers can't be used in a big-iron capacity - there are just some things to know. A user here (IPANetEngineer) successfully deploys them in Fortune 500 enterprise networks which also involve a lot of MPLS, so it _can_ be done.
Another thing that bugs me about ROS's behavior with BGP is that it uses the IP routes table as its scratch space - so instead of having a dedicated BGP table (show bgp ipv4 unicast), you have all BGP paths in the routing table, and the less-than-best paths simply get flagged as inactive. Also, the ability to filter routes down to specific prefixes, etc. exists in ROS, but in ROS version 6.x such filters run very slowly on a large routing table. Mikrotik teased the community in a thread somewhere that the new version 7 of ROS has fixed this problem and queries on the routing table now run fast. But there's no indication of when version 7 will reach beta stage.
I also found a glitch where BGP can fail to withdraw a route properly. It removes the prefix from its "advertised prefixes" list, but fails to actually send the BGP update message to the peer - resulting in "ghost routes" - my exact scenario is pretty fringe and almost certainly won't come up in production, but others have mentioned that they found ghost announcements as well. (Mikrotik said this will be fixed in ver7 as well)
Re: BGP on CCR1036-8G-2S+EM
Posted: Tue Oct 18, 2016 7:56 am
by Murmaider
ZeroByte,
Judging by your post, it seems that Mikrotik has some real BGP issues. Do you perhaps know why people still choose to use them as Big Iron or BGP border routers in a DC-type environment?
Surely the instability does not justify the cost saving?
Re: BGP on CCR1036-8G-2S+EM
Posted: Tue Oct 18, 2016 6:07 pm
by ZeroByte
Like I said before - none of these is a real show stopper per se, as they only affect certain designs. They're just behaviors that it helps to know about. Although, I would say that the preponderance of "quirks" itself can be a show stopper.
I currently work at a "nobody gets fired for using Cisco" shop - so knowing these things about the current state of routing with ROS definitely keeps me from proposing a CCR as a border router today. There are execs here whose philosophy is that they'll stop yelling at you as soon as the network is fixed - and they start yelling pretty much the moment a packet gets dropped.
If I had a large enough network with enough border routers involved such that if one of them decided to do something weird, it wouldn't be a major network-wide event... I'd be more than happy to slip one into a border that wasn't as mission critical and go from there.
Re: BGP on CCR1036-8G-2S+EM
Posted: Tue Oct 18, 2016 7:25 pm
by Murmaider
That's interesting, we currently have 2x VyOS machines each handling 4x BGP peers (2 transit, 2 INX's on each machine) and OSPF for IGP.
However, we are looking at the CCR-1072 to replace these, but feedback on the CCRs regarding stability and reliability as a whole seems so mixed, with some saying they are amazing and others saying to avoid them at all costs.
(sorry for the thread hijack)
Re: BGP on CCR1036-8G-2S+EM
Posted: Tue Oct 18, 2016 7:37 pm
by patrick7
I fully agree with @ZeroByte. But I need to say that my experiences with VyOS are horrible (reboot - BGP config gone, OSPF routes are propagated to other routers but not shown in the propagating router's table, SNMP crashing and reboot is needed, ....). I would for sure prefer RouterOS over VyOS.
Re: BGP on CCR1036-8G-2S+EM
Posted: Tue Oct 18, 2016 8:03 pm
by Murmaider
VyOS has been stable for us, except lately, small DDoS attacks of like 500k pps cause the intel interfaces (i350 and X520) to lock up until the pps drop down (CPU sits at 1% while this is happening) - but this isn't the right forum to troubleshoot this issue
We have had SNMP breaking with 10G interfaces as well and general SNMP issues.
My main issue is the slow development on it.
The seeming lack of direction and/or roadmap.
If you install the latest (1.1.7), you have to hunt down bundled fixes and manually install those .deb packages (like the fix for the SNMP 10G interface issue).
OSPF on ipv6 is just plain broken.
They don't seem to have any intention on upgrading quagga to a newer version.
The list goes on and on. However I will admit that BGP has been stable for us.
Our requirements are pretty simple: BGP / OSPF and, I assume, fastpath?
No queues or anything fancy as we do this on other devices within the network on their respective segments.
Re: BGP on CCR1036-8G-2S+EM
Posted: Tue Oct 18, 2016 10:08 pm
by ZeroByte
I think your typical border router with a few external peers, sending the best paths to your RR and possibly injecting an OSPF default based on at least one 0.0.0.0/0 prefix being present from eBGP, plus a set of in/out filters, is very doable and not likely to be affected by the quirks I mention. The one caveat with this model is that the system's active default GW prefix will never be one learned from OSPF unless all other sources (EBGP, RIP, Floating Static) do not have an active default GW - EVEN IF THE AD IS HIGHER THAN 110. (ROS's OSPF stack ignores default prefixes within OSPF if it is originating default - even using "if-installed" mode)
As a border router, this doesn't really cause any harm because you won't (or shouldn't) need a floating static default GW route. If the only source is EBGP and all EBGP sessions drop / lose their default prefix, then OSPF will stop announcing the default and then pick up 0.0.0.0/0 from the OSPF database. A new default prefix with distance > 110 will NOT pre-empt the OSPF path.
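Since the default-route behavior is easy to misread, here is a toy Python model of it as I've described it. It is a sketch of my description only, not anything taken from RouterOS itself; the function name, the tuple layout and the `originating_default` flag are assumptions made for the example.

```python
# Toy model of the behavior described in this post; not RouterOS code.
# Each candidate default route is a (protocol, distance) tuple.

def pick_default(candidates, originating_default):
    """Return the candidate that would become the active 0.0.0.0/0.

    originating_default: True while this router is itself announcing a
    default into OSPF (assumed to hold whenever a non-OSPF default is
    installed, as in the "if-installed" description).
    """
    if originating_default:
        # OSPF-learned defaults are ignored, whatever the distances are.
        pool = [c for c in candidates if c[0] != "ospf"]
    else:
        # Ordinary selection: lowest distance wins.
        pool = list(candidates)
    return min(pool, key=lambda c: c[1]) if pool else None

print(pick_default([("static", 200), ("ospf", 110)], originating_default=True))
print(pick_default([("static", 200), ("ospf", 110)], originating_default=False))
```

The first call picks the static route despite its higher distance; the second call, where the router is no longer originating, keeps the OSPF path.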
Like I stated earlier, I think the main possible gotcha for a border router role would come from your community policy - if your policy is to remove some unwanted communities while leaving others intact, then ROS can't do what you need. If your policy is to DROP prefixes with unsupported communities, or to wipe the community list if it contains unwanted communities, then you're a-okay.
Lastly, I noticed that my comment about slow filters was a bit ambiguous. What I meant was that viewing the routing table using filters is slow when you have a large routing table.
/ip route print where dst in 10.0.0.0/8 would take a while to return results, for instance.
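For anyone unsure what that kind of query is asking for, here is a rough Python sketch of a "destination falls inside this prefix" filter using the standard `ipaddress` module. It only shows the matching semantics with a naive scan over a made-up table; it says nothing about how ROS implements the lookup internally.

```python
import ipaddress

def routes_within(routes, supernet):
    """Return the routes whose destination prefix lies inside supernet."""
    net = ipaddress.ip_network(supernet)
    return [r for r in routes if ipaddress.ip_network(r).subnet_of(net)]

# Hypothetical four-entry table; a real full feed would have ~610k entries.
table = ["10.1.0.0/16", "10.20.30.0/24", "192.0.2.0/24", "8.8.8.0/24"]
print(routes_within(table, "10.0.0.0/8"))  # ['10.1.0.0/16', '10.20.30.0/24']
```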