ISPIE001
just joined
Topic Author
Posts: 5
Joined: Thu Jan 18, 2018 12:00 pm

CCR2216 - Issues

Mon Dec 09, 2024 1:55 pm

From your own product page: "If you are a rather large ISP, dealing with dynamic routing, massive BGPs, complex firewall rules, and intricate quality of service configurations... This is the right device for you."

I'm sorry, MikroTik, but we are not seeing this, and we are currently in the process of rolling back to the CCR1072s.

The layer 3 hardware offloading seems to struggle with full tables.
We are seeing firewall rules not work.
Winbox 3 has issues logging in to the routers; a reboot solves it temporarily.

From what I can see, this router DOES NOT work well as an edge router with multiple full tables, and I would be eager to hear what experiences others have had with the 2216 before we put 8 of them on eBay.
 
mkx
Forum Guru
Posts: 13060
Joined: Thu Mar 03, 2016 10:23 pm

Mon Dec 09, 2024 2:28 pm

The text from the product page you quoted was there before L3HW became available on the CCR2216.

It seems that L3HW is broken in the latest stable ROS (7.16.2); most likely it runs out of route memory. When the device is used "with several full tables", this probably gets triggered much faster than when it is used as a simple router in the corner of a fairly small LAN. So if you disable L3HW offload, your routers should be stable, but you will indeed lose some routing performance (and the CCR1072s may indeed be faster).
 
ISPIE001
just joined
Topic Author
Posts: 5
Joined: Thu Jan 18, 2018 12:00 pm

Mon Dec 09, 2024 4:21 pm

With it disabled, we hit 100% CPU very fast, then kernel panics, reboots, etc.!!
 
lele
newbie
Posts: 34
Joined: Thu Apr 02, 2015 1:20 am

Wed Dec 11, 2024 6:48 pm

We are in the process of deploying 4 CCR2216s as "backbone" routers, replacing the aging 1072s.
They are basically used as L3 switches: no firewall (just a basic input firewall), no NAT, no conntrack, a single table. Just several fast ports with traffic going through, connecting the border with the access layer.

They do OSPF and iBGP with a pair of route reflectors, but they don't see the full view, just ~3000 internal IPv4 routes and ~2000 IPv6 routes. All four are new deployments running 7.17rc2.
While most of the routing is fairly stable, we have ~300 /32s for customers who move public addresses between POPs. These /32s may change, disappear, and reappear a few times per day, per customer.

We enabled "full" L3HW. After ~24 hours, a dozen or so of these customers started calling to complain that they were not getting traffic. On inspection, we saw something like this in the main table:

DAoH 0.0.0.0/0 -> sfpA
DAoH 10.0.0.0/24 -> sfpA
DAoH 10.0.0.42/32 -> sfpB

For all these customers, despite the /32 being active in the routing table, it was ignored and traffic was sent to sfpA. Disabling L3HW immediately cleared the issue. Re-enabling it was also fine until the problem reappeared a few hours later (for the same customers, too).
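For comparison, normal forwarding is a longest-prefix match: 10.0.0.42 should follow the /32 to sfpB, and only the other addresses in 10.0.0.0/24 should go to sfpA. Below is a minimal Python sketch of that expected lookup, using the standard `ipaddress` module and the three routes from the table above (reading 0.0.0.0 as the default route 0.0.0.0/0). It only models what the correct answer is; it says nothing about how the switch ASIC actually stores or looks up routes.

```python
import ipaddress

# Routes as shown in the main table above; 0.0.0.0 is read as the default route.
ROUTES = [
    ("0.0.0.0/0", "sfpA"),
    ("10.0.0.0/24", "sfpA"),
    ("10.0.0.42/32", "sfpB"),
]


def lookup(dst, routes=ROUTES):
    """Return the egress interface of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    best_len, best_iface = -1, None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_iface = net.prefixlen, iface
    return best_iface


print(lookup("10.0.0.42"))  # sfpB: the /32 is the longest match
print(lookup("10.0.0.7"))   # sfpA: only the /24 and the default match
print(lookup("192.0.2.1"))  # sfpA: falls through to the default route
```

The symptom described above is the hardware answering sfpA for the first lookup, as if the /32 were not there.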
 
sirbryan
Member
Posts: 412
Joined: Fri May 29, 2020 6:40 pm
Location: Utah

Wed Dec 11, 2024 7:30 pm

lele wrote:
"For all these customers, despite the /32 being active in the routing table, it was ignored and traffic sent to sfpA. Disabling L3HW immediately cleared the issue."

I see this even with L3HW disabled on my 2116. After a few days or weeks, one of my border routers (the one with the most peers) stops routing traffic properly, causing some prefixes to get black-holed within the network and others to keep going out interfaces whose BGP peers have been disabled.

On CRS300s with L3HW offload enabled, I see oddities, particularly with routes that have two or more equal possible paths, even if they aren't both loaded (i.e., as ECMP). I would have to disable and re-enable L3HW offload every few days on those routers as well to keep them from locking up on specific routes.

L3HW offload + ECMP has been a problem for some time. The other BGP problem I mentioned only started with recent versions (7.15 and 7.16 for sure; not sure about 7.14).
 
Paternot
Forum Guru
Posts: 1059
Joined: Thu Jun 02, 2016 4:01 am
Location: Niterói / Brazil

Wed Dec 11, 2024 8:23 pm

L3HW has its limits, and I don't think it can take a full table head-on.
These devices, if I understood them correctly, are more "core devices" than "edge devices". L3HW will (would? I'm not using it, and it looks like some problem cropped up) do wonders with "few" routes.

Take a look at
https://help.mikrotik.com/docs/spaces/R ... Offloading (at the very end of the page)

The CCR2216 can hardware-offload up to 120k IPv4 routes and about 20k IPv6 routes.
That is not even close to a full BGP table, but more than enough for a device geared more towards core routing. Really, 20k IPv6 routes is more than enough for (almost) anything that isn't a border.
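The capacity figures above can be turned into a quick sanity check. This Python sketch takes the 120k/20k limits quoted in this post as given (check the linked documentation for your RouterOS version) and compares them with lele's backbone (~3000 IPv4 / ~2000 IPv6 routes, from this thread) and with a full view. The full-view counts are a rough order-of-magnitude assumption, not measured values.

```python
# Hardware route limits for the CCR2216 as quoted in the post above.
HW_LIMITS = {"ipv4": 120_000, "ipv6": 20_000}


def offload_usage(route_counts):
    """Fraction of hardware route capacity each address family would need."""
    return {fam: route_counts[fam] / HW_LIMITS[fam] for fam in HW_LIMITS}


def fits_in_hardware(route_counts):
    """True if every address family fits within its hardware limit."""
    return all(share <= 1.0 for share in offload_usage(route_counts).values())


backbone = {"ipv4": 3_000, "ipv6": 2_000}         # lele's core routers
full_view = {"ipv4": 1_000_000, "ipv6": 200_000}  # assumed rough full-table size

for name, counts in (("backbone", backbone), ("full view", full_view)):
    usage = {fam: f"{share:.1%}" for fam, share in offload_usage(counts).items()}
    print(name, usage, "fits" if fits_in_hardware(counts) else "does not fit")
```

The backbone uses a few percent of the table, while a full view would need roughly eight to ten times the available hardware space, so most of it would have to be CPU-routed.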
 
lele
newbie
Posts: 34
Joined: Thu Apr 02, 2015 1:20 am

Wed Dec 11, 2024 8:53 pm

Paternot wrote:
"CCR2216 can do hardware offload to 120k IPv4 routes and about 20k IPv6 routes."

Shouldn't it fall back to CPU routing? You don't push large amounts of traffic to more than 120k destinations at once, so routing the least-used ones through the CPU should be okay-ish.

Anyway, if I understand sirbryan correctly, L3HW works for him with his peers (we cannot assume it's a full view; he doesn't say), but it sometimes "forgets" or "sticks" some prefixes, a state he has to disable L3HW to clear. That is similar to what I am seeing (and I only have 3000 prefixes). He also says it used to work, and he has issues even with L3HW disabled.

It doesn't seem like a performance issue to me; more like an issue with the way the L3HW table is kept in sync? Or even with how the protocols' routes are imported into the FIB?
 
Paternot
Forum Guru
Posts: 1059
Joined: Thu Jun 02, 2016 4:01 am
Location: Niterói / Brazil

Wed Dec 11, 2024 8:56 pm

lele wrote:
"Shouldn't it fall back to CPU routing? You don't push large amounts of traffic to more than 120k destinations at once, so routing the least-used ones through the CPU should be okay-ish."

It does fall back. But then it's much slower than hardware. If you have 500k routes but usually route to just 15k of them... then all is good.
The problem starts when you route to 15k of them, but never the SAME 15k... It's all a balancing act.
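Paternot's "balancing act" can be put in numbers with a toy model. The Python sketch below assumes, hypothetically, that a fixed 120k of the 500k routes sit in hardware and everything else is CPU-forwarded (the fallback described above); it does not model how RouterOS actually chooses which routes to offload. Traffic to the same 15k destinations, all inside the offloaded set, stays in hardware, while traffic spread across the whole table only hits hardware about 120k/500k = 24% of the time.

```python
import random

TOTAL_ROUTES = 500_000
HW_CAPACITY = 120_000
# Hypothetical: route IDs 0..119_999 are the ones that made it into hardware.
offloaded = set(range(HW_CAPACITY))


def hardware_share(lookups):
    """Share of lookups whose route is in the hardware table."""
    return sum(dst in offloaded for dst in lookups) / len(lookups)


rng = random.Random(1)
# Busy traffic always goes to the same 15k destinations, all offloaded.
steady = [rng.randrange(15_000) for _ in range(50_000)]
# Busy destinations keep changing and are spread over the whole table.
shifting = [rng.randrange(TOTAL_ROUTES) for _ in range(50_000)]

print(f"steady:   {hardware_share(steady):.0%} in hardware")
print(f"shifting: {hardware_share(shifting):.0%} in hardware")
```

The route count is the same in both cases; only which destinations carry the traffic changes, which is why the same box can look fine in one network and overloaded in another.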
 
lele
newbie
Posts: 34
Joined: Thu Apr 02, 2015 1:20 am

Thu Dec 12, 2024 12:27 am

We considered the 2216s as border routers for several months and kept pushing the decision back due to the issues we were seeing with BGP and v7 in other, smaller setups.
In the end, we ditched them as borders in favour of refurbished MX204s. But we hoped to use them as backbone/core routers. This is proving a challenge too, although we're still on board.
 
jbl42
Member Candidate
Posts: 225
Joined: Sun Jun 21, 2020 12:58 pm

Thu Dec 12, 2024 4:23 pm

There is also still no HW support for VXLAN on the 2216, although the switch ASIC would support it.
L3HW has issues, and so does BGP. ROS on the CCR2216 is not ready for prime time.

But it got DLNA support while routing is still broken. Why not use it as a DLNA server ;-) ?
 
sirbryan
Member
Posts: 412
Joined: Fri May 29, 2020 6:40 pm
Location: Utah

Thu Dec 12, 2024 7:33 pm

More specifically:

I am running L3HW offload on several CRS300s, using them as site or edge (customer-facing) routers. They work great unless they have diverse routes with equal cost. In that case, they eventually get confused and routes get "stuck" going out the wrong port, despite changes in the routing table. This requires disabling/enabling L3HW offload to force the ASIC to reload the current FIB. Ensuring that no two paths are equidistant keeps this from happening. These internal-only routers only see customer prefixes and internal transit prefixes, so the entire FIB fits in the ASIC just fine.
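As a rough way to spot the condition described above, this Python sketch scans a route list and reports prefixes whose best cost is shared by two or more gateways. The tuple layout, the single `cost` value, and the documentation addresses are hypothetical; getting the routes off the router (for example from /ip/route/print output) is not shown.

```python
from collections import defaultdict


def equal_cost_prefixes(routes):
    """Map each prefix whose best cost is tied to the sorted list of tied gateways."""
    candidates = defaultdict(list)
    for prefix, gateway, cost in routes:
        candidates[prefix].append((cost, gateway))
    ties = {}
    for prefix, options in candidates.items():
        best = min(cost for cost, _ in options)
        gateways = sorted(gw for cost, gw in options if cost == best)
        if len(gateways) > 1:
            ties[prefix] = gateways
    return ties


routes = [
    ("198.51.100.0/24", "192.0.2.1", 10),
    ("198.51.100.0/24", "192.0.2.2", 10),  # equal cost via a second uplink
    ("203.0.113.0/24", "192.0.2.1", 10),
    ("203.0.113.0/24", "192.0.2.2", 20),   # tie broken by a higher cost
]
print(equal_cost_prefixes(routes))
# {'198.51.100.0/24': ['192.0.2.1', '192.0.2.2']}
```

Breaking each reported tie, for example by raising the cost on one of the paths, is what keeps the list empty.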

On my 2116s (same CPU and RAM as the 2216s, just a different switch chip), I am receiving full routes from multiple peers. I have three world-facing routers and two internal routers on the public side of my CGNAT stack, all fully meshed. The border routers all receive full routes from their peers, but I then filter down to routes one or two ASes away. /ip/route/print shows a couple million routes, but most are filtered and don't make it into the FIB.

Early on, when I had just two border 2116s and three upstream peers, I ran a number of experiments with L3HW offload enabled and disabled. I found that for L3HW to have any effect, I needed to limit the number of prefixes I inserted, so I settled on two ASes away. That worked great for a while and the CPU dropped to < 5%, but inevitably routes would get stuck and I'd have to do the same disable/enable trick. After adding three more peers, the hardware tables filled up pretty quickly and the CPU stayed just as busy with offload on as with it off, so I have had L3HW offload disabled on my 2116s for some time now. CPU load is 10-20% at 3-4 Gbps peak. I still limit the inserted prefixes to two ASes away and let the preferred default route take the bulk of the outbound traffic.
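The "two ASes away" policy is, in effect, a filter on AS path length. The Python sketch below shows only that selection logic, under the assumption that "N ASes away" means at most N distinct ASes in the path once consecutive prepends are collapsed; on the router itself this would be written as a routing filter, not Python. The prefixes and AS numbers are made up, taken from the documentation ranges.

```python
def within_as_hops(routes, max_hops=2):
    """Return the prefixes whose AS path, ignoring prepends, is at most max_hops long."""
    kept = []
    for prefix, as_path in routes:
        hops = []
        for asn in as_path:
            if not hops or hops[-1] != asn:  # collapse consecutive prepends
                hops.append(asn)
        if len(hops) <= max_hops:
            kept.append(prefix)
    return kept


routes = [
    ("192.0.2.0/24", [64496]),                         # the peer's own prefix
    ("198.51.100.0/24", [64496, 64497]),               # one AS behind the peer
    ("203.0.113.0/24", [64496, 64497, 64498]),         # three ASes away: dropped
    ("192.0.2.128/25", [64496, 64496, 64496, 64499]),  # prepended, still two ASes
]
print(within_as_hops(routes))
# ['192.0.2.0/24', '198.51.100.0/24', '192.0.2.128/25']
```

Whatever is dropped here is left to the preferred default route, as described above.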

With five fully meshed routers, all accepted (i.e., not filtered out) routes are shared with the other four. The internal ones therefore get a much smaller subset of routes. L3HW can insert all of those routes just fine, but the CPU load is already pretty low on these routers (only pushing 3-4 Gbps), and the risk of routes "sticking" isn't worth the imperceptible improvement.

With 7.16, I'm seeing BGP routes get stuck on the busiest of the five routers. L3HW offload isn't even a factor here because it's disabled. I don't recall having this problem on previous releases, at least not to this extent (I've had to restart that router four times now), so I rolled this one back to 7.15. (I remember 7.14 and/or 7.15 being pretty solid.)
 
lele
newbie
Posts: 34
Joined: Thu Apr 02, 2015 1:20 am

Thu Dec 12, 2024 8:37 pm

Thanks, sirbryan, that’s some interesting information.

We have been facing issues during tests with the specific brand of DWDM SFPs we are using; they went away with 7.17. That's why we're introducing a system with RC software into production, so going back to older versions could be an issue here.

The equal-cost path issue is something I will need to investigate; it might be related to our event, as the new network is fairly symmetric. Being stuck with software-based forwarding would be pretty disappointing: we have moved only a fraction of the traffic we expected, and we're already seeing a load in the 20s, compared to barely moving from zero with L3HW.
