Thought I'd throw this out there to gather some case studies of OSPF backbone networks that have kept growing without being split into multiple areas or multiple instances.
I'd particularly like to hear from people whose networks include wireless links, slow or unreliable links, long daisy-chained segments, etc., and how their convergence time holds up: how the network handles link flapping, failover and reconvergence.
We currently have multiple Area0 networks (with additional areas, but they don't matter in this case) that are geographically separated but have grown to overlap each other, and they use each other for transit and for uplink/core failover. This is done by running multiple Area0 instances and redistributing between them, using route tags to identify which region a route came from and to prevent loops when redistributing back the other way. This keeps those Area0 networks logically separated, allowing summarization and stopping LSAs from some far edge corner of the network from travelling 20 hops to some other router that doesn't give a flying hoot.
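For anyone trying to follow the tagging scheme, here's a rough Python model of tag-based loop prevention between two Area0 instances. It's a sketch, not router config: the region names `north`/`south`, the tag values 100/200 and the `redistribute` helper are all made up for illustration. Native routes are stamped with the source region's tag on the way out, and anything already carrying the destination region's tag is dropped, so a route never gets redistributed back into the instance it came from.

```python
from dataclasses import dataclass, replace

# Hypothetical tag values, one per Area0 instance/region (illustration only).
REGION_TAGS = {"north": 100, "south": 200}


@dataclass(frozen=True)
class Route:
    prefix: str
    tag: int = 0  # 0 = untagged, i.e. originated natively inside an instance


def redistribute(routes, src_region, dst_region):
    """Model redistribution from src_region's OSPF instance into dst_region's.

    Routes already tagged with dst_region's tag originally came from
    dst_region, so they are dropped to stop them looping back home.
    Untagged (native) routes get src_region's tag; tagged routes keep theirs.
    """
    out = []
    for r in routes:
        if r.tag == REGION_TAGS[dst_region]:
            continue  # originated in the destination region: don't send it back
        out.append(replace(r, tag=r.tag or REGION_TAGS[src_region]))
    return out


north_rib = [Route("10.1.0.0/16")]
# South learns north's prefix (tagged 100) alongside its own native prefix.
south_rib = [Route("10.2.0.0/16")] + redistribute(north_rib, "north", "south")
# Redistributing south back into north drops the tag-100 route.
print(redistribute(south_rib, "south", "north"))
```

On real kit the equivalent is usually a route-map or routing policy at each redistribution point that sets a tag going one way and denies that tag coming back; the exact syntax is vendor-specific.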
They're kept separate primarily to reduce convergence time, especially with flapping links. However, I'm wondering how necessary this is. I don't like it because it's a bit messy, it's not by-the-book design at all, and it's very hard for outside eyes to look at the network and make sense of it.
I'm looking at a major redesign of the network to get rid of this situation: a single Area0, with as many other routers as possible moved off into their own areas, to keep convergence times as low as possible and to avoid any impact when an edge router has a flapping link. The migration would be much easier if I just threw everything into Area0 during the redesign, but I'm wondering what the real-world impact on the network would be if I did that. And maybe it's all overthought and I should just throw everything into Area0 and forget about it.
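A rough way to reason about flat Area0 vs. multi-area is to count which routers have to do work when one edge link changes. The sketch below is a deliberately simplified model (router names, area numbers and the `spf_impact` helper are hypothetical): routers attached to the area where the change happened re-run a full SPF, routers elsewhere only see type 3 summary LSAs from the ABRs, and if the ABRs summarize and the summary is unaffected, they see nothing at all. It ignores stub/NSSA behaviour and cost changes on summaries.

```python
def spf_impact(router_areas, changed_area, summarized):
    """Classify routers by how much work one link change in changed_area causes.

    router_areas: mapping of router name -> set of OSPF area IDs it attaches to.
    summarized:   True if the ABRs advertise a summary range that hides the change.
    Returns (full_spf, partial_spf) as sets of router names.
    """
    full_spf, partial_spf = set(), set()
    for router, areas in router_areas.items():
        if changed_area in areas:
            # Router/network LSAs flood throughout the area: every member re-runs SPF.
            full_spf.add(router)
        elif not summarized:
            # Outside the area the change arrives only as type 3 summary LSAs,
            # which trigger a cheaper inter-area route recalculation.
            partial_spf.add(router)
    return full_spf, partial_spf


# Everything in one Area0: 27 routers, all in area 0.
flat = {f"r{i}": {0} for i in range(27)}

# Same 27 routers split up: 5 core + 2 ABRs in area 0, two 10-router edge areas.
split = {f"core{i}": {0} for i in range(5)}
split.update({"abr10": {0, 10}, "abr20": {0, 20}})
split.update({f"edge{i}": {10} for i in range(10)})
split.update({f"other{i}": {20} for i in range(10)})

for label, topo, area, summ in [("flat", flat, 0, False),
                                ("split, summarized", split, 10, True),
                                ("split, unsummarized", split, 10, False)]:
    full, partial = spf_impact(topo, area, summ)
    print(label, len(full), len(partial))
```

In this toy model, a flap in a flat Area0 costs all 27 routers a full SPF, while the same flap inside a summarized edge area only touches the 11 routers in that area.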
There are publications that suggest no more than X routers, but these are often aimed at campus or enterprise-style networks with copper and fibre everywhere. Wireless is a whole different ball game, and most of our backbone is built on wireless. We have a number of 24 GHz and 60 GHz links running OSPF+BFD that go down during rain, with 5 GHz failover. These are routed links, so every time this happens LSAs are generated. If the routers don't reconverge quickly because they are waiting for all routers to agree on the network topology, connections could be dropped before the failover kicks in. Reconvergence needs to happen within a couple of seconds at most, ideally much less than that.
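As a back-of-the-envelope check on that "couple of seconds" budget, the sketch below just adds up the usual stages of OSPF reconvergence after a rain fade: BFD detection (negotiated interval × detect multiplier), LSA generation delay, flooding across the hops to a distant router (worst case), the SPF initial delay, the SPF run itself and the FIB update. The `convergence_budget_ms` helper is made up, and every number in the example call is a placeholder assumption, not a vendor default or a measurement; plug in your own timers.

```python
def convergence_budget_ms(bfd_interval_ms, bfd_multiplier, lsa_gen_delay_ms,
                          flood_ms_per_hop, hops, spf_initial_delay_ms,
                          spf_run_ms, fib_update_ms):
    """Return per-stage delays in milliseconds plus their total.

    Simple serial sum: assumes each stage starts only after the previous one
    finishes, which is a pessimistic but easy-to-reason-about model.
    """
    stages = {
        "bfd_detect": bfd_interval_ms * bfd_multiplier,
        "lsa_generation": lsa_gen_delay_ms,
        "flooding": flood_ms_per_hop * hops,
        "spf_initial_delay": spf_initial_delay_ms,
        "spf_run": spf_run_ms,
        "fib_update": fib_update_ms,
    }
    stages["total"] = sum(stages.values())  # evaluated before "total" is added
    return stages


# Placeholder values only: 50 ms x 3 BFD, 20 hops at 10 ms each, etc.
budget = convergence_budget_ms(bfd_interval_ms=50, bfd_multiplier=3,
                               lsa_gen_delay_ms=50, flood_ms_per_hop=10, hops=20,
                               spf_initial_delay_ms=50, spf_run_ms=20,
                               fib_update_ms=100)
for stage, ms in budget.items():
    print(f"{stage:>18}: {ms} ms")
```

With these made-up numbers the total is 570 ms, so the budget only gets tight once one of the stages (usually the SPF delay under repeated flaps) grows well beyond its initial value.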
What's your experience with larger Area0 networks? If you have, e.g., 20-200 routers spread out all over the place, do you know how long it takes to reconverge? Do you end up with unpopulated routes and broken connectivity when there's a flapping network somewhere on the edge that's causing all the routers to wait because the OSPF tables remain incomplete?
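One mechanism worth looking at for the flapping question is SPF throttling: routers delay and batch SPF runs using an exponential back-off, and the throttle is per router, not per link. The sketch below is a simplified model loosely based on Cisco-style start/hold/max-wait throttling (RFC 8405 standardizes a different, more detailed back-off state machine). `spf_delays`, the `quiet_ms` reset window and all timer values are hypothetical. It shows how a rain-flapping edge link in a flat Area0 can push every router into back-off, so an unrelated failure arriving mid-churn waits far longer for its SPF.

```python
def spf_delays(event_times_ms, start_ms, hold_ms, max_ms, quiet_ms):
    """Return the SPF delay applied after each LSA-change event.

    Simplified model: an event after at least quiet_ms of calm gets start_ms.
    Each further event inside a burst waits the current hold time, and the
    hold doubles (capped at max_ms) for the next event in the same burst.
    """
    delays = []
    prev = None
    next_hold = hold_ms
    for t in event_times_ms:
        if prev is None or t - prev >= quiet_ms:
            delay = start_ms        # network was quiet: react quickly
            next_hold = hold_ms     # back-off resets
        else:
            delay = next_hold       # still churning: wait longer each time
            next_hold = min(next_hold * 2, max_ms)
        delays.append(delay)
        prev = t
    return delays


# Edge link flaps every 500 ms from t=0; at t=2000 an unrelated core link
# fails, then the network is quiet until another event at t=10000.
events = [0, 500, 1000, 1500, 2000, 10000]
print(spf_delays(events, start_ms=50, hold_ms=200, max_ms=5000, quiet_ms=5000))
```

In this model the unrelated failure at t=2000 waits 1600 ms for SPF purely because of the edge flapping, which is most of a two-second budget; putting the flapping link in its own area would keep routers outside that area out of the back-off.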