Wed Feb 24, 2010 12:31 am
I added a TCP bandwidth test to my little test setup and left it running all night; the tester claims to have the CPU on both sides at 100%.
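For the record, that's just the built-in bandwidth tester, something along the lines of the following (the address here is just the BFD peer on the test link, so substitute whatever fits your setup):
/tool bandwidth-test address=192.168.101.50 protocol=tcp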
On the receiving side we have:
Flags: U - up
0 U state=up address=192.168.101.51 interface=ether1 protocols=bgp multihop=no
state-changes=1 uptime=4d3h26m50s desired-tx-interval=0.2sec
actual-tx-interval=0.2sec required-min-rx=0.2sec remote-min-rx=0.2sec
multiplier=5 hold-time=1sec packets-rx=2127635 packets-tx=2129848
and on the sending side we have:
Flags: U - up
0 U state=up address=192.168.101.50 interface=ether1 protocols=bgp multihop=no
state-changes=2 uptime=4d3h26m51s desired-tx-interval=0.2sec
actual-tx-interval=0.2sec required-min-rx=0.2sec remote-min-rx=0.2sec
multiplier=5 hold-time=1sec packets-rx=2129856 packets-tx=2127645
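(Both captures are from something like /routing bfd session print detail, or however the BFD session list is spelled on your version.)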
I take this to indicate that the side initiating the bandwidth test is sending BFD packets at a slightly larger interval than the other side.
Interestingly, based on the uptime and the 0.2 sec interval, there should only have been 1792060 BFD packets sent by now, instead of the 2129961 actually sent, which works out to an actual interval of about 168ms instead of 200ms.
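Rough math, assuming I'm reading the counters right: 4d3h26m51s is about 358,011 seconds, so at 0.2 sec per packet that's roughly 1.79 million packets expected, while 358,011 / 2,129,961 comes out to about 0.168 sec per packet actually sent.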
Well, anyway, even with this test the BFD/BGP session didn't drop, and the bandwidth test is averaging 76.4 Mbit/s (over a 100 Mbit connection). Unfortunately, I don't have two idle x86-based routers to try this on, and perhaps the bandwidth tester isn't enough load to make the problem show up.
x86 routers with huge BGP tables are where I saw the problem before.
If you go with a script that disables BFD, it would have to run on both sides. Upping the interval and/or multiplier is probably easier.
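For example, something like this on both routers (assuming the per-interface BFD settings live under /routing bfd interface on your version, and that ether1 is the link in question):
/routing bfd interface add interface=ether1 interval=1s min-rx=1s multiplier=5
With a 1 sec interval and a multiplier of 5 the hold time goes from 1 sec to 5 sec, which should ride out much longer CPU stalls without the BGP session being torn down.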