
Testing OSPF in simple configuration: bugs detected

Posted: Tue Dec 14, 2021 10:38 am
by jprietove
To test OSPF in RouterOS v7.1, I created this project in GNS3:
[Attachment: project.png, GNS3 lab topology]
BUG: When a router drops out of OSPF and comes back after some time, the adjacency isn't recovered and the DR election seems erratic.

The configuration is the same on every router; only the IP addresses change:
# R1 configuration
/interface bridge
add name=loopback
/routing id
add id=172.0.0.11 name=ospf
/routing ospf instance
add name=ospf-instance-1 router-id=ospf
/routing ospf area
add instance=ospf-instance-1 name=backbone
/ip address
add address=172.17.0.11/24 interface=ether1 network=172.17.0.0
add address=192.168.1.1/24 interface=ether2 network=192.168.1.0
add address=172.17.0.11 interface=loopback network=172.17.0.11
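/routing ospf interface-template
# shared segment with R2-R4 through the switch (DR/BDR election happens here)
add area=backbone networks=172.17.0.0/24
# network towards nuc-1: advertised, but no Hellos are sent on it (passive)
add area=backbone networks=192.168.1.0/24 passive
# loopback address
add area=backbone networks=172.17.0.11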
/system identity
set name=R1
I start the routers in order, so R1 is elected DR and R2 BDR:
[admin@R1] > routing/ospf/neighbor/print 
Flags: V - virtual; D - dynamic 
 0  D instance=ospf-instance-1 area=backbone address=172.17.0.12 priority=128 
      router-id=172.17.0.12 dr=172.17.0.11 bdr=172.17.0.12 state="Full" 
      state-changes=6 adjacency=1m39s timeout=31s 

 1  D instance=ospf-instance-1 area=backbone address=172.17.0.13 priority=128 
      router-id=172.17.0.13 dr=172.17.0.11 bdr=172.17.0.12 state="Full" 
      state-changes=6 adjacency=49s timeout=34s 

 2  D instance=ospf-instance-1 area=backbone address=172.17.0.14 priority=128 
      router-id=172.17.0.14 dr=172.17.0.11 bdr=172.17.0.12 state="Full" 
      state-changes=6 adjacency=19s timeout=35s 
Next, I start a ping from nuc-1 to nuc-2 and disable the link between R4 and the switch. After several seconds, R1 no longer shows R4 as a neighbor and, as expected, there is no packet loss between nuc-1 and nuc-2.

But when the link between R4 and the switch is restored, packets from nuc-1 to nuc-2 are lost:
...
From 192.168.1.1 icmp_seq=108 Destination Net Unreachable
From 192.168.1.1 icmp_seq=110 Destination Net Unreachable
...
BUG: Looking at R1's neighbors, I can see that R4 has been elected as the new DR (why? A DR had already been elected), and the adjacency between R1 and R4 gets stuck:
[admin@R1] > routing/ospf/neighbor/print 
Flags: V - virtual; D - dynamic 
 0  D instance=ospf-instance-1 area=backbone address=172.17.0.12 priority=128 
      router-id=172.17.0.12 dr=172.17.0.14 bdr=172.17.0.12 state="Full" 
      state-changes=6 adjacency=7m59s timeout=31s 

 1  D instance=ospf-instance-1 area=backbone address=172.17.0.13 priority=128 
      router-id=172.17.0.13 dr=172.17.0.14 bdr=172.17.0.12 state="TwoWay" 
      state-changes=7 timeout=34s 

 2  D instance=ospf-instance-1 area=backbone address=172.17.0.14 priority=128 
      router-id=172.17.0.14 dr=172.17.0.14 bdr=172.17.0.12 state="ExStart" 
      state-changes=3 timeout=31s 
And on R4:
[admin@R4] > routing/ospf/neighbor/print 
Flags: V - virtual; D - dynamic 
 0  D instance=ospf-instance-1 area=backbone address=172.17.0.11 priority=128 
      router-id=172.0.0.11 dr=172.17.0.14 bdr=172.17.0.12 state="TwoWay" 
      state-changes=2 timeout=35s 

 1  D instance=ospf-instance-1 area=backbone address=172.17.0.12 priority=128 
      router-id=172.17.0.12 dr=172.17.0.14 bdr=172.17.0.12 state="Full" 
      state-changes=6 adjacency=10m55s timeout=35s 

 2  D instance=ospf-instance-1 area=backbone address=172.17.0.13 priority=128 
      router-id=172.17.0.13 dr=172.17.0.14 bdr=172.17.0.12 state="Full" 
      state-changes=6 adjacency=10m52s timeout=39s 
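When reading listings like these, note that not every non-Full state is a fault. On a broadcast segment, RFC 2328 (section 10.4) has a router form a full adjacency only when it or its neighbor is the DR or BDR; between two DROther routers, TwoWay is the final state. So R1's TwoWay towards R3 is expected once R4 is DR and R2 is BDR, whereas R1's ExStart towards R4 and R4's TwoWay towards R1 are the actual problem. The Python sketch below flags only that second kind. It assumes the `routing/ospf/neighbor/print` layout quoted in this thread (numbered entries, `key=value` pairs) and trusts the `dr`/`bdr` fields reported per neighbor; `parse_neighbors`, `stuck_adjacencies` and the sample text are illustrative helpers, not a RouterOS API.

```python
import re
from dataclasses import dataclass

# Sample: R1's neighbor table after R4's link is restored (first lab, above).
R1_AFTER_REJOIN = """\
[admin@R1] > routing/ospf/neighbor/print
Flags: V - virtual; D - dynamic
 0  D instance=ospf-instance-1 area=backbone address=172.17.0.12 priority=128
      router-id=172.17.0.12 dr=172.17.0.14 bdr=172.17.0.12 state="Full"
      state-changes=6 adjacency=7m59s timeout=31s

 1  D instance=ospf-instance-1 area=backbone address=172.17.0.13 priority=128
      router-id=172.17.0.13 dr=172.17.0.14 bdr=172.17.0.12 state="TwoWay"
      state-changes=7 timeout=34s

 2  D instance=ospf-instance-1 area=backbone address=172.17.0.14 priority=128
      router-id=172.17.0.14 dr=172.17.0.14 bdr=172.17.0.12 state="ExStart"
      state-changes=3 timeout=31s
"""

ENTRY_START = re.compile(r"\s*\d+\s")            # e.g. " 0  D instance=..."
KEY_VALUE = re.compile(r'(\S+?)=("[^"]*"|\S+)')  # key=value or key="value"


@dataclass
class Neighbor:
    address: str  # neighbor's address on the shared segment
    dr: str       # DR address reported for this neighbor
    bdr: str      # BDR address reported for this neighbor
    state: str    # e.g. "Full", "TwoWay", "ExStart"


def parse_neighbors(text: str) -> list[Neighbor]:
    entries: list[dict[str, str]] = []
    for line in text.splitlines():
        if ENTRY_START.match(line):
            entries.append({})  # a new numbered entry starts on this line
        if not entries:
            continue  # prompt and "Flags:" header come before the first entry
        for key, value in KEY_VALUE.findall(line):
            entries[-1][key] = value.strip('"')
    return [Neighbor(e["address"], e["dr"], e["bdr"], e["state"]) for e in entries]


def stuck_adjacencies(local_addr: str, neighbors: list[Neighbor]) -> list[str]:
    """Addresses of neighbors that should be Full but are not.

    A full adjacency is needed only if either end is DR or BDR
    (RFC 2328, section 10.4); between two DROthers, TwoWay is final.
    """
    stuck = []
    for n in neighbors:
        roles = (n.dr, n.bdr)
        if (local_addr in roles or n.address in roles) and n.state != "Full":
            stuck.append(n.address)
    return stuck


if __name__ == "__main__":
    print(stuck_adjacencies("172.17.0.11", parse_neighbors(R1_AFTER_REJOIN)))
```

Run against R1's table with R1's segment address 172.17.0.11, it reports only 172.17.0.14 and ignores the TwoWay towards R3; fed R4's table with 172.17.0.14, it would report 172.17.0.11.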
To check whether this is caused by a bug in the DR election, I repeat the lab, but this time I start the routers in reverse order, so R4 is elected DR and R3 BDR. If I suspend the link between R1 and the switch and wait for the timeouts, then once the link is resumed, R1 can't establish the adjacency:
[admin@R1] > routing/ospf/neighbor/print 
Flags: V - virtual; D - dynamic 
 0  D instance=ospf-instance-1 area=backbone address=172.17.0.12 priority=128 
      router-id=172.17.0.12 dr=172.17.0.14 bdr=172.17.0.13 state="TwoWay" 
      state-changes=4 timeout=33s 

 1  D instance=ospf-instance-1 area=backbone address=172.17.0.13 priority=128 
      router-id=172.17.0.13 dr=172.17.0.14 bdr=172.17.0.13 state="ExStart" 
      state-changes=3 timeout=37s 

 2  D instance=ospf-instance-1 area=backbone address=172.17.0.14 priority=128 
      router-id=172.17.0.14 dr=172.17.0.14 bdr=172.17.0.13 state="ExStart" 
      state-changes=3 timeout=37s 
And from R4's point of view:
[admin@R4] > routing/ospf/neighbor/print 
Flags: V - virtual; D - dynamic 
 0  D instance=ospf-instance-1 area=backbone address=172.17.0.11 priority=128 
      router-id=172.0.0.11 dr=172.17.0.14 bdr=172.17.0.13 state="TwoWay" 
      state-changes=2 timeout=34s 

 1  D instance=ospf-instance-1 area=backbone address=172.17.0.12 priority=128 
      router-id=172.17.0.12 dr=172.17.0.14 bdr=172.17.0.13 state="Full" 
      state-changes=6 adjacency=5m36s timeout=31s 

 2  D instance=ospf-instance-1 area=backbone address=172.17.0.13 priority=128 
      router-id=172.17.0.13 dr=172.17.0.14 bdr=172.17.0.13 state="Full" 
      state-changes=6 adjacency=6m16s timeout=34s 
I hope this helps MikroTik developers find and fix it.

PS: @anav, please feel free to improve the thread title.

Re: Testing OSPF in simple configuration: bugs detected

Posted: Tue Dec 14, 2021 12:34 pm
by mrz
Please open a support ticket.

Re: Testing OSPF in simple configuration: bugs detected

Posted: Tue Dec 14, 2021 12:43 pm
by jprietove
More on this: I created the same project with RouterOS v6.49.1 and started the routers in order from R1 to R4, so R1 became DR and R2 BDR.

After suspending R4's link, waiting for the timeout, and resuming the link, R4 is again elected DR, and the ping from nuc-1 to nuc-2 loses 10 packets (10 seconds).

With the same project using FRRouting, R4 is also elected DR instead of R1 keeping the role. But here, the ping from nuc-1 to nuc-2 loses only 4 packets.

So maybe the DR change is not a bug (though I don't understand why the DR changes), or at least FRRouting behaves the same way.

But the adjacency is not established after the DR change.

Re: Testing OSPF in simple configuration: bugs detected

Posted: Tue Dec 14, 2021 1:06 pm
by aleksis
jprietove wrote: "BUG: Looking at R1's neighbors, I can see that R4 has been elected as the new DR (why? A DR had already been elected)"
R4 will elect itself as the DR after it loses adjacency with its other neighbours. After the link is repaired, a new election will happen with two DRs present, and R4 wins because of its 'higher' router-id.
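A simplified Python model of the election rules in RFC 2328, section 9.4, makes this concrete. It is a sketch, not RouterOS's implementation: it covers only steps 2 and 3 (BDR first, then DR), omits step 4 (re-running the calculation when the calculating router's own role changes) and the wait timer, and represents each router only by what it declares in its Hellos. The `Router` class and the `best`/`elect` helpers are hypothetical names. Router IDs come from the neighbor tables in this thread (R1's is 172.0.0.11, from its `/routing id` entry), and every priority is the 128 shown there.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


@dataclass(frozen=True)
class Router:
    name: str
    router_id: IPv4Address
    priority: int = 128         # the priority shown in this thread's neighbor tables
    declares_dr: bool = False   # router lists itself as DR in its Hellos
    declares_bdr: bool = False  # router lists itself as BDR in its Hellos


def best(candidates: list[Router]) -> Optional[Router]:
    # Highest priority wins; ties go to the highest router ID.
    # IPv4Address objects compare as 32-bit numbers.
    return max(candidates, key=lambda r: (r.priority, r.router_id), default=None)


def elect(routers: list[Router]) -> tuple[Optional[Router], Optional[Router]]:
    """Steps 2 and 3 of the RFC 2328 section 9.4 election (simplified)."""
    eligible = [r for r in routers if r.priority > 0]
    # Step 2: BDR comes from routers NOT declaring themselves DR,
    # preferring those that already declare themselves BDR.
    not_dr = [r for r in eligible if not r.declares_dr]
    bdr = best([r for r in not_dr if r.declares_bdr]) or best(not_dr)
    # Step 3: the best router declaring itself DR becomes DR;
    # only if nobody declares itself DR is the new BDR promoted.
    dr = best([r for r in eligible if r.declares_dr]) or bdr
    return dr, bdr


if __name__ == "__main__":
    r1 = Router("R1", IPv4Address("172.0.0.11"), declares_dr=True)
    r2 = Router("R2", IPv4Address("172.17.0.12"), declares_bdr=True)
    r3 = Router("R3", IPv4Address("172.17.0.13"))
    # R4 joining an established segment announces no DR/BDR of its own.
    r4_joining = Router("R4", IPv4Address("172.17.0.14"))
    # R4 returning after being alone has already declared itself DR.
    r4_returning = Router("R4", IPv4Address("172.17.0.14"), declares_dr=True)
    for r4 in (r4_joining, r4_returning):
        dr, bdr = elect([r1, r2, r3, r4])
        print(dr.name, bdr.name)  # prints "R1 R2", then "R4 R2"
```

Step 3 is what makes the election non-preemptive: at a cold start, R4's higher router ID does not matter because R4 does not declare itself DR; once R4 has been alone on its link and declared itself DR, it competes with R1 on priority and router ID, and wins. With R4 declaring DR, R3 declaring BDR and R1 returning as a self-declared DR, the same function yields R4 as DR and R3 as BDR, matching the reverse-order lab.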

Re: Testing OSPF in simple configuration: bugs detected

Posted: Tue Dec 14, 2021 1:50 pm
by jprietove
aleksis wrote: "R4 will elect itself as the DR after it loses adjacency with its other neighbours. After the link is repaired, a new election will happen with two DRs present, and R4 wins because of its 'higher' router-id."
Yes, thanks. I see now that it's not the same as a cold boot. I thought it would behave the same as a cold start, since R4 has no neighbours.

But there is still another bug: after the DR changes, the adjacency is never established. I think this is the same issue reported in the topic "OSPF not working on RouterOS v7.1 between 2 routers".

Re: Testing OSPF in simple configuration: bugs detected

Posted: Tue Dec 14, 2021 2:03 pm
by mrz
Can you please enable OSPF debug logs, trigger the ExStart state problem, generate supout files, and send them to support?

Re: Testing OSPF in simple configuration: bugs detected

Posted: Tue Dec 14, 2021 3:13 pm
by jprietove
mrz wrote: "Can you please enable OSPF debug logs, trigger the ExStart state problem, generate supout files, and send them to support?"
MikroTik support #[SUP-68898]

Re: Testing OSPF in simple configuration: bugs detected

Posted: Tue Dec 14, 2021 3:50 pm
by jprietove
I tested the same scenario, this time with IPv6 and OSPFv3: again, the router that was DR doesn't hand the role over to the new DR, and the neighbor state stays at "ExStart".

Re: Testing OSPF in simple configuration: bugs detected

Posted: Tue Dec 21, 2021 7:54 pm
by jprietove
Happy to see that this bug has been fixed in v7.1.1.

Now, when R4 is re-enabled, it gets elected DR, and R1 changes from DR to a normal router as fast as with FRRouting.
R1's neighbor state also moves from "ExStart" to "Full".

Thanks team! Good work!