OSPF and BGP Issues
Posted: Fri Jun 16, 2017 12:58 pm
by techmngr
Hi Sir/s,
I would like to check whether anyone has encountered the issue we're having now on our 1072s.
In the past 12 hours, the same unusual behavior has occurred three times: Internet services suddenly stop working.
When we check the logs, we just see OSPF and BGP going down.
We resolve it by rebooting one of the 1072s (the two are connected to each other).
Could something be triggering this behavior? No changes whatsoever were made before the issue occurred.
The configurations have been in place for years, and if this were a configuration issue, why would a reboot solve it rather than a configuration change?
The version is 6.33.2. CPU and memory utilization are low, so I guess that is not the issue.
Any suggestions on what else I can check?
Thanks in advance.
Re: OSPF and BGP Issues
Posted: Fri Jun 16, 2017 1:09 pm
by airbanduk
Have you tried a later firmware release? When was the last update and configuration change made?
I've been using 6.35 on the 1072 and they've been really stable. The only time I've seen OSPF play up without a config change is on wireless links when the signal degrades; it seems the remote router needs a reboot to reconnect for some reason.
Re: OSPF and BGP Issues
Posted: Fri Jun 16, 2017 1:24 pm
by techmngr
Here's an example of what I see in the logs. Unfortunately, it just says OSPF and BGP went down.
12:15:25 route,ospf,info OSPFv2 neighbor 10.0.0.1: state change from Full to Init
12:16:00 system,info,account user noc logged in from 103.25.176.2 via winbox
12:16:13 interface,info <customer> link down
12:17:16 route,bgp,error HoldTimer expired
12:17:16 route,bgp,error RemoteAddress=45.64.80.146
12:17:37 route,bgp,error Received notification
12:17:37 route,bgp,error Hold timer expired, subcode=0
12:18:16 route,bgp,info Failed to open TCP connection: No route to host
12:18:16 route,bgp,info RemoteAddress=45.64.80.146
12:18:17 route,bgp,error Received notification
12:18:17 route,bgp,error Hold timer expired, subcode=0
12:18:24 route,ospf,info OSPFv2 neighbor 10.0.0.1: state change from ExStart to 2-Way
12:18:28 route,bgp,info Failed to open TCP connection: Network is unreachable
12:18:28 route,bgp,info RemoteAddress=10.0.0.1
12:18:32 route,ospf,info OSPFv2 neighbor 10.0.0.1: state change from ExStart to Init
12:18:36 route,bgp,info Failed to open TCP connection: No route to host
12:18:36 route,bgp,info RemoteAddress=45.64.80.146
12:18:56 route,bgp,info Failed to open TCP connection: No route to host
12:18:56 route,bgp,info RemoteAddress=45.64.80.146
12:19:12 route,bgp,info Connection opened by remote host
12:19:12 route,bgp,info RemoteAddress=10.0.0.1
12:19:16 route,bgp,info Failed to open TCP connection: No route to host
12:19:16 route,bgp,info RemoteAddress=45.64.80.146
Re: OSPF and BGP Issues
Posted: Fri Jun 16, 2017 1:29 pm
by techmngr
Sir,
Just 6 hours ago I upgraded to the latest bugfix version, 6.37.5, and so far the issue hasn't resurfaced. As for configuration changes: none. No changes were made in the hours and days before the incident occurred. There are no wireless configurations on the 1072s either; we're using them as edge routers, since we're an ISP. Is it safe to assume that this is not a configuration issue?
Thanks for your response.
Re: OSPF and BGP Issues
Posted: Fri Jun 16, 2017 1:49 pm
by airbanduk
Those errors are exactly what I see on CCR1009/1016 in the access network when the wireless links cause the neighbours to drop. On one side the neighbour comes up in 'Full' state, but the other cycles through the OSPF FSM in the way you've shown. I have to reboot the one that thinks it's Full to bring the neighbours back up correctly. As you don't have wireless links I can't say why it might be happening, but the symptoms seem identical.
Again, no config changes on our CCRs before this happens. If the wireless signals are tuned to a strong level, the problem disappears. That suggests to me the cause is a bad link, but the CCR must have a bug somewhere that stops OSPF from forming correctly again. I've tried using different OSPF link types (broadcast, nbma, ptp, ptmp) and none of them have solved the issue. I've resorted to a script to automatically reboot the router that thinks it's 'Full', but in the core/edge I don't see how you could do this.
Re: OSPF and BGP Issues
Posted: Fri Jun 16, 2017 2:06 pm
by techmngr
Sir,
On Cisco there is a "sh tech" command whose output we can analyze. Does MikroTik have a similar command? I'm a newbie with MikroTik, and I was hoping there was something beyond the normal log files that would give me a clue as to what is triggering the sudden, random OSPF and BGP outages.
Thanks.
Re: OSPF and BGP Issues
Posted: Fri Jun 16, 2017 3:07 pm
by StubArea51
supout.rif is the equivalent of a show tech in the Cisco world. You can log in to your MikroTik account and view the contents, as well as send the file in to MikroTik with a ticket.
https://wiki.mikrotik.com/wiki/Manual:S ... utput_File
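A minimal sketch of generating the file from the terminal, assuming RouterOS v6 (the file name is arbitrary):

```
# Generate a support output file; it is written to the router's file list
/system sup-output name=supout.rif
# Confirm the file exists before downloading it
/file print where name~"supout"
```

The file can then be downloaded from the Files menu in WinBox and opened in the supout viewer.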
Re: OSPF and BGP Issues
Posted: Fri Jun 16, 2017 3:10 pm
by StubArea51
As far as the RouterOS version goes, I advise all of my clients to run bugfix code, as it is much more stable in production. One other practice that can contribute to OSPF/BGP instability is running a lot of mismatched versions on the routers. 6.37.5 bugfix has worked well for a lot of our clients that depend on BGP/OSPF.
Re: OSPF and BGP Issues
Posted: Mon Jun 19, 2017 1:55 pm
by techmngr
Hi Sir, thanks for your response. So far I've upgraded both my 1072s to 6.37.5, and I will try to upgrade the 2 x 1036s to the same bugfix version this weekend. I was able to generate a supout.rif file and open it in the supout viewer. My question is: when is the best time to generate the file? Right after the unusual behavior is encountered? The log files are deleted every time the router is rebooted, and in my case a reboot is what resolves the issue, temporarily that is. I was hoping to have a means of finding out what triggers the behavior. Thanks again.
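On the logs being lost at reboot: by default RouterOS keeps log messages in memory only. One possible approach, sketched here under assumptions (RouterOS v6 syntax; 192.0.2.10 is a placeholder address for a syslog server, and the action name remotesyslog is made up), is to also send routing messages to disk and to a remote server so they survive the reboot:

```
# Assumption: 192.0.2.10 stands in for your own syslog server
/system logging action add name=remotesyslog target=remote remote=192.0.2.10
# Send everything with the "route" topic (this covers the OSPF and BGP messages) to the remote server
/system logging add topics=route action=remotesyslog
# Also write the same messages to local storage instead of memory only
/system logging add topics=route action=disk
```

With something like this in place, the messages leading up to an outage should still be available after the router has been rebooted.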
Re: OSPF and BGP Issues
Posted: Mon Jun 19, 2017 2:00 pm
by techmngr
Thank you, sir, though we don't have any wireless features enabled on either 1072. The last thing I did was delete files from the HDD, since I noticed it had reached 80% utilization, leaving only 20% free. Could that be a factor? Would 80% HDD utilization cause the router to hang or stop working? I ask because every time this happens the uptime doesn't reset, so technically the router is still up; it's only my BGP and OSPF neighbors that break, and they recover after the reboot.
Re: OSPF and BGP Issues
Posted: Mon Nov 20, 2017 11:56 am
by Kevo
We've seen this problem a couple of times now on our 1072. It looks like something happens with OSPF, and a little while later we get the hold timer error with BGP and routing fails. After some minutes BGP comes back up. We've only run bugfix releases, and this has happened before on 6.37.5 and now on 6.39.3.
Log shows
862 Nov/19/2017 20:42:59 memory route, ospf, info OSPFv2 neighbor 172.17.2.11: state change from ExStart to Down
863 Nov/19/2017 20:43:17 memory route, ospf, info OSPFv2 neighbor 172.17.2.11: state change from ExStart to Down
864 Nov/19/2017 20:43:52 memory route, ospf, info OSPFv2 neighbor 172.17.2.11: state change from Exchange to Down
865 Nov/19/2017 20:44:57 memory route, ospf, info OSPFv2 neighbor 172.17.2.11: state change from ExStart to Down
866 Nov/19/2017 20:45:53 memory route, ospf, info OSPFv2 neighbor 172.17.2.11: state change from Init to Down
867 Nov/19/2017 20:46:55 memory route, bgp, error HoldTimer expired
868 Nov/19/2017 20:46:55 memory route, bgp, error RemoteAddress=111.222.111.123
869 Nov/19/2017 20:47:26 memory route, bgp, info Connection opened by remote host
870 Nov/19/2017 20:47:26 memory route, bgp, info RemoteAddress=111.222.111.123
Is there any way to troubleshoot this when it happens again? I think it's been a few months since it last happened.
Re: OSPF and BGP Issues
Posted: Thu Apr 30, 2020 1:24 am
by jmatuska
Did anyone find a fix for this issue? We are having the same problem: OSPF neighbors drop approximately every 12 hours, which causes our BGP peer hold timer to expire, and shortly afterwards the OSPF neighbors and BGP peer come back up. This is occurring on a CCR1072-1G-8S+ running version 6.45.8.
Any thoughts?
Re: OSPF and BGP Issues
Posted: Wed Jun 15, 2022 7:31 pm
by Leonardorortizm
Same problem here, on the same router (a 1072). Any updates?
Re: OSPF and BGP Issues
Posted: Tue Jun 21, 2022 3:19 pm
by Leonardorortizm
I have solved the issue. The problem was that the router was receiving a very large number of BGP updates (about 2 million). The router can't handle this and kills the BGP process. BGP and OSPF run under the same process, which is why everything goes down when this occurs.
The solution was to talk to our internet provider and ask them to send me only the default route.
I'm not convinced that the MikroTik 1072 can handle full BGP routing. Support said that updating to RouterOS v7 could solve the initial issue.
Re: OSPF and BGP Issues
Posted: Tue Jun 21, 2022 3:47 pm
by mrz
In ROS v7 BGP and OSPF are separate processes, so that may improve things a bit.
Re: OSPF and BGP Issues
Posted: Tue Jun 21, 2022 5:01 pm
by StubArea51
Is there any internal prioritization for the routing processes in ROSv7 so they remain stable under high CPU load?
Cisco and Juniper apply a DSCP marking to routing protocols to keep them prioritized as well as giving them process priority in the control plane.
Re: OSPF and BGP Issues
Posted: Tue Jun 21, 2022 5:20 pm
by mrz
RouterOS does not have a specific traffic prioritisation scheme by default; it is up to you to apply DSCP and set up queues.
As for routing process prioritisation, there are no user-configurable options except the affinity setting, which divides protocols into multiple processes. The process itself will always try to "jump" to the core with the lowest load.
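As a rough sketch of the DSCP part, assuming you want locally originated OSPF and BGP packets marked as CS6 (DSCP 48, the value commonly used for network control traffic), mangle rules along these lines could be used. The choice of DSCP value and of the output chain are assumptions, not something RouterOS requires:

```
# Mark OSPF packets generated by this router (IP protocol 89) with DSCP 48
/ip firewall mangle add chain=output protocol=ospf action=change-dscp new-dscp=48
# Mark BGP packets generated by this router (TCP port 179), for sessions opened in either direction
/ip firewall mangle add chain=output protocol=tcp dst-port=179 action=change-dscp new-dscp=48
/ip firewall mangle add chain=output protocol=tcp src-port=179 action=change-dscp new-dscp=48
```

Marking on its own only labels the packets. Queues on this router or on downstream devices still have to be set up to act on the marking, and it does not change how the routing process itself is scheduled on the CPU.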
Re: OSPF and BGP Issues
Posted: Mon Mar 27, 2023 9:45 pm
by peakwifi
Good day all,
We have several CCR1072 units connected to ATT fiber using BGP. We have asked them for a default route only, yet we suspect they occasionally blast us with more routes, causing a router reboot every 5 minutes. We are running 6.48.6 with the following filters set up. Is there any way to further protect the routers and prevent these reboots?
/routing filter
add action=accept chain=ATT-IN prefix=0.0.0.0/0
add action=discard chain=ATT-IN
add action=accept chain=ATT-OUT comment="Site x - Shared Outbound" prefix=34.165.21.0/24
add action=accept chain=ATT-OUT comment="Site x" prefix=34.165.20.0/24 set-bgp-communities=""
add action=accept chain=ATT-OUT comment="Site x" prefix=34.165.22.0/24 set-bgp-prepend=5
add action=discard chain=ATT-OUT
Thanks