With the standard routing-test on 2.9.25 I have a problem with
the hold time expiring. If I stay with only one ISP, no problems.
If I stay with two ISPs, the BGP sessions go somewhere far away
and keep resetting again and again. It sounds like my "dream" is to stay without internet connections.
Thanks again for the NEW STABLE 2.9 with the suggested routing-test.
I'm still waiting for other problems.
Another night I can't sleep with MikroTik in trouble ...
Thank you guys!! Maybe next time you will want me to pay for these "boogie nights".
It appears so; when mentioning problems, he says `standard test`.
You mean the new routing-test is very good?
I checked nikhil's supout.rif and he is using the standard routing package, not test.
Sorry for this post, but I must explain to nikhil, since I saw the above.
I downloaded all_packages; next:
unzip and go to the directory containing the *.npk files.
delete routing.npk (this is the normal package that comes with the packages)
then download routing-test-2.9.25.npk to that directory - link provided by normis in the topic.
then ftp to your router - set the binary transfer type, upload the packages and reboot.
The working routing-test is then up. The next step is to configure BGP.
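For anyone who prefers to script the upload, here is a rough Python sketch of the steps above. It is only an illustration: the helper names, host and credentials are made up, and it assumes the router accepts a plain FTP login. The reboot still has to be done on the router afterwards.

```python
from ftplib import FTP
from pathlib import Path


def select_packages(filenames):
    """Return the .npk files to upload: everything except the stock routing.npk."""
    return sorted(
        name for name in filenames
        if name.endswith(".npk") and name != "routing.npk"
    )


def upload_packages(host, user, password, directory):
    """Upload the selected packages over FTP (hypothetical helper, not a MikroTik tool)."""
    names = select_packages(p.name for p in Path(directory).iterdir())
    with FTP(host) as ftp:
        ftp.login(user, password)
        for name in names:
            with open(Path(directory) / name, "rb") as fh:
                # storbinary switches the session to binary (TYPE I) transfer
                ftp.storbinary(f"STOR {name}", fh)
    return names
```

Calling `select_packages` first on the unpacked directory listing is a cheap way to confirm that routing.npk is left out and routing-test-2.9.25.npk is included before anything is sent.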
P.S. My working router has only PCI-X slots, with one
dual-port Intel gigabit card; the others are 2 x single-port Intel gigabit cards, and it works ...
If your link to other hardware is 100 Mbit/s, set the gigabit adapters to a fixed 100 Mbit/s full duplex (don't use auto-negotiated links).
This useful tip is from my experience with 100 Mbit/s media converters ...
Cheers
DON'T USE the WINBOX routing menu. Telnet or SSH to the router and go ...
If you have trouble with the configuration, I'm around ... I may be able to help you.
That's my way ... and it works ...
Yes, I am using only that.
nikhil, did you get routing-test from the topic???
In this version from normis the network command is back again (that makes me happy).
nikhil, in the terminal:
when you press Tab twice you get suggestions for commands.
That means you must
enable 0
add network=xxx.xxx.xxx.xxx/xx disabled=no
and you have the network advertised from your router
I had everything OK for the whole weekend, and today my router went crazy and I had to disable all BGP peers and leave only the uplink to announce my prefixes.
I use the package from the topic - for the last 12 hours everything is OK!!!
4 BGP sessions work great for now.
Thanks from me for the fix and for the "network" command too.
Please test it for 7 days or so.
nikhil, for testing purposes I have another router with different hardware (Intel motherboard, with Intel network adapters, Intel CPU
- I call it "Trinity" :) )
What kind of hardware do you use for your routers?
I have a working router with 2.9.25 and only the new routing-test from the topic.
And this variant works for me.
Think twice and then go ahead again... the latest routing-test must work.
I am still seeing 100% cpu after sync sometimes as well, but not all the time. We have a development lab setup with 2 bgp peers, 2 mt routers, and the 2 routeros boxes using iBGP between them. The routers were configured directly after a /system reset. The latest routing-test works 99% better but still has a lingering cpu problem. I am sure it will be fixed (I hope) and we will go into production when it is. I saw a huge improvement in the last 2-3 updates that came out.
Nikhil, are you doing iBGP peers as well?
Sam
For the last 30 hours I got only one error:
Hold timer expired 40 minutes after midnight, then all BGP peers were restarted; for 30 seconds the router had these messages in the log:
Failed to open TCP connection. Operation now in progress
RemoteAddr=peer1 ip
RemotePort=179
Failed to open TCP connection. Operation now in progress
RemoteAddr=peer2 ip
RemotePort=179
Failed to open TCP connection. Operation now in progress
RemoteAddr=peer3 ip
RemotePort=179
Failed to open TCP connection. Operation now in progress
RemoteAddr=peer4 ip
RemotePort=179
then the full BGP tables loaded and everything is OK ...
Nobody noticed this hold timer event; I saw it this morning when checking the logs as usual at the start of the work day ...
The router is in a production environment, which means:
connected to 2 ISPs, 4 BGP peers over 4 VLANs
networks are advertised, all is fully operational.
for now ...
and let the force be with you...
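If you save log lines like the ones above to a text file, a small script can tally the failed connection attempts per peer. This is only a sketch in Python: it assumes the exact line layout shown in this post ("Failed to open TCP connection" followed by a RemoteAddr= line), and the function name is made up.

```python
from collections import Counter

FAILURE = "Failed to open TCP connection"


def count_failures(log_text):
    """Count failed TCP connection attempts per RemoteAddr value."""
    counts = Counter()
    pending = False  # True after a failure line, until its RemoteAddr is seen
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(FAILURE):
            pending = True
        elif pending and line.startswith("RemoteAddr="):
            counts[line.split("=", 1)[1]] += 1
            pending = False
    return counts
```

A peer that shows up far more often than the others after a hold timer expiry would be the first one to look at.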
Yeah right, I already sent the supout.rif 2.9.26 w/r-t BGP problems.
Send a support-output file to support@mikrotik.com
I have similar problems. I use MikroTik in production (currently 2.9.26), but I have only one full BGP feed and some peerings. When I discarded all routes from the uplink and used a default route, the situation became rather stable.
How many people actually have BGP working in production with > 2.9.14 routing-test? Of those, how many are using more than 1 peer? We've been testing for months and can't get anything working reliably, and I'm close to giving up on it.
Yes, MikroTik BGP is not ready for "prime time" yet. This past weekend we upgraded four core routers to v2.9.27 with routing-test. I can only describe the experience as the most unpleasant one I have ever had with MikroTik.
We have 2 routers in production with 2.9.27 BGP routing-test but cannot accept many incoming routes. We are doing iBGP between the routers and announcing our routes, but that's all... incoming routes from the peers cause things to go haywire. I think iBGP between the 2 is where things get hairy. So far 10 days uptime though, when filtering out all routes from peers.
Hello Eugene,
The "ghost routes" you mention have piqued my interest.
I believe that we have come to a stage where troubleshooting MT routing-test has evolved into a full-time job and a really exciting knowledge quest that you get for a few bob…
More to come: ghost routes resulting in invalid AS_PATH attributes, millions of routes on a peer, filters for every attribute, BGP CPU utilization with timers 1/3, routers at a standstill with BGP debug mode ON, and so on and so forth…
As soon as we have all the info collected we will try to post it to support…
This is a problem that can't really be solved - pumping out that much debug in a production environment would kill most routers. I would never leave BGP or related debug rules on while taking in 180k routes; there is too much to log and stay productive. I would LOVE to see a burst option on debugging rules to help with this.
routers to a stand still with bgp debug mode ON
The quagga documentation is easy to follow, given their sample configuration bgpd.conf.sample.
advantz, the MikroTik people must think before giving us some features...
I'm really disappointed by the repeating problems ... or they fix something and break something else ...
Can I ask you to send me your conf files for quagga without your sensitive information? I have a PC here for testing purposes, and I'm thinking of installing BSD and quagga. Can you help? With the confs?
Personally, I use the default hold timer settings. To be on the safe side, I limit incoming prefixes to providers' routes only (as-path-length=2), although I am not implying that the full table (or multiple full tables) does not work.
Hello Eugene,
Yes, I know we should have thought to generate a support-output file. Even with three people working on the problem nobody was thinking of sending supout files...instead our pants were on fire and we were only thinking of getting the network running smoothly again!
Do you believe we were seeing problems because we were trying to take two full tables? Does MikroTik have any BGP recommendations? e.g. take full routes or only default & connected, hold timer settings, etc. etc...
Any input as to why it takes so long for Winbox to show the routes? Is this a CPU problem? Should we be looking at faster CPUs? Is 512MB RAM enough? How much memory does MikroTik recommend?
Best,
Brad
Yes, Sam, we are using these 2 peers (great thanks, btw). No need to do anything additional on your side, as we are now able to do pretty extensive testing ourselves.
Eugene,
Are you still using those 2 test peerings here? If so, maybe I can script something up to add/remove routes similar to what happens on the internet - maybe taking in 180,000 routes on 2 routers and syncing them between the two using iBGP works fine until there are new announcements during that sync. We had the same problems as BelWave and had to filter the incoming routes to get stable. Now that it's in production, it's hard to do any testing with those boxes : )
Sam
You have to consider that by default next hops for BGP routes are not looked up through static routes. Search the manual for the "scope" and "target-scope" parameters of routes.
Yes, of course. At first we added a static route for that next hop and tested it by ping, but MT didn't install the routing information unless we filtered incoming BGP with the 'set-nexthop' option.
In other BGP routing daemons the "multihop" option comes with a 'hop count' field (numeric), for the case where two routers have more than one route to each other. Do you plan to add this option field?
/routing filters export
Excuse my other stupid question… I've put in a static route with scope<=target-scope and the next hop resolved; all routes from the multihop neighbor were put into the routing table without any filtering.
But now I have another problem: MT redistributes my RIP networks through BGP. That multihop neighbor receives the routes from MT, but in that list the next hop isn't MT. As next hop in this list I see the RIP routers… I tried to add an outgoing passthrough filter for that peer with the next hop set to the MT IP, but nothing happened…
When I put a route-map with the next hop set to the MT IP on the multihop neighbor, "zebra" rebooted immediately. Maybe filters that set the next hop do not work on multihop, or I can't set the next hop to the remote IP address of a BGP router?
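To illustrate the scope<=target-scope condition mentioned in the post above, here is a small Python sketch of the rule as it is described in this thread: a route can be used to resolve a BGP next hop only if it covers the next hop and its scope is not greater than the BGP route's target-scope. The route dictionaries, the helper name and the scope value 30 are assumptions for illustration only, not RouterOS internals; check the manual for the real defaults.

```python
from ipaddress import ip_address, ip_network


def resolving_routes(next_hop, routes, target_scope):
    """Return the routes that may resolve next_hop under the scope rule."""
    hop = ip_address(next_hop)
    return [
        route for route in routes
        if hop in ip_network(route["dst"]) and route["scope"] <= target_scope
    ]


# target-scope=10 is the value shown for the BGP routes printed in this thread.
static_high_scope = {"dst": "10.0.0.0/24", "scope": 30}  # assumed scope, not usable
static_low_scope = {"dst": "10.0.0.0/24", "scope": 10}   # scope lowered, usable
```

With these values, `resolving_routes("10.0.0.61", [static_high_scope], 10)` returns an empty list, while the same call with `static_low_scope` returns that route, which matches the behaviour described above once the static route's scope was lowered.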
/routing filter add chain=foo prefix-length=17-32
7 ADb dst-address=10.10.0.0/16 gateway=10.0.0.61
interface=l2tp-to-cip-office gateway-state=reachable
distance=20 scope=255 target-scope=10
bgp-as-path=65505 bgp-origin=incomplete
8 Db dst-address=10.10.0.0/16 gateway=10.0.0.65
interface=l2tp-amistad-to-delmar gateway-state=reachable
distance=20 scope=255 target-scope=10
bgp-as-path=65507,65505 bgp-origin=incomplete
19 Db dst-address=10.40.4.0/24 gateway=10.0.0.61
interface=l2tp-to-cip-office gateway-state=reachable
distance=20 scope=255 target-scope=10
bgp-as-path=65505 bgp-origin=incomplete
20 ADb dst-address=10.40.4.0/24 gateway=10.0.0.65
interface=l2tp-amistad-to-delmar gateway-state=reachable
distance=20 scope=255 target-scope=10
bgp-as-path=65507,65505 bgp-origin=incomplete
Software upgrades may help performance, but expecting customers to perform multiple firmware or driver updates to reach minimal functionality is completely unacceptable. So is releasing immature products just to be early to market and treating purchasers as your quality-assurance department. In the end, that hurts both consumers and vendors.
It is not set on any instances ...
Do you have /routing bgp instance <number> ignore-as-path-len set to "no"?
Router ospf 100
summary-address 10.12.1.0 255.255.255.224
[user@cip-office] ip route> print routing-mark=bogons terse
Flags: X - disabled, A - active, D - dynamic,
C - connect, S - static, r - rip, b - bgp, o - ospf
0 Db dst-address=1.0.0.0/8 gateway=10.40.0.1 interface=9-loopback
gateway-state=reachable distance=20 scope=255 target-scope=10
routing-mark=bogons bgp-as-path=65333 bgp-med=0 bgp-origin=igp
1 Db dst-address=2.0.0.0/8 gateway=10.40.0.1 interface=9-loopback
gateway-state=reachable distance=20 scope=255 target-scope=10
routing-mark=bogons bgp-as-path=65333 bgp-med=0 bgp-origin=igp
2 Db dst-address=5.0.0.0/8 gateway=10.40.0.1 interface=9-loopback
gateway-state=reachable distance=20 scope=255 target-scope=10
routing-mark=bogons bgp-as-path=65333 bgp-med=0 bgp-origin=igp
3 Db dst-address=7.0.0.0/8 gateway=10.40.0.1 interface=9-loopback
gateway-state=reachable distance=20 scope=255 target-scope=10
routing-mark=bogons bgp-as-path=65333 bgp-med=0 bgp-origin=igp
4 Db dst-address=10.0.0.0/8 gateway=10.40.0.1 interface=9-loopback
gateway-state=reachable distance=20 scope=255 target-scope=10
routing-mark=bogons bgp-as-path=65333 bgp-med=0 bgp-origin=igp
5 S dst-address=10.40.0.1/32 gateway=10.40.1.1 interface=0-inside
gateway-state=reachable distance=1 scope=255 target-scope=10
routing-mark=bogons
6 Db dst-address=23.0.0.0/8 gateway=10.40.0.1 interface=9-loopback
gateway-state=reachable distance=20 scope=255 target-scope=10
routing-mark=bogons bgp-as-path=65333 bgp-med=0 bgp-origin=igp
7 Db dst-address=27.0.0.0/8 gateway=10.40.0.1 interface=9-loopback
gateway-state=reachable distance=20 scope=255 target-scope=10
routing-mark=bogons bgp-as-path=65333 bgp-med=0 bgp-origin=igp
8 Db dst-address=31.0.0.0/8 gateway=10.40.0.1 interface=9-loopback
gateway-state=reachable distance=20 scope=255 target-scope=10
routing-mark=bogons bgp-as-path=65333 bgp-med=0 bgp-origin=igp
9 Db dst-address=36.0.0.0/8 gateway=10.40.0.1 interface=9-loopback
gateway-state=reachable distance=20 scope=255 target-scope=10
routing-mark=bogons bgp-as-path=65333 bgp-med=0 bgp-origin=igp
Eugene,
There are numerous routing fixes in .28. Could you switch on those BGP peers?
Mmm, this explains a problem I have here... I have a full BGP setup, with confederations.
2Sam: If these 2 routes are from different instances, they are not compared by BGP code (AS_PATH length does not matter).
OK, this is fine .. the point is: I use multiple instances because I seem to be unable to propagate BGP routes between routers without sessions.
Multiple peer entries are okay, but you cannot use more than 1 instance. If you use more than 1 instance you end up with multiple views of the routing table that do not 'see' other instances ... it's a problem and should not be that way. BGP instances are there to allow different router IDs, redistribute settings, AS numbers, but not to separate the routing table.
You can 'filter' the routes to set-nexthop= just like on the Cisco. If they are getting ignored, it might be a config issue with the peer and its filter chain. We have quite a few chains that perform set-nexthop and they seem to work fine. If you need help getting it to work, post some configs.
With the "usual" BGP stuff, like Cisco or quagga, I solve this by using nexthop... with MT it seems to be ignored ..
/ routing filter
add chain=next invert-match=no action=passthrough set-nexthop=80.79.50.206 \
comment="" disabled=no
/ routing bgp instance
set default name="default" as=65048 router-id=80.79.49.217 \
redistribute-static=yes redistribute-connected=yes redistribute-rip=no \
redistribute-ospf=no redistribute-other-bgp=yes out-filter="" \
confederation=34695 confederation-peers=65000 \
client-to-client-reflection=no comment="" disabled=no
/ routing bgp peer
add name="peer1" instance=default remote-address=80.79.50.206 remote-as=65000 \
tcp-md5-key="" multihop=no route-reflect=no hold-time=3m ttl=1 \
in-filter=next out-filter="" comment="" disabled=no
#
/ routing filter
add chain=next invert-match=no action=passthrough set-nexthop=80.79.50.206 \
comment="" disabled=no
/ routing bgp instance
set default name="default" as=65000 router-id=80.79.49.121 \
redistribute-static=yes redistribute-connected=yes redistribute-rip=no \
redistribute-ospf=no redistribute-other-bgp=yes out-filter="" \
confederation=34695 confederation-peers=65001,65048,65000 \
client-to-client-reflection=no comment="" disabled=no
/ routing bgp peer
add name="peer1" instance=default remote-address=80.79.50.121 remote-as=65001 \
tcp-md5-key="" multihop=no route-reflect=no hold-time=3m ttl=1 \
in-filter="" out-filter="" comment="" disabled=no
add name="peer2" instance=default remote-address=80.79.50.205 remote-as=65048 \
tcp-md5-key="" multihop=no route-reflect=no hold-time=3m ttl=1 \
in-filter="" out-filter=next comment="" disabled=no
I usually set it on the incoming filter only.
Is "set-nexthop=80.79.50.206" supposed to be in the filters on both routers, or was that a copy/paste issue?
You want to set-nexthop as the routes come in, not as they go out. Maybe this is why they are getting the following:
packet nexthop=<missing> weight=0 address=80.79.50.121
This behavior is normal, it is not a bug. AFAIK, zebra and cisco work the same way.
Multiple peer entries are okay, but you cannot use more than 1 instance. If you use more than 1 instance you end up with multiple views of the routing table that do not 'see' other instances ... it's a problem and should not be that way.
...
Sam
:for x from 0 to 50 do={
/routing filter add chain=length-filter bgp-as-path-length=$x \
set-distance=($x + 200) action=passthrough
}
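For readers who do not use the RouterOS scripting language, here is a Python sketch of what the loop above appears to generate: one passthrough rule per AS-path length, each setting the distance to the length plus 200. It assumes the loop bounds 0 to 50 are inclusive; the function names and the dictionary layout are made up for illustration.

```python
def length_filter_rules(first=0, last=50, base_distance=200):
    """Build one rule per AS-path length, mirroring the :for loop above."""
    return [
        {
            "chain": "length-filter",
            "bgp-as-path-length": x,
            "set-distance": x + base_distance,
            "action": "passthrough",
        }
        for x in range(first, last + 1)  # assumes 'from 0 to 50' is inclusive
    ]


def distance_for(as_path, rules):
    """Return the distance the rules would assign to a comma-separated AS path."""
    length = len([asn for asn in as_path.split(",") if asn])
    for rule in rules:
        if rule["bgp-as-path-length"] == length:
            return rule["set-distance"]
    return None  # no rule matches this path length
```

With these rules a path such as 65507,65505 (length 2) gets distance 202 and a path of 65505 alone gets 201, so the shorter AS path ends up with the lower distance.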
Personally I do not mind whether it's a bug or not; I'd just like to have an example of a working iBGP configuration with more than three peers, using confederation instead of a full mesh (I cannot scale on that, and of course if there's a full mesh all routes propagate).
Routes from multiple BGP processes are compared by kernel code.
Eugene
How is v2.9.28 working for BGP & OSPF? Any surprises? Are routes being added and removed properly via OSPF now?
Seems 2.9.28 has the answer, with the force-nexthop option...
Bye,
Ricky
Can't tell yet... I'm still testing 2.9.28 before putting it in production ..
Bye,
Ricky
Is the problem where a AS route is clearly displayed in the route table, but not working until disabled/enabled fixed?
Best,
Brad
After some testing I have upgraded some routers to 2.9.28.
So far, on three devices, no problems (I can't tell you anything about OSPF though).
Bye,
Ricky
Maybe this happens because you are receiving a route that conflicts with a route to the other peer? If you learn a new route to your peer and accept it, possibly the session can't stay connected (asymmetric routing?) or something. I am thinking of creating a routing filter chain that includes all routes you do not want (your own) and then filtering them on the incoming side, like default routes... To test this, just set up a filter that marks all incoming routes with a routing-mark (or rejects the route?), then look in that table to see what it received and whether there are any routes that might cause problems.
So far no problems; I'm finally able to have multiple paths to the two sides of my network.
The only problem I had was on one of the routers, in this layout:
A --------B------- C
| |
----------D--------
When I enabled the A and C peerings to D, the peering sessions started to disappear, i.e. /routing bgp print started to show nothing, and things started to mess up ..
If I removed either the A or the C peering, all was fine .. I re-enabled both peerings, but filtered the prefixes (now I get around 30 prefixes from each side, instead of around 800).
It seems to be stable... I use a 32 MB RB532 ..
If it keeps running stable I'll try to increase the number of received prefixes...
Bye,
Ricky
I don't really get what you mean by "conflicting routes".
Well, actually I DO want my own routes .. the only point of the whole thing is to have multiple paths within my network to avoid failures if a point of the network goes down ..
I hardly think it's a memory matter .. really, it hangs this way even with just 70 or so prefixes (which is MUCH less than I usually handled).
What does the memory situation on that 532 look like? Maybe it's running low on memory and losing its brain? I've seen that happen even with 5-10 MB free ...
Sam
It seems not =( Of the two boxes that had the problem, one no longer has it, the second one still does ..
I'm doing some tests and it MAY be related to a couple of things:
The problem seems NOT to appear if the peers use an instance that is NOT the default one
The problem seems to appear if the router ID is set and is an IP assigned to a point-to-point wireless link
Bye,
Ricky
So, has anyone upgraded to v2.9.29 yet? Who is willing to test the waters first? <grin>
We upgraded our networks to 2.9.28 (about 110 RouterBoards) without problems. But now we see that the ospf-out chain is not working. All routers have filters for ospf-in and ospf-out. I installed a new one today and forgot to set the filter. I was shocked when I saw the routing table; the ospf-out chain is not working at all. It worked well for us before the upgrade... Can anyone help please???
After some tests, 2.9.29 still gave me a bad week.
Done... I have had it on a couple of MTs since yesterday .. no problems so far, it seems; I can tell you for sure the memory leak is not there anymore.
Bye,
Ricky
root@ns:~# tracepath 10.87.183.129
1: ns.spirosco.awmn (10.17.119.130) 0.400ms pmtu 1500
1: ns2.spirosco.awmn (10.17.119.129) 0.612ms
2: gw-spirosco.sw1hfq.awmn (10.17.119.198) 1.182ms
3: gw-sw1hfq.viper7gr.awmn (10.17.127.98) 4.142ms
4: gw-tenorism.vlsi.awmn (10.17.122.173) asymm 3 5.100ms
5: ns2.tenorism.awmn (10.87.183.129) asymm 2 4.621ms reached
Resume: pmtu 1500 hops 5 back 2
root@ns:~# tracepath 10.87.183.129
1: ns.spirosco.awmn (10.17.119.130) 0.276ms pmtu 1500
1: ns2.spirosco.awmn (10.17.119.129) 0.657ms
2: ns2.tenorism.awmn (10.87.183.129) 1.094ms reached
Resume: pmtu 1500 hops 2 back 2
Please send the support output file from the router to support@mikrotik.com
OSPF in routing-test in .28 and .29 is very broken.
Also in the, let's say, stable .27, OSPF with the test package uses much more CPU than the regular routing package, so much more that an RB112 can't handle OSPF (30-40 routers, 1000 /32 routes).
Hi, the next time you have the crash (I hope not =), can you please check whether the problem is similar to mine?
Well, I caught a BGP crash on my router just a few moments ago.
After the BGP crash, guess what... my router stopped sending my prefix to almost all of the BGP neighbors.
Only by disabling/enabling each of my BGP peer sessions was the problem solved, temporarily I think.
All other BGP prefixes were exchanged normally.
I have already sent the precious supout. Please fix it, guys.
I see .. my network is around 750/800 prefixes on an RB532.
karyal, no, visually all things look fine, BGP states are established, and routes are exchanged normally (as far as I can tell).
There's no visual warning of what happened, except if you look in Files for an autosupout.rif.
That's my case, where all the routers are classic PCs with a minimum of 256 MB RAM and about 400 BGP prefixes.
Thanks .. I would like to ask you a couple of things; there are some situations in your network that I'm unable to get working as they should on mine, and I would like to understand where I'm wrong (or whether I can solve it with a slight change to my setup, like adding a quagga box on the internal network too)...
You can see our network physical topology here: http://nagios.awmn.net/cgi-bin/statusma ... factor=1.0
Username/Password: awmn/awmn
We are using mainly PCs. Every network node has its own AS and they speak only eBGP to each other.
There are some nodes with 2 or 3 routers that run iBGP with OSPF, or they do all the routing in a Linux/quagga PC
and use MikroTik only for the wireless part (bridge).