Completely agree. We want to deploy our first 100% Mikrotik L3VPN network using the current CCR as well as the new -2S+ model, but until these issues are resolved we cannot.
This appears to have been going on for some time. What kind of timeline does Mikrotik have for a working MPLS VPN, given they are pushing "Cloud Core Routers"? I'm used to Cisco, who would consider this a serious issue and would not push out a new release (citing a bunch of fixes) whilst such a fundamental part of the code remains accessible to the user yet non-functional due to immaturity.
Well, it may or may not be the same issue, but bearmeister's description of his problem fits exactly with what I have discovered. A 'poke' in the routing table makes things work again, but only for a short while.
Hi All,
When I apply the config to a new router by straight copy and paste, everything comes up as expected. OSPF exchanges loopbacks, BGP comes up, label distribution is good, and the MPLS forwarding table is good. The BGP prefix-to-label association is correct and the paths are working. The device on LAN1 (IP 192.168.26.150) can ping the device on LAN2 (IP 192.168.127.150) and the world is a happy place. A no-brainer!
However, after an arbitrary period of time the pings between the two computers stop. The routes and labels in the tables on the Mikrotik are unchanged. The debug log shows nothing untoward at the moment the ping stops dead in the water. The longest run I have had is 965 pings, but it almost always dies after 5-20 successful pings.
Like you, I find that changing anything to do with the route table causes the MPLS VPN to start working again, even though the change has no relationship to the traffic flowing through the router. For example, I add a Loopback2256 into VRF2256 on a PE. Adding a loopback in the VRF on the PE has no bearing on the existing labels and prefixes for the LAN-to-LAN PCs pinging each other, and there is no visible change to any tables, yet MPLS starts working again, briefly. When it stops, you can do a similar thing: anything that pokes the routing table appears to make it work again momentarily. It's as if the MPLS table says it is doing what it should, but it isn't.
# R1 (PE, lsr-id 1.1.1.1)
/interface bridge
add name=lo0
/ip address
add address=1.1.1.1/32 interface=lo0 network=1.1.1.1
add address=10.1.1.1/24 interface=ether1 network=10.1.1.0
add address=192.168.1.1/24 interface=ether2 network=192.168.1.0
/ip route vrf
add export-route-targets=1:1 import-route-targets=1:1 interfaces=ether2 \
route-distinguisher=1:1 routing-mark=vrf1_1
/mpls interface
set [ find default=yes ] mpls-mtu=1500
/mpls ldp
set enabled=yes lsr-id=1.1.1.1 transport-address=1.1.1.1
/mpls ldp interface
add interface=ether1
/routing bgp instance vrf
add redistribute-connected=yes routing-mark=vrf1_1
/routing bgp peer
add address-families=vpnv4 name=R2 remote-address=2.2.2.2 remote-as=65530 \
update-source=1.1.1.1
/routing ospf interface
add interface=lo0 passive=yes
/routing ospf network
add area=backbone network=10.1.1.0/24
add area=backbone network=1.1.1.1/32
/system identity
set name=R1
# R2 (PE, lsr-id 2.2.2.2)
/interface bridge
add name=lo0
/ip address
add address=2.2.2.2/32 interface=lo0 network=2.2.2.2
add address=10.1.1.2/24 interface=ether1 network=10.1.1.0
add address=192.168.2.1/24 interface=ether2 network=192.168.2.0
/ip route vrf
add export-route-targets=1:1 import-route-targets=1:1 interfaces=ether2 \
route-distinguisher=1:1 routing-mark=vrf1_1
/mpls interface
set [ find default=yes ] mpls-mtu=1500
/mpls ldp
set enabled=yes lsr-id=2.2.2.2 transport-address=2.2.2.2
/mpls ldp interface
add interface=ether1
/routing bgp instance vrf
add redistribute-connected=yes routing-mark=vrf1_1
/routing bgp peer
add address-families=vpnv4 name=R1 remote-address=1.1.1.1 remote-as=65530 \
update-source=2.2.2.2
/routing ospf interface
add interface=lo0 passive=yes
/routing ospf network
add area=backbone network=10.1.1.0/24
add area=backbone network=2.2.2.2/32
/system identity
set name=R2
I think it depends on:
Is it normal that MT does not reply to support mails? It was my first request, just wondering.
I agree that we are talking about two different issues. I have discovered those stale routes too, and I have also opened a case about that. I get them when I have two routers actively redistributing the same prefix.
Norpan, I think your issue is different from my primary issue. The issue I have is stale routes within a VRF, e.g. a withdraw is received by the PE router but the route is never actually withdrawn from the FIB. This bug has been in RouterOS since at least the 5.0 RCs.
In my lab I have 6.1 as a route originator, and 5.12 as well as 6.0 and 6.1 as PE devices and the issue occurs on all 3 versions.
My last post was regarding Crami's issues, which could be related to what I have been troubleshooting the last couple of days.
Well, I still don't think that is the case. If you look at this, both outputs are from the same router (2.2.2.2):
## BGP ###
VPNv4 ROUTES
Flags: L - label-present
0 L route-distinguisher=1:1 dst-address=172.16.1.0/24 interface=ether2
in-label=18 bgp-ext-communities="RT:1:1"
Here you only have the local route in BGP table, ok?
## IP route ###
ROUTE
5
Flags: X - disabled, A - active, D - dynamic,
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme,
B - blackhole, U - unreachable, P - prohibit
0 ADC dst-address=172.16.1.0/24 pref-src=172.16.1.2 gateway=ether2
gateway-status=ether2 reachable distance=0 scope=10 routing-mark=vrf
1 Db dst-address=172.16.1.0/24 gateway=1.1.1.1
gateway-status=1.1.1.1 recursive via 10.1.1.1 ether1 distance=200
scope=40 target-scope=30 routing-mark=vrf bgp-local-pref=100
bgp-origin=incomplete bgp-ext-communities="RT:1:1"
2 ADo dst-address=1.1.1.1/32 gateway=10.1.1.1
gateway-status=10.1.1.1 reachable via ether1 distance=110 scope=20
target-scope=10 ospf-metric=20 ospf-type=intra-area
3 ADC dst-address=2.2.2.2/32 pref-src=2.2.2.2 gateway=lo0
gateway-status=lo0 reachable distance=0 scope=10
4 ADC dst-address=10.1.1.0/24 pref-src=10.1.1.2 gateway=ether1
gateway-status=ether1 reachable distance=0 scope=10
But here you have a BGP route with 1.1.1.1 as gateway, and 1.1.1.1 has stopped redistributing any routes.
It's not active, but it is still left over in the IP route table.
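For anyone wanting to check their own routers for the same symptom, the comparison above can be sketched in Python. This is only a rough sketch under assumptions: the helper names `parse_print_detail` and `stale_bgp_routes` are made up for illustration, the input is text pasted from the two print outputs, and it assumes a remotely learned VPNv4 entry would carry a `gateway=` field matching the gateway of the BGP route in the IP route table. A BGP route with no matching VPNv4 entry is reported as a stale candidate.

```python
import re

# First line of an entry: index, optional flag letters, then key=value pairs.
HEAD = re.compile(r"^(\d+)\s+(?:([A-Za-z]+)\s+)?(\S+=.*)$")
# key=value pairs; a value may contain spaces and runs until the next key=.
FIELD = re.compile(r"([\w-]+)=(.*?)(?=\s+[\w-]+=|$)")


def parse_print_detail(text):
    """Parse pasted 'print detail'-style output into a list of dicts."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        head = HEAD.match(line)
        if head:
            entries.append({"flags": head.group(2) or ""})
            line = head.group(3)
        if not entries:
            continue  # header lines before the first entry
        for key, value in FIELD.findall(line):
            entries[-1][key] = value.strip('"')
    return entries


def stale_bgp_routes(ip_routes, vpnv4_routes):
    """Return BGP routes (flag 'b') with no matching VPNv4 entry.

    Assumption: a match means same dst-address and same gateway.
    """
    known = {(r.get("dst-address"), r.get("gateway")) for r in vpnv4_routes}
    return [
        r for r in ip_routes
        if "b" in r["flags"]
        and (r.get("dst-address"), r.get("gateway")) not in known
    ]


VPNV4_TEXT = """
Flags: L - label-present
0 L route-distinguisher=1:1 dst-address=172.16.1.0/24 interface=ether2
    in-label=18 bgp-ext-communities="RT:1:1"
"""

ROUTE_TEXT = """
0 ADC dst-address=172.16.1.0/24 pref-src=172.16.1.2 gateway=ether2
    gateway-status=ether2 reachable distance=0 scope=10 routing-mark=vrf
1 Db dst-address=172.16.1.0/24 gateway=1.1.1.1
    gateway-status=1.1.1.1 recursive via 10.1.1.1 ether1 distance=200
    scope=40 target-scope=30 routing-mark=vrf bgp-local-pref=100
2 ADo dst-address=1.1.1.1/32 gateway=10.1.1.1
    gateway-status=10.1.1.1 reachable via ether1 distance=110 scope=20
"""

stale = stale_bgp_routes(parse_print_detail(ROUTE_TEXT),
                         parse_print_detail(VPNV4_TEXT))
for route in stale:
    print("stale candidate:", route["dst-address"], "via", route["gateway"])
```

With the tables pasted above, only the leftover `Db` route via 1.1.1.1 is flagged; the connected and OSPF routes are ignored because they do not carry the `b` flag.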
My experience is that L3VPN on RouterOS has been unusable for much longer. I tried it in 5.0rc, 5.12 and 5.16, and then gave up since Mikrotik said it would be fixed in the "new routing". I recently needed L3VPN again and started testing on 6.0, then 6.1 and now 6.2, and have encountered the same issue with stale routes on all of those releases. Maybe that is different from the problem you have, as I cannot see whether you are just using VRF or using L3VPN as well.
This has been known for a while. I first posted about this in 2012:
VRF has been unstable in all 6.x releases.
The situation is the same for us. Unfortunately L3VPN does not work on v5 either, or we could run it on x86.
Really bad news, since we cannot start the rollout of CCRs before the VRF issue is solved in 6.x (it works like a charm in all 5.x).
For me the last working version is rc13. With everything after that the VRF stops within minutes at most. I tried the latest build of v6.2 yesterday without success.
I have not seen the problems with route leaking (I don't leak routes), but the biggest issue is that once I put the VRF config into /ip route, the CPU goes to 100%. Then it is not even possible to make a supout. If you see my old posts, the routers go crazy.
It's really a pain in the ass, since it's impossible to use CCRs in places where I want a VRF (typically at the CPE). I can use them in the core network, since MPLS/VPLS, BGP and OSPF work, as long as I don't want to assign an IP from a VRF to an interface. It still forwards the routing table via iBGP.
It seems like rc6.12 is working. (but then with its other problem)
I agree 100% on all points raised.
It's actually quite disappointing that this "feature" is available, but is buggy and has been for a long time, and is not even considered important enough to address. It will be fixed in the "new routing"? I'm not holding my breath. It hasn't been fixed for how many years now?
On the positive side, mpls/bgp/l2vpn works just fine. Doesn't fix this issue though.
At least it's not at 100% CPU, and the routers now have about 2 days of uptime. Before 6.5 (except 6.rc13) there were about 2-3 minutes of uptime before the CPU went to 100%.
I also have good results so far with 6.5rc1, but I'll give it some more time before I go and say that it's fixed.
It is stable on our network. Thanks Mikrotik, this should tide us through until v7.
Is it long enough to be considered 'more time' now? How's it running?
Hi crtee,
Hi there,
Sorry to bring this up again, I haven't been in touch with this recently, but are L3VPNs really safe so far? Are there still any known stability problems, or did they finally manage to fix it?
Thanks for your reply. I'm already at 6.9, however I'll give it a try and post my results here.
Hi crtee,
I can say they are stable on 6.5 as that is what we have running in production. Any newer version YMMV.
[...]
Given the recent dramas with RouterOS stability I am scared to even attempt an upgrade past 6.5 for fear that it will break something.
Okay, my conclusion so far: don't try to run L3VPN and a full IPv4 BGP table on the same box. The routing process is unable to keep up with everything, unless maybe you take some top-end Xeon box, put it in liquid nitrogen and crank the clock beyond 5 GHz.
Mikrotik have not confirmed when "new routing" will arrive. I have been hearing about it from them for about 3 years now, but still have not seen it.
Waiting for 7.0 and the "new routing engine"
Hopefully it does come in v7 and we see it soon!
There are a number of issues on v4/5/6 that are affecting us:
- No RIPv2 from VRF's so cannot use it for PE-CE
- Cannot view BGP advertisements sent to/received from a peer when running PE-CE from a VRF
- Cannot view L2VPN information sent/received from a BGP peer
- Cannot view L3VPN information sent/received from a BGP peer
- BGP is not multi-core optimized making complex filters, full tables and large updates very very slow
- A lot of configuration is still CLI only, e.g. BGP VRF out-filters
- Config database sync issues, often the running config is different from what is shown in Winbox/CLI.
- VPLS tunnel state changes are not logged
- Can not specify which VRF router management services are available from, e.g. WinBox, SSH, Webfig
- Cannot specify which VRF PPP connections terminate to
The issue we have is that there is no working equivalent of the IOS/JunOS commands:
I still cannot see exactly why CE devices would want to know about all the VRFs and routing in the MPLS cloud. A CE should know about its own routes and gateways. If you run BGP outside the MPLS but inside the VRF, the CE should still have the full routing table that is useful to the customer, and would update the VRF with its internal "own" subnets/routes. I just let the CE know about the default routing table via OSPF (if the CE should have L2VPN) and run plain BGP, without internal confederation and L3VPN, at the last point (PE-CE), without the default routing table. This also makes the problem of which VRF to run services in disappear. It's a bit annoying not to have management access to devices, but it's possible to have a link net just for management, rather than running full OSPF/BGP/MPLS just to do management. Usually my VRFs are public IPs from different ISPs, and my default via OSPF is RFC 1918 for OSPF/MPLS/loopback. BGP runs via vpnv4, l2vpn-cisco and confederation inside different AS sets. All MPLS routers have their own internal AS in the confederation and produce the non-aggregated table for the AS set that peers with global transits.
Yes, this problem is extremely annoying, and we too are still experiencing it. Connect a CCR to a Cisco device and set the port to 100 Mbit FDX; after a random period of time (anywhere from hours to 6 weeks) connectivity will drop completely. You check the port config on the CCR and notice it somehow now lists a speed of 1 Gbps. The only fix at the moment is to reboot the CCR. We had to dig a bunch of HP1800s out of retirement to sit between the CCRs and Cisco devices and convert from 1 Gbps/AUTO to 100/FDX just so our client networks stopped dropping out.
And there are other, more annoying problems, like 100 Mb/s to Cisco, which makes the CCRs freeze etc. (We had 1500 CPEs down today because of this problem, after a technician set the wrong GE port on a Cisco to 100 Mb/s.) (No, it's not fixed in 6.10 like it was supposed to be.)
Hi Rich,
Hey guys.
It's been a while since this was last discussed. What's the current feel on the stability of Layer 3 VPNs, particularly with BGP as the PE-CE protocol? Good for production?
Rich
Also need LSP ping and TE auto-tunnel!
Now we just need fast reroute
Hi,
I have dealt with two people at Mikrotik in regards to MPLS over the past 5 years. In the last 2 years though I have only dealt with one.
Mikrotik support seem to allocate certain engineers to the more specialist areas e.g. wireless and MPLS.
For other areas I have had responses from many different people.
I am all too familiar with this issue. It is a VPNv4 NLRI update problem.
I have had the same experience as you, up to today on 6.44, regarding the BGP withdraw issue on MPLS.
They once promised it would be fixed in v7, but v7 never showed up.
We run Mikrotik MPLS in production, but to solve this issue I have had to use some Cisco PEs instead.
The problem happens if you use multihomed BGP between a CE and 2 CE.
Thanks
Yes, with a multi-homed CE, BGP attributes are ignored when the best path is withdrawn.
Mikrotik support confirmed it back in 2014 but there is still no fix.
Yes we are aware of route selection problems in VRFs, unfortunately you will have to wait for ROS v7 updates.
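To make the multi-homing complaint concrete, here is a toy Python sketch of what one would normally expect a PE to do when the best path for a VRF prefix is withdrawn: drop that path, then re-run selection over the remaining paths using their attributes. It is an illustration under assumptions, not RouterOS internals. The `Path` and `VrfRib` names are hypothetical, and selection is reduced to highest local-pref, then shortest AS path, then lowest next hop as an arbitrary tie-break, which is only a small subset of the real BGP decision process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int = 100
    as_path_len: int = 0


class VrfRib:
    """Toy per-VRF table: all received paths plus the selected best path."""

    def __init__(self):
        self.paths = {}  # prefix -> {next_hop: Path}
        self.fib = {}    # prefix -> best Path

    def _reselect(self, prefix):
        candidates = self.paths.get(prefix, {})
        if not candidates:
            # No path left: the prefix must leave the FIB as well.
            self.fib.pop(prefix, None)
            return
        # Simplified decision: local-pref (higher wins), AS path length
        # (shorter wins), next hop string as an arbitrary tie-break.
        self.fib[prefix] = min(
            candidates.values(),
            key=lambda p: (-p.local_pref, p.as_path_len, p.next_hop),
        )

    def announce(self, prefix, path):
        self.paths.setdefault(prefix, {})[path.next_hop] = path
        self._reselect(prefix)

    def withdraw(self, prefix, next_hop):
        self.paths.get(prefix, {}).pop(next_hop, None)
        self._reselect(prefix)


rib = VrfRib()
rib.announce("172.16.1.0/24", Path("1.1.1.1", local_pref=200))
rib.announce("172.16.1.0/24", Path("3.3.3.3", local_pref=100))
best_before = rib.fib["172.16.1.0/24"].next_hop

rib.withdraw("172.16.1.0/24", "1.1.1.1")
best_after = rib.fib["172.16.1.0/24"].next_hop

rib.withdraw("172.16.1.0/24", "3.3.3.3")
still_installed = "172.16.1.0/24" in rib.fib
print(best_before, best_after, still_installed)
```

The behaviour described in this thread corresponds to the withdraw step not triggering a proper reselection, so the old entry either stays in the table or the remaining path's attributes are not honoured.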
Well, we got this exact answer: it will not be fixed in v6, and we will have to hope for ROS v7 to get MPLS and BGP vpnv4 working in a multipath setup.
We asked MT support for help, maybe we can get help. If it says ROS7, I guess we will have to install Cisco boxes again...