Advanced Routing Failover without Scripting

Chupaka · Tue Feb 04, 2020 12:34 pm

A comment about PPP uplinks (like PPPoE): viewtopic.php?p=814682#p814682 - important in RouterOS before v7

Introduction

Suppose we have several WAN links and want to monitor whether the Internet is accessible through each of them. The problem can be anywhere.
If your VPN cannot connect, there's no problem: your default route with gateway=that-vpn-connection will be inactive.
If your ADSL modem is down, check-gateway=ping takes the stage, and again there's no problem.
But what if your modem is up and the telephone line is down? Or one of your ISPs has a problem inside its network, so traceroute shows only a few hops and then stops...
Some people use the Netwatch tool to monitor remote locations. Others use scripts to periodically ping remote hosts, and then disable routes or change the routing behaviour in some other way.
But RouterOS facilities allow us to do such checking with /ip route alone: no scripting and no Netwatch at all!

Implementation

Basic Setup

Let's suppose that we have two uplinks: GW1 and GW2. They can be addresses of ADSL modems (like 192.168.1.1 and 192.168.2.1), or addresses of PPP interfaces (like pppoe-out1 and pptp-out1). Then, we have some policy routing rules, so all outgoing traffic is marked with either the ISP1 mark (which goes to GW1) or the ISP2 mark (which goes to GW2). And we want to monitor Host1 via GW1, and Host2 via GW2 - those may be some popular Internet websites, like Google, Yahoo, etc.
First, create routes to those hosts via corresponding gateways:

/ip route
add dst-address=Host1 gateway=GW1 scope=11
add dst-address=Host2 gateway=GW2 scope=11

Now we create routes for the ISP1 routing mark (one for the main gateway, and another one for failover):

/ip route
add distance=1 gateway=Host1 target-scope=11 routing-mark=ISP1 check-gateway=ping
add distance=2 gateway=Host2 target-scope=11 routing-mark=ISP1 check-gateway=ping

Those routes will be resolved recursively (see Manual:IP/Route#Nexthop_lookup), and will be active only if HostN is pingable.
Then the same rules for ISP2 mark:

/ip route
add distance=1 gateway=Host2 target-scope=11 routing-mark=ISP2 check-gateway=ping
add distance=2 gateway=Host1 target-scope=11 routing-mark=ISP2 check-gateway=ping
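
To make the placeholders concrete, here is a minimal sketch of the same basic setup with sample values filled in. The gateways 192.168.1.1 and 192.168.2.1 are the example modem addresses mentioned above; using 8.8.8.8 and 8.8.4.4 as Host1 and Host2 is only an assumption for illustration, so pick any hosts you trust to answer pings:

```
/ip route
# host routes: each check host is reachable only via "its" uplink
add dst-address=8.8.8.8/32 gateway=192.168.1.1 scope=11
add dst-address=8.8.4.4/32 gateway=192.168.2.1 scope=11
# ISP1 table: main route via Host1, failover via Host2
add dst-address=0.0.0.0/0 distance=1 gateway=8.8.8.8 target-scope=11 routing-mark=ISP1 check-gateway=ping
add dst-address=0.0.0.0/0 distance=2 gateway=8.8.4.4 target-scope=11 routing-mark=ISP1 check-gateway=ping
# ISP2 table: main route via Host2, failover via Host1
add dst-address=0.0.0.0/0 distance=1 gateway=8.8.4.4 target-scope=11 routing-mark=ISP2 check-gateway=ping
add dst-address=0.0.0.0/0 distance=2 gateway=8.8.8.8 target-scope=11 routing-mark=ISP2 check-gateway=ping
```

With both uplinks healthy, "/ip route print detail" should report the default routes with a gateway-status along the lines of "recursive via 192.168.1.1", which is a quick way to confirm that the recursive lookup resolved.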

Multiple host checking per Uplink

If Host1 or Host2 in Basic Setup fails, the corresponding link is considered failed too. For redundancy, we may use several hosts per uplink: let's monitor Host1A and Host1B via GW1, and Host2A and Host2B via GW2. Also, we'll use double recursive lookup, so that HostN is mentioned in fewer places.
As earlier, first we need routes to our checking hosts:

/ip route
add dst-address=Host1A gateway=GW1 scope=11
add dst-address=Host1B gateway=GW1 scope=11
add dst-address=Host2A gateway=GW2 scope=11
add dst-address=Host2B gateway=GW2 scope=11

Then, let's create destinations to "virtual" hops to use in further routes. I'm using 10.1.1.1 and 10.2.2.2 as an example:

/ip route
add dst-address=10.1.1.1 gateway=Host1A scope=12 target-scope=11 check-gateway=ping
add dst-address=10.1.1.1 gateway=Host1B scope=12 target-scope=11 check-gateway=ping
add dst-address=10.2.2.2 gateway=Host2A scope=12 target-scope=11 check-gateway=ping
add dst-address=10.2.2.2 gateway=Host2B scope=12 target-scope=11 check-gateway=ping

And now we may add default routes for clients:

/ip route
add distance=1 gateway=10.1.1.1 target-scope=12 routing-mark=ISP1
add distance=2 gateway=10.2.2.2 target-scope=12 routing-mark=ISP1
add distance=1 gateway=10.2.2.2 target-scope=12 routing-mark=ISP2
add distance=2 gateway=10.1.1.1 target-scope=12 routing-mark=ISP2

Workaround 1

In ROS versions at least up to 4.10 there's a bug: if your ethernet interface goes down (for example, your directly connected ADSL modem is powered off) and then comes back up, recursive routes are not recalculated (or something) and all traffic still goes via the other uplink. As a workaround, additional routes for each HostN may be used. Once they are added, everything is recalculated correctly:

/ip route
add dst-address=Host1 type=blackhole distance=20
add dst-address=Host2 type=blackhole distance=20

Thanks to

Valens Riyadi - at the Poland MUM 2010 he casually mentioned that the 'scope' attribute can be used to check remote hosts in a failover implementation
Martín (Ibersystems) - he asked for a solution, and I invented what you see above =)
Robert Urban (treborr) - he faced the problem mentioned in Workaround 1, and we solved it together =)

Chupaka · Tue Feb 04, 2020 12:37 pm

As MikroTik decided to completely kill the user-contributed Wiki and deleted all non-MikroTik staff accounts, I'm moving the article here while I think about the best place for it, and I'll edit it some time later to add info about PPP connections (as recursive routing lookup doesn't work with interface routes in RouterOS).

SiB · Tue Feb 04, 2020 3:56 pm

Good idea to have a place to speak freely about the wiki.
MikroTik is now building a new wiki/KB at: https://help.mikrotik.com/docs/

My problems with this solution are:

This does not work with an interface like lte1; we must use an IP address as HostX. This is a reason to stop using LTE Passthrough mode, because double NAT gives the main router the possibility to use recursive routing.
When users run a speed test, next-hop detection via ICMP can (and I am sure it does) hit a timeout. This means speed tests on the LAN break the whole recursive routing path for that ISP.

Chupaka · Tue Feb 04, 2020 4:07 pm

This does not work with an interface like lte1; we must use an IP address as HostX. This is a reason to stop using LTE Passthrough mode, because double NAT gives the main router the possibility to use recursive routing.

Can't you just use gateway IP? I'm not familiar with LTE interfaces...

SiB · Tue Feb 04, 2020 4:44 pm

Can't you just use gateway IP? I'm not familiar with LTE interfaces...

[marcin.przysowa@SXTR_LTE6] > ip address print detail where interface=lte1
Flags: X - disabled, I - invalid, D - dynamic 
 0 D address=37.109.59.226/32 network=37.109.59.226 interface=lte1 actual-interface=lte1

Nope, I checked all combinations. Route filters do not have any additional action to help with this. The gateway on LTE is only an interface.
MikroTik support should add to the (currently not working) wiki page https://wiki.mikrotik.com/wiki/Advanced ... _Scripting that recursive routing works only via an IP address.

Chupaka · Tue Feb 04, 2020 6:37 pm

Just wondering whether a crutch like "add any fake address to that interface with network=100.69.69.69/32 and then use that 100.69.69.69 as gateway IP" can work for LTE...

SiB · Wed Feb 05, 2020 11:56 am

Just wondering whether a crutch like "add any fake address to that interface with network=100.69.69.69/32 and then use that 100.69.69.69 as gateway IP" can work for LTE...

That was the first thing I tried when I bought my own SXTR.

Earlier I tried to help with this case: "Load balancing with internal LTE modem (recursive resolution not working)", but this is a limitation inside ROS.
All static approaches were checked; none of them work. When we use a dynamic address/route we can modify the dynamic route via Routing > Filters, but recursive routing still does not work on it.

theboleslaw · Fri Feb 21, 2020 10:33 am

In ROS versions at least up to 4.10 there's a bug: if your ethernet interface goes down (for example, your directly connected ADSL modem is powered off) and then comes back up, recursive routes are not recalculated (or something) and all traffic still goes via the other uplink. As a workaround, additional routes for each HostN may be used. Once they are added, everything is recalculated correctly:

/ip route
add dst-address=Host1 type=blackhole distance=20
add dst-address=Host2 type=blackhole distance=20

Thanks for this. I also noticed this bug when configuring my policy-based routing: when the 2nd WAN went down, it would not reconnect, so I added the blackhole routes to counter it.

anav · Wed Mar 11, 2020 6:07 pm

Q1. What is the plan when the dst-address does NOT equal a static fixed WANIP, but instead is a dynamic WANIP?

SiB · Wed Mar 11, 2020 6:40 pm

anav

Q1. What is the plan when the dst-address does NOT equal a static fixed WANIP, but instead is a dynamic WANIP?

You can use the dhcp-client parameters; among them is a script that can do some additional work.
The key work can be done by a routing filter, which can change the parameters of a dynamic route, like /routing filter ... set-scope= .
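
As a hedged sketch of the dhcp-client script idea: the lease script below rewrites the gateway of the Host1 check route whenever a lease is bound. The interface name ether1 and the route comment "check-host1" are hypothetical and only serve the example (the check route would have to be created with that comment first); $bound and $"gateway-address" are the variables RouterOS passes to the lease script.

```
/ip dhcp-client
# on every new or changed lease, point the check route at the current DHCP gateway
set [find interface=ether1] script=":if (\$bound=1) do={ /ip route set [find comment=\"check-host1\"] gateway=\$\"gateway-address\" }"
```

This does bring a small script back into the picture, but it only runs when the lease changes; the failover itself is still handled by the recursive routes.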

Chupaka · Wed Mar 11, 2020 9:39 pm

Here you can find an example script (sorry for Russian, please use Google Translate, but generally there's only a single variable in the script):
https://forum.mikrotik.by/viewtopic.php?t=323

clueluzz · Fri Mar 13, 2020 2:24 am

@Chupaka, this is great. Would it be correct to assume that if I were to have 3 recursive failovers, it would look like this based on your code above:

/ip route
add dst-address=Host1 gateway=GW1 scope=10
add dst-address=Host2 gateway=GW2 scope=10
add dst-address=Host3 gateway=GW3 scope=10

/ip route
add distance=1 gateway=Host1 routing-mark=ISP1 check-gateway=ping
add distance=2 gateway=Host2 routing-mark=ISP1 check-gateway=ping
add distance=3 gateway=Host3 routing-mark=ISP1 check-gateway=ping

/ip route
add distance=1 gateway=Host2 routing-mark=ISP2 check-gateway=ping
add distance=2 gateway=Host3 routing-mark=ISP2 check-gateway=ping
add distance=3 gateway=Host1 routing-mark=ISP2 check-gateway=ping

/ip route
add distance=1 gateway=Host3 routing-mark=ISP3 check-gateway=ping
add distance=2 gateway=Host2 routing-mark=ISP3 check-gateway=ping
add distance=3 gateway=Host1 routing-mark=ISP3 check-gateway=ping

/ip route
add dst-address=Host1 type=blackhole distance=20
add dst-address=Host2 type=blackhole distance=20
add dst-address=Host3 type=blackhole distance=20

Is that correct? Plus, if I wanted to add load-balancing to this method, would I have to add other routes?

Looking forward to hearing your thoughts

Chupaka · Fri Mar 13, 2020 11:28 am

Correct. Those routes are enough for LB setup, just mark routing on packets accordingly.

clueluzz · Sat Mar 14, 2020 12:21 pm

Thanks @Chupaka

WiruSSS · Wed Mar 18, 2020 10:50 am

Can't you just use gateway IP? I'm not familiar with LTE interfaces...

[marcin.przysowa@SXTR_LTE6] > ip address print detail where interface=lte1
Flags: X - disabled, I - invalid, D - dynamic 
 0 D address=37.109.59.226/32 network=37.109.59.226 interface=lte1 actual-interface=lte1

Nope, I checked all combinations. Route filters do not have any additional action to help with this. The gateway on LTE is only an interface.
MikroTik support should add to the (currently not working) wiki page https://wiki.mikrotik.com/wiki/Advanced ... _Scripting that recursive routing works only via an IP address.

In case of using another Mikrotik as your router (SXTR only as a modem with passthrough) it's working fine. Your WAN on the second Mikrotik should get something like this:

2 D address=xxx.xxx.xxx.xxx/30 network=yyy.yyy.yyy.yyy interface=ether9 actual-interface=ether9

so you should be able to use the network address as a gateway.
It's working fine in my configuration

SiB · Wed Mar 18, 2020 1:38 pm

WiruSSS

In case of using another Mikrotik as your router (SXTR only as a modem with passthrough) it's working fine.

But I am writing about a different config: without passthrough, directly on the RB that has the lte1 interface and gets its IP from the ISP, you cannot use recursive routing on the dynamic interface. That is the case in question.
Your config works, but it is a different network scenario.

WiruSSS · Wed Mar 18, 2020 3:26 pm

Yes yes, i know it. I've just written this as a workaround.

KOK · Fri Apr 24, 2020 9:13 am

Thank you for moving this here.

I have a couple of questions, hope you guys could help me out.

1.- Do I have to routing mark the packets in Firewall Mangle so the...

/ip route
add distance=1 gateway=10.1.1.1 routing-mark=ISP1
add distance=2 gateway=10.2.2.2 routing-mark=ISP1
add distance=1 gateway=10.2.2.2 routing-mark=ISP2
add distance=2 gateway=10.1.1.1 routing-mark=ISP2

... routing-mark=ISP1 and routing-mark=ISP2 works? if so, how would I do that? Or the routing-mark=ISP1 and routing-mark=ISP2 is just for having 2 Routing Tables?

2.- Is this still necessary with newer ROS versions? There is a mention of a bug in version 4.10, but someone said it helped the WAN reconnect, and I assume that was on a recent version.

"add dst-address=Host1 type=blackhole distance=20"

3.- What is "better", this recursive routes failover or a script based one, and why?

Any help would be appreciated,
Thanks in advance!
Regards,
SN

Chupaka · Fri Apr 24, 2020 8:09 pm

1.- Do I have to routing mark the packets in Firewall Mangle so the...

/ip route
add distance=1 gateway=10.1.1.1 routing-mark=ISP1
add distance=2 gateway=10.2.2.2 routing-mark=ISP1
add distance=1 gateway=10.2.2.2 routing-mark=ISP2
add distance=2 gateway=10.1.1.1 routing-mark=ISP2

... routing-mark=ISP1 and routing-mark=ISP2 works? if so, how would I do that? Or the routing-mark=ISP1 and routing-mark=ISP2 is just for having 2 Routing Tables?

Well, if you create routing tables (by setting routing-mark on routes), you need to send traffic to them: either by marking in Firewall Mangle or by IP -> Route -> Rules.
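
As a rough illustration of those two options, assume a hypothetical LAN of 192.168.88.0/24 split in half between the uplinks (the subnets are made up for the example; the marks ISP1 and ISP2 match the routes above). This is RouterOS v6 syntax, like the rest of the thread, and only a sketch. Use one of the two options, not both:

```
# Option A: mark routing in Firewall Mangle
/ip firewall mangle
add chain=prerouting src-address=192.168.88.0/25 action=mark-routing new-routing-mark=ISP1 passthrough=no
add chain=prerouting src-address=192.168.88.128/25 action=mark-routing new-routing-mark=ISP2 passthrough=no

# Option B: select the routing table with IP -> Route -> Rules
/ip route rule
add src-address=192.168.88.0/25 action=lookup table=ISP1
add src-address=192.168.88.128/25 action=lookup table=ISP2
```

Mangle is the more flexible choice when the split should depend on connection marks or address lists; route rules are enough for a plain per-source split.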

2.- Is this still necessary with newer ROS versions? There is a mention of a bug in version 4.10, but someone said it helped the WAN reconnect, and I assume that was on a recent version.

"add dst-address=Host1 type=blackhole distance=20"

It won't break anything, and it prevents traffic to Host1 going via another route. I don't remember the details of that bug, but I prefer to keep this rule in place

3.- What is "better", this recursive routes failover or a script based one, and why?

What is better: a car or a bike? I think, it depends on your task

Recursive routes are "automagic", but there are limits (like you cannot use them with interface routes, including "gateway=1.2.3.4%ether1" in case you have the same gateway IP/subnet on two uplinks).

Zacharias · Fri Apr 24, 2020 8:48 pm

recursive routes are not recalculated (or something) and all traffic still goes via another uplink

About 2 months ago I made a lab for recursive routes and failover, and as far as I remember the recursive routes were recalculated... the version was 6.4x.y something...

KOK · Fri Apr 24, 2020 9:48 pm

Thank you for your answers!
I still have some doubts:

Well, if you create routing tables (by setting routing-mark on routes), you need to send traffic to them: either by marking in Firewall Mangle or by IP -> Route -> Rules.

I don't have much experience with marking packets, though I understand the concept a little bit; some time ago I used it for some queues.
Can a packet have 2 routing marks? If not, I don't get it then, what would be the point of marking packets with routing mark? wouldn't that be something like a Load Balance to split the traffic between the two WANs? Or, How should I manage it so all traffic goes to one route or the other when the check gateway fails?

It won't break anything, and it prevents traffic to Host1 going via another route. I don't remember the details of that bug, but I prefer to keep this rule in place

Thanks, I'll include that!

What is better: a car or a bike? I think, it depends on your task. Recursive routes are "automagic", but there are limits (like you cannot use them with interface routes, including "gateway=1.2.3.4%ether1" in case you have the same gateway IP/subnet on two uplinks).

Roger that!! I think "automagic" recursive routes will do the job. I also used them some years ago to fail over between links to a set of servers, not the whole Internet. Back then I didn't use routing marks, though.

Another comment:
I remember having some trouble back then with check-gateway=ping and, if I remember correctly, what I did was set the pref. source field to the specific IP address the check ping should be generated from.
Just out of curiosity, and based on your comment, I don't know if that is an alternative to

"add dst-address=Host1 type=blackhole distance=20"

Thank you again for your help!
SN

Chupaka · Sat Apr 25, 2020 12:03 am

I don't have too much experience with marking Packets, I understand the concept a little bit tho.. some time ago I used it to some queues.
Can a packet have 2 routing marks? If not, I don't get it then, what would be the point of marking packets with routing mark? wouldn't that be something like a Load Balance to split the traffic between the two WANs? Or, How should I manage it so all traffic goes to one route or the other when the check gateway fails?

Again, please define your goal first. If you need failover - then just don't use routing marks. For load balancing, you mark some packets with ISP1 mark and others with ISP2 mark. After that they go to the necessary uplinks.
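
For the failover-only case, a minimal sketch is simply the Basic Setup from the first post with the routing-mark parameter dropped, so both default routes live in the main table (Host1, Host2, GW1 and GW2 are the same placeholders as in the article):

```
/ip route
# check hosts, each pinned to its own uplink
add dst-address=Host1 gateway=GW1 scope=11
add dst-address=Host2 gateway=GW2 scope=11
# main-table default routes: GW1 while Host1 answers pings, otherwise GW2
add distance=1 gateway=Host1 target-scope=11 check-gateway=ping
add distance=2 gateway=Host2 target-scope=11 check-gateway=ping
```

All traffic then follows the distance=1 route while Host1 is pingable and moves to the distance=2 route when it is not, with no mangle rules involved.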

I remember having some trouble back then with check-gateway=ping and, if I remember correctly, what I did was set the pref. source field to the specific IP address the check ping should be generated from.

That can be necessary if your connected route for the gateway has different pref. src for some reason.

dave864 · Tue Jul 14, 2020 9:27 pm

I have tried this method of load balancing with failover.
I am able to load balance successfully: WAN1 without any routing marks, and WAN2 with the routing mark to_WAN2.
Using address lists and Mangle, I now have most traffic on WAN1 but 2 devices on WAN2.

When WAN1 or WAN2 is power cycled, the recursive status changes to the alternative WAN IP address, but the corresponding route does not activate. What I mean is that of the 4 routes, 2 of which are active (the default routes), only those routes change their recursive status. The backup routes NEVER become active.

Also, traffic does not appear to swap WAN.

WAN1 = 192.168.10.1 modem ether7
WAN2 = 192.168.15.1 modem ether6

initially WAN1 is recursive on 192.168.10.1 When power cycled, this then changes to 192.168.15.1 but traffic does not flow.

Any ideas?

SiB · Wed Jul 15, 2020 12:43 am

dave864 :

Any ideas?

Sorry, my crystal ball is broken; this thread exists to archive the documentation page from the wiki.
It works properly in many of my multi-WAN setups. About LB...

I really recommend studying and following this how-to, which shows the best way to use many WANs at once.
After that you can use any method (Netwatch, scripts, PCC, etc.) to steer the outgoing traffic; that part is minor.

Bandwidth-based load-balancing with failover. This presentation also covers Mangle.
This was presented at the MUM (MikroTik User Meeting) in New Orleans, USA.
Tomas Kirnak - YouTube: https://www.youtube.com/watch?v=67Dna_ffCvc&t=1s
http://mum.mikrotik.com/presentations/US12/tomas.pdf

And Recursive Routing is a good way to automate wan detection.

dave864 · Wed Jul 15, 2020 10:22 pm

WAN2 has a connection mark.
WAN1 does not. Do you think that could be the source of the problem?

Chupaka · Wed Jul 15, 2020 10:26 pm

"/ip route print detail" can shed some light on what's happening, after that we can explain if something goes wrong or happens as expected

dave864 · Mon Jul 20, 2020 10:47 pm

I have changed WAN1 to be fully connection-marked, so now both WAN1 and WAN2 devices have connection marks. I obviously have the routing marks set in Mangle too.
Today I had an outage on WAN1. I turned WAN1 off and none of the WAN1 devices switched over. The route did change to the backup. However, a dynamic rule was created. I deleted the dynamic rule and there was still no connectivity on the WAN1 devices. It wasn't vital and the outage lasted only an hour, so I left it.

Flags: X - disabled, A - active, D - dynamic, 
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, 
B - blackhole, U - unreachable, P - prohibit 
 0 X S  ;;; Local LTE
        dst-address=0.0.0.0/0 gateway=192.168.42.129 
        gateway-status=192.168.42.129 inactive check-gateway=ping distance=2 
        scope=30 target-scope=10 routing-mark=to_ISP2 

 1 A S  ;;; DEFAULT route for WAN2 devices to WAN2
        dst-address=0.0.0.0/0 gateway=8.8.4.4 
        gateway-status=8.8.4.4 recursive via 192.168.15.1 ether6 
        check-gateway=ping distance=1 scope=10 target-scope=10 
        routing-mark=to_WAN2 

 2   S  ;;; backup route for WAN2 devices to WAN1
        dst-address=0.0.0.0/0 gateway=8.8.8.8 
        gateway-status=8.8.8.8 recursive via 192.168.10.1 ether7 
        check-gateway=ping distance=2 scope=10 target-scope=10 
        routing-mark=to_WAN2 

 3 X S  ;;; WAN2 Default
        dst-address=0.0.0.0/0 gateway=192.168.15.1 
        gateway-status=192.168.15.1 inactive check-gateway=ping distance=1 
        scope=10 target-scope=30 routing-mark=to_WAN2 

 4 X S  ;;; WAN2 backup
        dst-address=0.0.0.0/0 gateway=192.168.10.1 
        gateway-status=192.168.10.1 inactive check-gateway=ping distance=2 
        scope=10 target-scope=30 routing-mark=to_WAN2 

 5 A S  ;;; DEFAULT route for WAN1 devices to WAN1
        dst-address=0.0.0.0/0 gateway=8.8.8.8 
        gateway-status=8.8.8.8 recursive via 192.168.10.1 ether7 
        check-gateway=ping distance=1 scope=10 target-scope=10 
        routing-mark=to_WAN1 

 6   S  ;;; backup route for WAN1 devices to WAN2
        dst-address=0.0.0.0/0 gateway=8.8.4.4 
        gateway-status=8.8.4.4 recursive via 192.168.15.1 ether6 
        check-gateway=ping distance=2 scope=10 target-scope=10 
        routing-mark=to_WAN1 

 7 ADS  dst-address=0.0.0.0/0 gateway=192.168.15.1 
        gateway-status=192.168.15.1 reachable via  ether6 distance=1 scope=30

dave864 · Mon Jul 20, 2020 10:52 pm

I removed the ADS dynamic entry again. It appears whenever a connection drops.

Now I get this:

 0 X S  ;;; Local LTE
        dst-address=0.0.0.0/0 gateway=192.168.42.129 gateway-status=192.168.42.129 inactive check-gateway=ping distance=2 
        scope=30 target-scope=10 routing-mark=to_ISP2 

 1 A S  ;;; DEFAULT route for WAN2 devices to WAN2
        dst-address=0.0.0.0/0 gateway=8.8.4.4 gateway-status=8.8.4.4 recursive via 192.168.15.1 ether6 check-gateway=ping 
        distance=1 scope=10 target-scope=10 routing-mark=to_WAN2 

 2   S  ;;; backup route for WAN2 devices to WAN1
        dst-address=0.0.0.0/0 gateway=8.8.8.8 gateway-status=8.8.8.8 recursive via 192.168.10.1 ether7 check-gateway=ping 
        distance=2 scope=10 target-scope=10 routing-mark=to_WAN2 

 3 X S  ;;; WAN2 Default
        dst-address=0.0.0.0/0 gateway=192.168.15.1 gateway-status=192.168.15.1 inactive check-gateway=ping distance=1 
        scope=10 target-scope=30 routing-mark=to_WAN2 

 4 X S  ;;; WAN2 backup
        dst-address=0.0.0.0/0 gateway=192.168.10.1 gateway-status=192.168.10.1 inactive check-gateway=ping distance=2 
        scope=10 target-scope=30 routing-mark=to_WAN2 

 5 A S  ;;; DEFAULT route for WAN1 devices to WAN1
        dst-address=0.0.0.0/0 gateway=8.8.8.8 gateway-status=8.8.8.8 recursive via 192.168.10.1 ether7 check-gateway=ping 
        distance=1 scope=10 target-scope=10 routing-mark=to_WAN1 

 6   S  ;;; backup route for WAN1 devices to WAN2
        dst-address=0.0.0.0/0 gateway=8.8.4.4 gateway-status=8.8.4.4 recursive via 192.168.15.1 ether6 check-gateway=ping 
        distance=2 scope=10 target-scope=10 routing-mark=to_WAN1 

 7 X S  ;;; WAN1 backup
        dst-address=0.0.0.0/0 gateway=192.168.15.1 gateway-status=192.168.15.1 inactive check-gateway=ping distance=2 
        scope=10 target-scope=30 

 8 X S  ;;; WAN1 default
        dst-address=0.0.0.0/0 gateway=192.168.10.1 gateway-status=192.168.10.1 inactive check-gateway=ping distance=1 
        scope=10 target-scope=30 

 9 A S  ;;; Ping target 2 on WAN2
        dst-address=8.8.4.4/32 gateway=192.168.15.1 gateway-status=192.168.15.1 reachable via  ether6 distance=1 scope=10 
        target-scope=10 

10 X SB ;;; Blackhole Ping target2 fix
        dst-address=8.8.4.4/32 type=blackhole distance=20

Chupaka · Wed Jul 22, 2020 12:47 am

So, what routing mark are we discussing? I don't know your marking rules.

dave864 · Wed Jul 22, 2020 11:08 am

to_WAN1 and to_WAN2

I have removed the old testing rules, so everything listed is in use except LTE rule 0, and the blackholes are currently not active.

 0 X S  ;;; Local LTE
        dst-address=0.0.0.0/0 gateway=192.168.42.129 gateway-status=192.168.42.129 inactive check-gateway=
        scope=30 target-scope=10 routing-mark=to_ISP2 

 1 A S  ;;; DEFAULT route for WAN2 devices to WAN2
        dst-address=0.0.0.0/0 gateway=8.8.4.4 gateway-status=8.8.4.4 recursive via 192.168.15.1 ether6 che
        distance=1 scope=10 target-scope=10 routing-mark=to_WAN2 

 2   S  ;;; backup route for WAN2 devices to WAN1
        dst-address=0.0.0.0/0 gateway=8.8.8.8 gateway-status=8.8.8.8 recursive via 192.168.10.1 ether7 che
        distance=2 scope=10 target-scope=10 routing-mark=to_WAN2 

 3 A S  ;;; DEFAULT route for WAN1 devices to WAN1
        dst-address=0.0.0.0/0 gateway=8.8.8.8 gateway-status=8.8.8.8 recursive via 192.168.10.1 ether7 che
        distance=1 scope=10 target-scope=10 routing-mark=to_WAN1 

 4   S  ;;; backup route for WAN1 devices to WAN2
        dst-address=0.0.0.0/0 gateway=8.8.4.4 gateway-status=8.8.4.4 recursive via 192.168.15.1 ether6 che
        distance=2 scope=10 target-scope=10 routing-mark=to_WAN1 

 5 A S  ;;; Ping target 2 on WAN2
        dst-address=8.8.4.4/32 gateway=192.168.15.1 gateway-status=192.168.15.1 reachable via  ether6 dist
        target-scope=10 

 6 X SB ;;; Blackhole Ping target2 fix
        dst-address=8.8.4.4/32 type=blackhole distance=20 

 7 A S  ;;; Ping target 1 on WAN1
        dst-address=8.8.8.8/32 gateway=192.168.10.1 gateway-status=192.168.10.1 reachable via  ether7 dist
        target-scope=10 

 8 X SB ;;; Blackhole Ping target1 fix
        dst-address=8.8.8.8/32 type=blackhole distance=20 

 9 ADC  dst-address=192.168.10.0/24 pref-src=192.168.10.10 gateway=ether7 gateway-status=ether7 reachable 

10 ADC  dst-address=192.168.15.0/24 pref-src=192.168.15.254 gateway=ether6 gateway-status=ether6 reachable
        scope=10 

11  DC  dst-address=192.168.40.0/24 pref-src=192.168.40.1 gateway=sfp-sfpplus1 gateway-status=sfp-sfpplus1
        distance=255 scope=10

Chupaka · Wed Jul 22, 2020 12:42 pm

In that state to_WAN1 traffic goes to 192.168.10.1 ether7, to_WAN2 traffic goes to 192.168.15.1 ether6 - is that what you expect?

dave864 · Wed Jul 22, 2020 2:35 pm

Yes, that is correct.
For to_WAN1, when the modem on ether7 goes down I expect traffic to switch to ether6. While that does happen in the routing table, an additional dynamic rule is created, and the traffic does not actually flow to ether6. When I delete the dynamic rule, traffic still does not flow.

By dynamic, I mean an automatically generated rule. Those are represented as D status rules.

Chupaka · Wed Jul 22, 2020 3:55 pm

What rule? What's /ip route print detail at that moment?

dave864 · Wed Jul 22, 2020 5:06 pm

Normal: WAN1 and WAN2 working

 0 X S  ;;; Local LTE
        dst-address=0.0.0.0/0 gateway=192.168.42.129 gateway-status=192.168.42.129 inactive check-gateway=ping distance=2 scope=30 target-scope=10 
        routing-mark=to_ISP2 

 1 A S  ;;; DEFAULT route for WAN2 devices to WAN2
        dst-address=0.0.0.0/0 gateway=8.8.4.4 gateway-status=8.8.4.4 recursive via 192.168.15.1 ether6 check-gateway=ping distance=1 scope=10 
        target-scope=10 routing-mark=to_WAN2 

 2   S  ;;; backup route for WAN2 devices to WAN1
        dst-address=0.0.0.0/0 gateway=8.8.8.8 gateway-status=8.8.8.8 recursive via 192.168.10.1 ether7 check-gateway=ping distance=2 scope=10 
        target-scope=10 routing-mark=to_WAN2 

 3 A S  ;;; DEFAULT route for WAN1 devices to WAN1
        dst-address=0.0.0.0/0 gateway=8.8.8.8 gateway-status=8.8.8.8 recursive via 192.168.10.1 ether7 check-gateway=ping distance=1 scope=10 
        target-scope=10 routing-mark=to_WAN1 

 4   S  ;;; backup route for WAN1 devices to WAN2
        dst-address=0.0.0.0/0 gateway=8.8.4.4 gateway-status=8.8.4.4 recursive via 192.168.15.1 ether6 check-gateway=ping distance=2 scope=10 
        target-scope=10 routing-mark=to_WAN1 

 5 A S  ;;; Ping target 2 on WAN2
        dst-address=8.8.4.4/32 gateway=192.168.15.1 gateway-status=192.168.15.1 reachable via  ether6 distance=1 scope=10 target-scope=10 

 6 X SB ;;; Blackhole Ping target2 fix
        dst-address=8.8.4.4/32 type=blackhole distance=20 

 7 A S  ;;; Ping target 1 on WAN1
        dst-address=8.8.8.8/32 gateway=192.168.10.1 gateway-status=192.168.10.1 reachable via  ether7 distance=1 scope=10 target-scope=10 

 8 X SB ;;; Blackhole Ping target1 fix
        dst-address=8.8.8.8/32 type=blackhole distance=20 

 9 ADC  dst-address=192.168.10.0/24 pref-src=192.168.10.10 gateway=ether7 gateway-status=ether7 reachable distance=0 scope=10 

10 ADC  dst-address=192.168.15.0/24 pref-src=192.168.15.254 gateway=ether6 gateway-status=ether6 reachable distance=0 scope=10 

11  DC  dst-address=192.168.40.0/24 pref-src=192.168.40.1 gateway=sfp-sfpplus1 gateway-status=sfp-sfpplus1 unreachable distance=255 scope=10 

12 ADC  dst-address=192.168.50.0/24 pref-src=192.168.50.1 gateway=bridge1 gateway-status=bridge1 reachable distance=0 scope=10 

13  DC  dst-address=192.168.51.0/24 pref-src=192.168.51.1 gateway=ether5 gateway-status=ether5 unreachable distance=255 scope=10

Then WAN1 dead, note that the routes switch from DEFAULT to Backup for WAN1 but no to_WAN1 data flows through WAN2:

 0 X S  ;;; Local LTE
        dst-address=0.0.0.0/0 gateway=192.168.42.129 gateway-status=192.168.42.129 inactive check-gateway=ping distance=2 scope=30 target-scope=10 
        routing-mark=to_ISP2 

 1 A S  ;;; DEFAULT route for WAN2 devices to WAN2
        dst-address=0.0.0.0/0 gateway=8.8.4.4 gateway-status=8.8.4.4 recursive via 192.168.15.1 ether6 check-gateway=ping distance=1 scope=10 
        target-scope=10 routing-mark=to_WAN2 

 2   S  ;;; backup route for WAN2 devices to WAN1
        dst-address=0.0.0.0/0 gateway=8.8.8.8 gateway-status=8.8.8.8 unreachable check-gateway=ping distance=2 scope=10 target-scope=10 
        routing-mark=to_WAN2 

 3   S  ;;; DEFAULT route for WAN1 devices to WAN1
        dst-address=0.0.0.0/0 gateway=8.8.8.8 gateway-status=8.8.8.8 unreachable check-gateway=ping distance=1 scope=10 target-scope=10 
        routing-mark=to_WAN1 

 4 A S  ;;; backup route for WAN1 devices to WAN2
        dst-address=0.0.0.0/0 gateway=8.8.4.4 gateway-status=8.8.4.4 recursive via 192.168.15.1 ether6 check-gateway=ping distance=2 scope=10 
        target-scope=10 routing-mark=to_WAN1 

 5 A S  ;;; Ping target 2 on WAN2
        dst-address=8.8.4.4/32 gateway=192.168.15.1 gateway-status=192.168.15.1 reachable via  ether6 distance=1 scope=10 target-scope=10 

 6 X SB ;;; Blackhole Ping target2 fix
        dst-address=8.8.4.4/32 type=blackhole distance=20 

 7   S  ;;; Ping target 1 on WAN1
        dst-address=8.8.8.8/32 gateway=192.168.10.1 gateway-status=192.168.10.1 unreachable distance=1 scope=10 target-scope=10 

 8 X SB ;;; Blackhole Ping target1 fix
        dst-address=8.8.8.8/32 type=blackhole distance=20 

 9 ADC  dst-address=192.168.15.0/24 pref-src=192.168.15.254 gateway=ether6 gateway-status=ether6 reachable distance=0 scope=10 

10  DC  dst-address=192.168.40.0/24 pref-src=192.168.40.1 gateway=sfp-sfpplus1 gateway-status=sfp-sfpplus1 unreachable distance=255 scope=10 

11 ADC  dst-address=192.168.50.0/24 pref-src=192.168.50.1 gateway=bridge1 gateway-status=bridge1 reachable distance=0 scope=10 

12  DC  dst-address=192.168.51.0/24 pref-src=192.168.51.1 gateway=ether5 gateway-status=ether5 unreachable distance=255 scope=10 

13  DC  dst-address=192.168.80.0/24 pref-src=192.168.80.1 gateway=ether8 gateway-status=ether8 unreachable distance=255 scope=10

Now WAN1 is back online; note the automatically added dynamic route (number 5):

 0 X S  ;;; Local LTE
        dst-address=0.0.0.0/0 gateway=192.168.42.129 gateway-status=192.168.42.129 inactive check-gateway=ping distance=2 scope=30 target-scope=10 
        routing-mark=to_ISP2 

 1 A S  ;;; DEFAULT route for WAN2 devices to WAN2
        dst-address=0.0.0.0/0 gateway=8.8.4.4 gateway-status=8.8.4.4 recursive via 192.168.15.1 ether6 check-gateway=ping distance=1 scope=10 
        target-scope=10 routing-mark=to_WAN2 

 2   S  ;;; backup route for WAN2 devices to WAN1
        dst-address=0.0.0.0/0 gateway=8.8.8.8 gateway-status=8.8.8.8 recursive via 192.168.10.1 ether7 check-gateway=ping distance=2 scope=10 
        target-scope=10 routing-mark=to_WAN2 

 3 A S  ;;; DEFAULT route for WAN1 devices to WAN1
        dst-address=0.0.0.0/0 gateway=8.8.8.8 gateway-status=8.8.8.8 recursive via 192.168.10.1 ether7 check-gateway=ping distance=1 scope=10 
        target-scope=10 routing-mark=to_WAN1 

 4   S  ;;; backup route for WAN1 devices to WAN2
        dst-address=0.0.0.0/0 gateway=8.8.4.4 gateway-status=8.8.4.4 recursive via 192.168.15.1 ether6 check-gateway=ping distance=2 scope=10 
        target-scope=10 routing-mark=to_WAN1 

 5 ADS  dst-address=0.0.0.0/0 gateway=192.168.10.1 gateway-status=192.168.10.1 reachable via  ether7 distance=1 scope=30 target-scope=10 
        vrf-interface=ether7 

 6 A S  ;;; Ping target 2 on WAN2
        dst-address=8.8.4.4/32 gateway=192.168.15.1 gateway-status=192.168.15.1 reachable via  ether6 distance=1 scope=10 target-scope=10 

 7 X SB ;;; Blackhole Ping target2 fix
        dst-address=8.8.4.4/32 type=blackhole distance=20 

 8 A S  ;;; Ping target 1 on WAN1
        dst-address=8.8.8.8/32 gateway=192.168.10.1 gateway-status=192.168.10.1 reachable via  ether7 distance=1 scope=10 target-scope=10 

 9 X SB ;;; Blackhole Ping target1 fix
        dst-address=8.8.8.8/32 type=blackhole distance=20 

10 ADC  dst-address=192.168.10.0/24 pref-src=192.168.10.10 gateway=ether7 gateway-status=ether7 reachable distance=0 scope=10 

11 ADC  dst-address=192.168.15.0/24 pref-src=192.168.15.254 gateway=ether6 gateway-status=ether6 reachable distance=0 scope=10 

12  DC  dst-address=192.168.40.0/24 pref-src=192.168.40.1 gateway=sfp-sfpplus1 gateway-status=sfp-sfpplus1 unreachable distance=255 scope=10 

13 ADC  dst-address=192.168.50.0/24 pref-src=192.168.50.1 gateway=bridge1 gateway-status=bridge1 reachable distance=0 scope=10

Chupaka · Wed Jul 22, 2020 5:45 pm

Do you use VRF there?..

> no to_WAN1 data flows through WAN2
What error does, for example, 'ping' return on the client? Is it timeout? Did you check where actually packets marked as to_WAN1 go?

dave864 · Wed Jul 22, 2020 6:13 pm

> Do you use VRF there?..
>
> > no to_WAN1 data flows through WAN2
> What error does, for example, 'ping' return on the client? Is it timeout? Did you check where actually packets marked as to_WAN1 go?

No idea what VRF is. I do not use BGP or anything. This router is in my house, I plugged 2 mobile broadband devices into it using Ethernet (1x Chateau, 1x Huawei). I no longer have a fixed line broadband.

All WAN1 traffic is conn tracked to WAN1conn and all WAN2 traffic is conn tracked to WAN2conn. All DNS goes to the respective conn track.

All devices except 3-4 are on WAN1. This is set using an address list. Mangle does this, grabbing all addresses on 192.168.50.x (my LAN) except those on to_WAN2list. Therefore, every device IP not on to_WAN2list ends up listed in to_WAN1list. This works and I can see all that happening.

I have src nat on Ether7 and Ether6 with no route marking.

Using various "what is my ip" websites, I have confirmed that devices are exposed to the correct mobile address.

I am assuming that the scope and target scope are correct and that the chaining of the rules (DEFAULT to Ping rule) is correct as that mirrors your original post. The Dynamic rule that is created when a WAN comes back on does not appear to have any negative effects.

Chupaka · Wed Jul 22, 2020 6:21 pm

> No idea what VRF is

"vrf-interface=ether7" in your dynamic rule is suspicious. Check your config for unexpected commands...

dave864 · Wed Jul 22, 2020 6:34 pm

[Attachment: 2020-07-22v2.png]

I think I know the problem:
Mangle.

My Ether7 and Ether6 inputs are mangled to WAN1conn and WAN2conn.
So when my traffic on WAN1 swaps to WAN2, the incoming traffic gets conn marked as WAN2conn while its outgoing traffic keeps the WAN1conn mark. Do you agree that this is the problem?

Note · Thu Jul 23, 2020 12:48 pm

Hi Chupaka, and thank you for all your info. As I understand it, all I need is to add the two blackhole routes. Do you think that is OK?

 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 A S  0.0.0.0/0                          1.0.0.2                   1
 1 A S  0.0.0.0/0                          1.1.1.2                   1
 2 A S  0.0.0.0/0                          1.1.1.2                   1
 3   S  0.0.0.0/0                          1.0.0.2                   1
 4 A S  1.0.0.2/32                         192.168.2.1               1
 5   SB 1.0.0.2/32                                                  20
 6 A S  1.1.1.2/32                         192.168.0.1               1
 7   SB 1.1.1.2/32                                                  20
 8 ADC  10.10.10.0/24      10.10.10.1      Bridge_Xoleritsa          0
 9 ADC  10.10.20.0/24      10.10.20.1      Bridge_Guest              0
10 ADC  10.157.138.0/24    10.157.138.1    Bridge                    0
11 ADC  192.168.0.0/24     192.168.0.2     WAN1                      0
12 ADC  192.168.1.0/24     192.168.1.1     Bridge                    0
13 ADC  192.168.2.0/24     192.168.2.3     WAN2                      0

 0 A S  dst-address=0.0.0.0/0 gateway=1.0.0.2 gateway-status=1.0.0.2 recursive via 192.168.2.1 WAN>
        check-gateway=ping distance=1 scope=30 target-scope=10 routing-mark=to_WAN2 

 1 A S  dst-address=0.0.0.0/0 gateway=1.1.1.2 gateway-status=1.1.1.2 recursive via 192.168.0.1 WAN>
        check-gateway=ping distance=1 scope=30 target-scope=10 routing-mark=to_WAN1 

 2 A S  dst-address=0.0.0.0/0 gateway=1.1.1.2 gateway-status=1.1.1.2 recursive via 192.168.0.1 WAN>
        check-gateway=ping distance=1 scope=30 target-scope=10 

 3   S  dst-address=0.0.0.0/0 gateway=1.0.0.2 gateway-status=1.0.0.2 recursive via 192.168.2.1 WAN>
        check-gateway=ping distance=1 scope=30 target-scope=10 

 4 A S  dst-address=1.0.0.2/32 gateway=192.168.2.1 gateway-status=192.168.2.1 reachable via  WAN2 
        check-gateway=ping distance=1 scope=10 target-scope=10 

 5   SB dst-address=1.0.0.2/32 type=blackhole distance=20 

 6 A S  dst-address=1.1.1.2/32 gateway=192.168.0.1 gateway-status=192.168.0.1 reachable via  WAN1 
        check-gateway=ping distance=1 scope=10 target-scope=10

One more thing: is it better for performance to set a lower blackhole distance? Let's say 5.

Chupaka · Fri Jul 24, 2020 1:38 pm

> My Ether7 and Ether6 inputs are mangled to WAN1conn and WAN2conn.
> So when my traffic on WAN1 swaps to WAN2, the incoming traffic gets conn marked as WAN2conn while its outgoing traffic remains at a WAN1conn mark. Do you agree, is this the problem?

I don't see your exact rules. The rules from the manual mark only externally-initiated connections on WAN interfaces. So it should not affect your connections from LAN. You may add a logging rule to "forward" chain to see where some packets go ("out" interface in Log) when WAN1 is unavailable.
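
Such a logging rule could look like the sketch below. It assumes the to_WAN1 routing mark from the configuration posted above; the log prefix is an arbitrary name chosen here, and the rule has to sit above any rule in the "forward" chain that would accept the packets first.

```routeros
/ip firewall filter
# Log forwarded packets that carry the to_WAN1 routing mark.
# The "out:" field of each log entry shows the interface actually used.
add chain=forward action=log routing-mark=to_WAN1 log-prefix="to_WAN1"
```

Remove or disable the rule after testing, since it logs every matching packet.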

> is it better for performance to set a lower blackhole distance

No difference

Note · Fri Jul 24, 2020 2:54 pm

Here is the exact failover configuration I use; as far as I have tested, it works well. Any comments? I use distance=1 on both WANs because I mark ports in mangle and apply routing marks. Load balancing for torrents and failover both seem to work fine.

/ip route
add check-gateway=ping distance=1 gateway=1.0.0.2 routing-mark=to_WAN2
add check-gateway=ping distance=1 gateway=1.1.1.2 routing-mark=to_WAN1
add check-gateway=ping distance=1 gateway=1.1.1.2
add check-gateway=ping distance=1 gateway=1.0.0.2
add check-gateway=ping distance=1 dst-address=1.0.0.2/32 gateway=192.168.2.1 scope=10
add distance=5 dst-address=1.0.0.2/32 type=blackhole
add check-gateway=ping distance=1 dst-address=1.1.1.2/32 gateway=192.168.0.1 scope=10
add distance=5 dst-address=1.1.1.2/32 type=blackhole

Chupaka · Fri Jul 24, 2020 4:54 pm

Looks like there are no mangle rules in your script...

Note · Fri Jul 24, 2020 6:39 pm

I only posted my routing rules; here are my mangle rules:

/ip firewall mangle
add action=mark-routing chain=prerouting comment=______Guest_to_WAN2 \
    new-routing-mark=to_WAN2 passthrough=no src-address=10.10.20.0/24
add action=mark-routing chain=prerouting comment=_______Remotes&Games_to_WAN1 \
    new-routing-mark=to_WAN1 passthrough=no port=1320,17771,5000-5500,7985 \
    protocol=udp src-address=10.157.138.0/24
add action=mark-routing chain=prerouting dst-port="" new-routing-mark=to_WAN1 \
    passthrough=no port=1320,12975,32976,4899,5938,48377 protocol=tcp \
    src-address=10.157.138.0/24
add action=mark-routing chain=prerouting comment=_______ZLO_to_WAN1 \
    dst-address-list=Zlo_Games new-routing-mark=to_WAN1 passthrough=no \
    src-address=10.157.138.100/31
add action=mark-routing chain=prerouting comment=\
    "_______Torrents_to_WAN1 or WAN2" disabled=yes new-routing-mark=to_WAN2 \
    passthrough=no port=8999-65535 protocol=tcp src-address=10.157.138.100/31
add action=mark-routing chain=prerouting disabled=yes new-routing-mark=\
    to_WAN2 passthrough=no port=8999-65535 protocol=udp src-address=\
    10.157.138.100/31
add action=mark-connection chain=input comment=\
    _______Load_Balance_Mark_IN-OUT in-interface=WAN1 new-connection-mark=\
    WAN1_conn passthrough=no
add action=mark-connection chain=input in-interface=WAN2 new-connection-mark=\
    WAN2_conn passthrough=no
add action=mark-routing chain=output connection-mark=WAN1_conn \
    new-routing-mark=to_WAN1 passthrough=no
add action=mark-routing chain=output connection-mark=WAN2_conn \
    new-routing-mark=to_WAN2 passthrough=no
add action=accept chain=prerouting comment=\
    _______Load_Balance_Accept_All_WANS dst-address=192.168.0.0/24 \
    in-interface=Bridge
add action=accept chain=prerouting dst-address=192.168.2.0/24 in-interface=\
    Bridge
add action=mark-routing chain=prerouting comment="_______HTTP-S_Routing mark" \
    new-routing-mark=to_WAN1 passthrough=no port=80,443 protocol=tcp \
    src-address=10.157.138.0/24
add action=mark-routing chain=prerouting new-routing-mark=to_WAN1 \
    passthrough=no port=80,443 protocol=udp src-address=10.157.138.0/24
add action=mark-connection chain=prerouting comment=\
    "_______Load_Balance_Divider&Routing mark" dst-address-type=!local \
    in-interface=Bridge new-connection-mark=WAN1_conn passthrough=yes \
    per-connection-classifier=both-addresses-and-ports:2/0
add action=mark-connection chain=prerouting dst-address-type=!local \
    in-interface=Bridge new-connection-mark=WAN2_conn passthrough=yes \
    per-connection-classifier=both-addresses-and-ports:2/1
add action=mark-routing chain=prerouting connection-mark=WAN1_conn \
    in-interface=Bridge new-routing-mark=to_WAN1 passthrough=no
add action=mark-routing chain=prerouting connection-mark=WAN2_conn \
    in-interface=Bridge new-routing-mark=to_WAN2 passthrough=no
add action=change-dscp chain=prerouting comment=_______DSCP_63_ICMP new-dscp=\
    63 passthrough=no protocol=icmp
add action=change-dscp chain=postrouting comment=_______DSCP_63_ICMP \
    new-dscp=63 passthrough=no protocol=icmp
add action=change-dscp chain=prerouting comment=\
    _______DSCP_63_DNS-REMOTES-GAMES new-dscp=63 passthrough=no port=\
    53,1320,17771,5000-5500,48377 protocol=udp
add action=change-dscp chain=postrouting comment=\
    _______DSCP_63_DNS-REMOTES-GAMES new-dscp=63 passthrough=no port=\
    53,1320,17771,5000-5500,48377 protocol=udp
add action=change-dscp chain=prerouting comment=\
    _______DSCP_56_HTTP-S_SMALL-REMOTES connection-bytes=0-500000 new-dscp=56 \
    passthrough=no port=80,443,8080,1320,12975,32976,4899,5938,7062 protocol=\
    tcp
add action=change-dscp chain=postrouting comment=\
    _______DSCP_56_HTTP-S_SMALL-REMOTES connection-bytes=0-500000 new-dscp=56 \
    passthrough=no port=80,443,8080,1320,12975,32976,4899,5938,7062 protocol=\
    tcp
add action=change-dscp chain=prerouting comment=_______DSCP_24_HTTP_S_LARGE \
    new-dscp=24 passthrough=no port=80,443,8080 protocol=tcp
add action=change-dscp chain=postrouting comment=_______DSCP_24_HTTP_S_LARGE \
    new-dscp=24 passthrough=no port=80,443,8080 protocol=tcp
add action=change-dscp chain=prerouting comment=_______DSCP_0_Torrents \
    new-dscp=0 passthrough=no port=8999-65355 protocol=tcp
add action=change-dscp chain=postrouting comment=_______DSCP_0_Torrents \
    new-dscp=0 passthrough=no port=8999-65355 protocol=tcp
add action=change-dscp chain=prerouting comment=_______DSCP_0_Torrents \
    new-dscp=0 passthrough=no port=8999-65355 protocol=udp
add action=change-dscp chain=postrouting comment=_______DSCP_0_Torrents \
    new-dscp=0 passthrough=no port=8999-65355 protocol=udp

dave864 · Fri Jul 24, 2020 9:45 pm

Hi Note,
If you have a rule that marks a connection, followed by a rule that marks routing, then you must have passthrough=yes on the mark-connection rule. That way, processing can fall through to the mark-routing rule.
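
A minimal sketch of such a rule pair, using the bridge1, WAN1conn and to_WAN1 names from my own config as placeholders (adjust them to your setup):

```routeros
/ip firewall mangle
# Step 1: mark the connection; passthrough=yes lets the packet continue
# to the next rule instead of leaving the chain here.
add chain=prerouting in-interface=bridge1 connection-mark=no-mark \
    action=mark-connection new-connection-mark=WAN1conn passthrough=yes
# Step 2: mark routing based on the connection mark set above.
add chain=prerouting in-interface=bridge1 connection-mark=WAN1conn \
    action=mark-routing new-routing-mark=to_WAN1 passthrough=no
```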

Note · Sat Jul 25, 2020 11:48 am

Hi Dave, and thanks for joining in.

I do not know exactly what you mean by that, but to get this working well I had to put the port-marking rules first and the load-balancing rules after them, with passthrough=yes only on the dividers. Otherwise I had issues.

Chupaka · Sat Jul 25, 2020 2:00 pm

Just a note: you don't need to mark connections in your setup, as you mark connection for every packet from LAN, and then mark routing for every packet from LAN using connection-mark you just set. You can mark routing directly. Unless you're using those marks in Filter or NAT for some reason...

Anyway, I'd like to see basic diagnostics when WAN1 is unavailable. Traceroute, for example. Because generally everything looks good.

Note · Sat Jul 25, 2020 7:15 pm

With ping 8.8.8.8 -t running, I don't get even one request timeout when I first disable WAN1, then re-enable it, and then disable WAN2. The echo replies are continuous.

Chupaka · Mon Jul 27, 2020 2:58 pm

You mean, now everything works as expected?..

Note · Wed Jul 29, 2020 2:09 pm

Exactly. I have also set the blackhole distance=3.

DarkNate · Sun Aug 16, 2020 3:18 am

> At ping 8.8.8.8 -t, i dont even have one request timeout when i disable first wan1 then enable and then disable wan2. The echo reply is consecutive.

I know what you're talking about. 1 packet loss every time. Just 1 literally. And it happens on LAN traffic as well.

I've narrowed down the problem to PCC load balancing itself, I've reduced it by using a destination address list for LAN traffic to exclude it from marking. But the 1 packet loss still occurs. It has nothing to do with the recursive routes.

Maybe someone else knows why this happens.

In my opinion it looks like a RouterOS bug.

dave864 · Mon Aug 24, 2020 11:12 am

> Just a note: you don't need to mark connections in your setup, as you mark connection for every packet from LAN, and then mark routing for every packet from LAN using connection-mark you just set. You can mark routing directly. Unless you're using those marks in Filter or NAT for some reason...
>
> Anyway, I'd like to see basic diagnostics when WAN1 is unavailable. Traceroute, for example. Because generally everything looks good.

I am marking connections because I cannot get any traffic to flow without conn marks.
What kind of mangle do I need here?

Mark routing on Prerouting?
Mark routing on Output?

I am convinced the reason this doesn't work is the mangle.
My current config has the connections in Route, switching over when a link dies, so that is good. But no traffic flows.

I am using SRCNAT. Is that correct? Should I be using Masquerade?

 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 A S  ;;; DEFAULT route for WAN1 devices to WAN1
        0.0.0.0/0                          8.8.8.8                   1
 1   S  ;;; backup route for WAN1 devices to WAN2
        0.0.0.0/0                          8.8.4.4                   2
 2 A S  ;;; DEFAULT route for WAN2 devices to WAN2
        0.0.0.0/0                          8.8.4.4                   1
 3   S  ;;; backup route for WAN2 devices to WAN1
        0.0.0.0/0                          8.8.8.8                   2
 5 A S  ;;; Ping target 2 on WAN2
        8.8.4.4/32                         192.168.15.1              1
 6 A S  ;;; Ping target 1 on WAN1
        8.8.8.8/32                         192.168.10.1              1
 7 ADC  192.168.10.0/24    192.168.10.10   ether7                    0
 8 ADC  192.168.15.0/24    192.168.15.254  ether6                    0
 9  DC  192.168.40.0/24    192.168.40.1    sfp-sfpplus1            255
10 ADC  192.168.50.0/24    192.168.50.1    bridge1                   0
11  DC  192.168.51.0/24    192.168.51.1    ether5                  255
12  DC  192.168.80.0/24    192.168.80.1    ether8                  255

Route

/ip route
add check-gateway=ping comment="DEFAULT route for WAN1 devices to WAN1" distance=1 gateway=8.8.8.8 routing-mark=to_WAN1 scope=10
add check-gateway=ping comment="backup route for WAN1 devices to WAN2" distance=2 gateway=8.8.4.4 routing-mark=to_WAN1 scope=10
add check-gateway=ping comment="DEFAULT route for WAN2 devices to WAN2" distance=1 gateway=8.8.4.4 routing-mark=to_WAN2 scope=10
add check-gateway=ping comment="backup route for WAN2 devices to WAN1" distance=2 gateway=8.8.8.8 routing-mark=to_WAN2 scope=10
add comment="Ping target 2 on WAN2" distance=1 dst-address=8.8.4.4/32 gateway=192.168.15.1 scope=10
add comment="Ping target 1 on WAN1" distance=1 dst-address=8.8.8.8/32 gateway=192.168.10.1 scope=10

Mangle (EDIT - this is not correct, see next post)

/ip firewall mangle
add action=mark-connection chain=prerouting comment="Anything to Local 192.168.50.0/24 set NO MARK drop-out mangle" dst-address-list=MyLocalLAN \
    in-interface=bridge1 new-connection-mark=no-mark passthrough=no
add action=mark-connection chain=prerouting comment=WAN1 dst-address-list=!MyLocalWAN2 new-connection-mark=WAN1conn passthrough=yes src-address-list=\
    to_WAN1list
add action=mark-connection chain=prerouting connection-mark=!WAN2conn in-interface=ether7 new-connection-mark=WAN1conn passthrough=no
add action=mark-connection chain=prerouting dst-address-list=MyLocalWAN1 new-connection-mark=WAN1conn passthrough=yes
add action=mark-routing chain=prerouting connection-mark=WAN1conn new-routing-mark=to_WAN1 passthrough=no
add action=mark-routing chain=output connection-mark=WAN1conn new-routing-mark=to_WAN1 passthrough=no
add action=mark-connection chain=prerouting comment=WAN2 dst-address-list=!MyLocalWAN1 new-connection-mark=WAN2conn passthrough=yes src-address-list=\
    to_WAN2list
add action=mark-connection chain=prerouting connection-mark=!WAN1conn in-interface=ether6 new-connection-mark=WAN2conn passthrough=no
add action=mark-connection chain=prerouting dst-address-list=MyLocalWAN2 new-connection-mark=WAN2conn passthrough=yes
add action=mark-routing chain=prerouting connection-mark=WAN2conn new-routing-mark=to_WAN2 passthrough=no
add action=mark-routing chain=output connection-mark=WAN2conn new-routing-mark=to_WAN2 passthrough=no

MyLocalWAN1 = 192.168.10.0/24
MyLocalWAN2 = 192.168.15.0/24
MyLocalLAN = 192.168.50.0/24

dave864 · Mon Aug 24, 2020 12:00 pm

I had another go at doing the mangle without conn marks and I think that worked.

/ip firewall mangle
add action=mark-routing chain=prerouting comment=WAN1 dst-address-list=!to_WAN2list new-routing-mark=to_WAN1 passthrough=no src-address-list=to_WAN1list
add action=mark-routing chain=prerouting comment=WAN2 dst-address-list=!to_WAN1list new-routing-mark=to_WAN2 passthrough=no src-address-list=to_WAN2list

With SRCNAT there is a delay of maybe 10 seconds, and sessions do not appear to recover after switching over: pages that were already loading stall.
With Masquerade NAT there is the same delay: things don't load properly for about 10 seconds, and after that all is OK. I still get some page stalls, but they appear less pronounced.

So I guess my question is which NAT type is best?

/ip firewall nat
add action=src-nat chain=srcnat comment=WAN1 disabled=yes out-interface=ether7 src-address=192.168.50.0/24 to-addresses=192.168.10.10
add action=masquerade chain=srcnat comment=WAN1 out-interface=ether7
add action=src-nat chain=srcnat comment=WAN2 disabled=yes out-interface=ether6 src-address=192.168.50.0/24 to-addresses=192.168.15.254
add action=masquerade chain=srcnat comment=WAN2 out-interface=ether6

DarkNate · Mon Aug 24, 2020 12:40 pm

> I had another go at doing the mangle without conn marks and I think that worked.
>
> add action=mark-routing chain=prerouting comment=WAN1 dst-address-list=!to_WAN2list new-routing-mark=to_WAN1 passthrough=no src-address-list=to_WAN1list
> add action=mark-routing chain=prerouting comment=WAN2 dst-address-list=!to_WAN1list new-routing-mark=to_WAN2 passthrough=no src-address-list=to_WAN2list
>
> With SRCNAT There is a delay of maybe 10 seconds but the sessions do not appear to recover when switching over. Pages already trying to load then stall.
> With Masquerade NAT there is the same delay but things don't properly load for about 10 seconds, then after that all is ok. I do get some page stall but appears less pronounced.
>
> So I guess my question is which NAT type is best?
>
> /ip firewall nat
> add action=src-nat chain=srcnat comment=WAN1 disabled=yes out-interface=ether7 src-address=192.168.50.0/24 to-addresses=192.168.10.10
> add action=masquerade chain=srcnat comment=WAN1 out-interface=ether7
> add action=src-nat chain=srcnat comment=WAN2 disabled=yes out-interface=ether6 src-address=192.168.50.0/24 to-addresses=192.168.15.254
> add action=masquerade chain=srcnat comment=WAN2 out-interface=ether6

Masquerade is meant for PPPoE or a DHCP client with a dynamic IP.

Try removing the "src-address" completely, then test again and see how it fares.

dave864 · Tue Aug 25, 2020 11:21 pm

I removed the Source address and it made no difference.
I don't know if I'm imagining it, but now that I have a simple mangle on prerouting, it appears that some web pages are stalling. Is it correct to simply have a single prerouting mangle rule covering the LAN (for each WAN)?

/ip firewall mangle
add action=mark-routing chain=prerouting comment=WAN1 dst-address-list=!to_WAN2list new-routing-mark=to_WAN1 passthrough=no src-address-list=to_WAN1list
add action=mark-routing chain=prerouting comment=WAN2 dst-address-list=!to_WAN1list new-routing-mark=to_WAN2 passthrough=no src-address-list=to_WAN2list

Should I not have something covering WAN inputs too?

Is it possible to have a single NAT but attach 2x WAN to it?

rkrisi · Mon Aug 31, 2020 1:39 am

I would need to have a failover link in my setup.
Reading through this thread, I'm a little bit confused and I was unable to use this in my setup.
Can someone help me? Is this a good way to go?

What I would need:
I have 2 uplinks (ether) and I would need if the first (main) goes down to route all traffic to the second (failover) uplink. When the main link is up, don't route anything to the failover link.
I have tried setting the routes as described in the first post, but it did not work. Later, from the thread, I realized that I would need to set up mangle rules for this to work.

What would be best in my scenario? Set up mangle rules to mark packets, or something else? Can anyone help me set up the mangle rules?

Thanks!

Chupaka · Mon Aug 31, 2020 2:19 pm

> I have tried setting the routes as described in the first post, but it did not work. Later from the thread I realized that I would need to setup mangle rules for this to work.

You don't need routing marks at all:

/ip route
add dst-address=CheckingHost gateway=GW_MAIN_IP scope=10
add distance=1 gateway=CheckingHost check-gateway=ping
add distance=10 gateway=GW_FAILOVER_IP

rkrisi · Mon Aug 31, 2020 2:25 pm

> > I have tried setting the routes as described in the first post, but it did not work. Later from the thread I realized that I would need to setup mangle rules for this to work.
> You don't need routing marks at all:
>
> /ip route
> add dst-address=CheckingHost gateway=GW_MAIN_IP scope=10
> add distance=1 gateway=CheckingHost check-gateway=ping
> add distance=10 gateway=GW_FAILOVER_IP

Thanks I will try this!

I forgot to mention that I use a DHCP client because I don't have static public IPs, only dynamic ones. I assume this works there too, and I just need to remove the default route from the dhcp-client, right?

Chupaka · Mon Aug 31, 2020 2:38 pm

If your gateways are static (I haven't seen any situations where they are not), just disable adding the default route. If they are not, you may use the DHCP Client Script to update your routes with the correct gateways.

rkrisi · Mon Aug 31, 2020 6:59 pm

> If your gateways are static (I didn't see any situations where they are not), just disable adding the default route. If they are not, you may use DHCP Client Script to update your routes with correct gateways.

The gateways are static: I have two dedicated uplink gateways, but their IP addresses are not static.

If I add this to my route list:

add dst-address=CheckingHost gateway=GW_MAIN_IP scope=10

Like this:

1 A S  8.8.8.8/32                         ether1                    1

I can't reach 8.8.8.8 from my network. Why?

The only thing that might be wrong is that I set a port (interface), not an IP address, as the gateway. However, I can't add an IP here, because I don't have a static IP.

Chupaka · Tue Sep 01, 2020 1:04 am

"gateway=etherN" works not the same as with point-to-point interfaces, and definitely not as you expect. Don't use this.

> Gateways are static

So use gateway IPs in gateway= parameter, that's exactly what you need.

DarkNate · Tue Sep 01, 2020 1:47 am

> > I have tried setting the routes as described in the first post, but it did not work. Later from the thread I realized that I would need to setup mangle rules for this to work.
> You don't need routing marks at all:
>
> /ip route
> add dst-address=CheckingHost gateway=GW_MAIN_IP scope=10
> add distance=1 gateway=CheckingHost check-gateway=ping
> add distance=10 gateway=GW_FAILOVER_IP

So why did you use routing marks in the original post in the first place?

rkrisi · Tue Sep 01, 2020 12:29 pm

"gateway=etherN" works not the same as with point-to-point interfaces, and definitely not as you expect. Don't use this.

Gateways are static
So use gateway IPs in gateway= parameter, that's exactly what you need.

Yes, sorry I forgot that this would only work with point-to-point interfaces. I can add the IP address there, but what if the IP address changes on this gateway?

It seems to work like this:

 0 A S  0.0.0.0/0                          8.8.8.8                   1
 1   S  0.0.0.0/0                          GW2_IP              10
 2 A S  8.8.8.8/32                      GW1_IP               1

However this way, I need to update at least GW1_IP in case it changes... What is the best way to do this? DHCP client script?

Chupaka · Tue Sep 01, 2020 1:32 pm

> So why did you use routing marks in the original post in the first place?

Because that config was for traffic balancing. A pure failover scenario can be greatly simplified, as you can see :)

> However this way, I need to update at least GW1_IP in case it changes... What is the best way to do this? DHCP client script?

Exactly. Add a comment to your route (e.g. "GW1_IP") and then something like this in DHCP Client Script should be quite enough:

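# The DHCP client supplies $bound and $interface to this script
:if ($bound=1) do={
  :local iface $interface
  # Read the gateway currently handed out on this interface
  :local gw [ /ip dhcp-client get [ find interface=$"iface" ] gateway ]
  # Update the commented route only when its gateway differs
  /ip route set [ find comment="GW1_IP" gateway!=$gw ] gateway=$gw
}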

rkrisi · Tue Sep 01, 2020 1:50 pm

> > So why did you use routing marks in the original post in the first place?
> Because that config was for traffic balancing. Failover scenario can be greatly simplified, as you can see :)
>
> > However this way, I need to update at least GW1_IP in case it changes... What is the best way to do this? DHCP client script?
> Exactly. Add a comment to your route (e.g. "GW1_IP") and then something like this in DHCP Client Script should be quite enough:
>
> :if ($bound=1) do={
>   :local iface $interface
>   :local gw [ /ip dhcp-client get [ find interface=$"iface" ] gateway ]
>   /ip route set [ find comment="GW1_IP" gateway!=$gw ] gateway=$gw
> }

Thanks for your help and detailed answers! :)
From the first post I also assumed that this is clearly for failover and not load balancing. Anyway, right now this approach seems to work as it should!
Can I use multiple CheckingHosts here? If yes, how?
Also is it possible to send Email when the failover link becomes active?

I have another location where something similar should be done; however, the layout of the network is a little bit different.
I have an LTE uplink and a WiFi client uplink. I want to fall back to WiFi when the LTE is unreachable. I saw that you (same as me) don't have much experience with LTE, but maybe you can help.
Wifi has DHCP client, so the same script could be used there.
However LTE does not have DHCP client, default route is added in LTE APN. It seems that it is a point-to-point interface because the address/network is

1 D 100.115.98.168/32  100.115.98.168  lte1

What I don't know is whether this address changes or not. If not, then gateway=interface can be used in this scenario, right?

DarkNate · Tue Sep 01, 2020 5:46 pm

> > So why did you use routing marks in the original post in the first place?
> Because that config was for traffic balancing. Failover scenario can be greatly simplified, as you can see :)

Ah, that makes sense. This is what I've done. And it works when tested, as my ISP1 goes down pretty often and I never noticed any down-time on my client devices.

/ip route
add dst-address=8.8.8.8 gateway=pppoe-out1 scope=10
add dst-address=104.16.248.249 gateway=pppoe-out2 scope=10

/ip route
add distance=1 gateway=104.16.248.249 routing-mark=to_ISP1 check-gateway=ping
add distance=2 gateway=8.8.8.8 routing-mark=to_ISP1 check-gateway=ping

/ip route
add distance=1 gateway=8.8.8.8 routing-mark=to_ISP2 check-gateway=ping
add distance=2 gateway=104.16.248.249 routing-mark=to_ISP2 check-gateway=ping

But the "gateway" "host" is shown as unreachable. Any ideas why?

Chupaka · Wed Sep 02, 2020 1:47 pm

> Can I use multiple CheckingHosts here? If yes, how?

Sure, just add a route to a new checking host and add default route via that host. One of those default routes will be active.
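
A minimal sketch of that, extending the three-line failover example from earlier in the thread. The two checking hosts (8.8.8.8 and 1.1.1.1) and the distinct distances 1 and 2 are assumptions made here for illustration; GW_MAIN_IP and GW_FAILOVER_IP are the same placeholders as before.

```routeros
/ip route
# Host routes to each checking host, both pinned to the main gateway
add dst-address=8.8.8.8 gateway=GW_MAIN_IP scope=10
add dst-address=1.1.1.1 gateway=GW_MAIN_IP scope=10
# One recursive default route per checking host
add distance=1 gateway=8.8.8.8 check-gateway=ping
add distance=2 gateway=1.1.1.1 check-gateway=ping
# Plain failover route, used only when both checks fail
add distance=10 gateway=GW_FAILOVER_IP
```

With this layout, traffic should move to the failover gateway only when neither checking host answers via the main gateway.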

> Also is it possible to send Email when the failover link becomes active?

You need some external script to check, for example, if your failover route is active and then do what you want.
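
As a rough sketch only (RouterOS v6 syntax), such a check could be run from /system scheduler. It assumes the failover default route has been given comment="FAILOVER", that /tool e-mail is already configured with a working SMTP server, and that admin@example.com stands in for the real recipient; none of these names come from this thread.

```routeros
# Remember the previous state between runs
:global failoverWasActive
:if ([:typeof $failoverWasActive] = "nothing") do={ :set failoverWasActive false }

# Read the "active" flag of the route tagged comment="FAILOVER"
:local isActive [/ip route get [find comment="FAILOVER"] active]

# Send one mail on the transition inactive -> active, not on every run
:if (($isActive = true) && ($failoverWasActive = false)) do={
  /tool e-mail send to="admin@example.com" subject="Failover link active" \
      body="The failover default route has become active."
}
:set failoverWasActive $isActive
```

The script expects exactly one route with that comment; if none or several match, the "get" call will fail.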

> However LTE does not have DHCP client, default route is added in LTE APN. It seems that it is a point-to-point interface because the address/network is
>
> 1 D 100.115.98.168/32  100.115.98.168  lte1
>
> What I don't know if this address changes or not.. If not, then in this scenario gateway=interface can be used right?

You just create a route with gateway=LTE and see if you still have access to the Internet :)

Chupaka · Wed Sep 02, 2020 1:52 pm

> But the "gateway" "host" is shown as unreachable. Any ideas why?

Yeah, none of your routes are working :( It's because of a RouterOS limitation: recursive routes cannot be resolved via interface routes (i.e. gateway=pppoe-out1 is a bad route).

As a workaround, you make a copy of your PPP Profile that is used for that PPPoE connection, set "Remote address" to your Host (e.g. remote-address=8.8.8.8) and then simply remove a route to 8.8.8.8. Everything should work after that.
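
A minimal sketch of that workaround on RouterOS v6, assuming the PPPoE client is named pppoe-out1, currently uses the default profile, and 8.8.8.8 is the checking host. The profile name pppoe-isp1-check is purely illustrative, and the session will probably need to reconnect before the new remote address shows up:

```
# Copy the profile used by the PPPoE client and pin its remote address
# to the checking host (the profile name is a hypothetical example)
/ppp profile
add name=pppoe-isp1-check copy-from=default remote-address=8.8.8.8

# Point the PPPoE client at the new profile
/interface pppoe-client
set pppoe-out1 profile=pppoe-isp1-check

# Drop the static interface route to the checking host; the connected
# route created by the PPP link replaces it
/ip route
remove [find dst-address=8.8.8.8/32]
```

The recursive default routes (gateway=8.8.8.8 with check-gateway=ping) stay as they were; only the host route via the interface goes away.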

rkrisi · Wed Sep 02, 2020 1:57 pm

Can I use multiple CheckingHosts here? If yes, how?
Sure, just add a route to a new checking host and add default route via that host. One of those default routes will be active.

Also is it possible to send Email when the failover link becomes active?
You need some external script to check, for example, if your failover route is active and then do what you want.
However LTE does not have DHCP client, default route is added in LTE APN. It seems that it is a point-to-point interface because the address/network is
1 D 100.115.98.168/32  100.115.98.168  lte1 
What I don't know if this address changes or not.. If not, then in this scenario gateway=interface can be used right?
You just create a route with gateway=LTE and see if you still have access to the Internet :)

Well, I'm pretty sure now that this would work, because the default route added by ROS also just uses gateway=lte1, so this is a point-to-point interface.
However, I have added everything as described, with the LTE interface instead of an IP, and the connection won't come up with this setup (these devices are in a remote location, so I don't really know what happens after these routes are applied).

DarkNate · Wed Sep 02, 2020 2:22 pm

But the "gateway" "host" is shown as unreachable. Any ideas why?
Yeah, all your routes are not working :( It's because of RouterOS limitation: recursive routes cannot be resolved via interface routes (i.e. gateway=pppoe-out1 is bad route).

As a workaround, you make a copy of your PPP Profile that is used for that PPPoE connection, set "Remote address" to your Host (e.g. remote-address=8.8.8.8) and then simply remove a route to 8.8.8.8. Everything should work after that.

Hey, it seems to have worked, thanks.

I suggest you actually add this work-around in the OP so that everyone else can get it right from first try.

Chupaka · Wed Sep 02, 2020 2:28 pm

However I have added everything as described, with the lte interface instead of IP and the connection won't come up with this setup (These devices are in a remote location so I don't really know what happens after these routes are applied).

Yeah, please see my post above... I don't know what's the best way to deal with LTE in that case. If there are no event-driven scripts (like a script in DHCP Client), then a Scheduler is probably your friend: check every minute or so whether the LTE gateway IP has changed and update your route accordingly.

DarkNate · Wed Sep 02, 2020 9:12 pm

But the "gateway" "host" is shown as unreachable. Any ideas why?
Yeah, all your routes are not working :( It's because of RouterOS limitation: recursive routes cannot be resolved via interface routes (i.e. gateway=pppoe-out1 is bad route).

As a workaround, you make a copy of your PPP Profile that is used for that PPPoE connection, set "Remote address" to your Host (e.g. remote-address=8.8.8.8) and then simply remove a route to 8.8.8.8. Everything should work after that.

How would we replicate this same work-around when a PPPoE connection has both native IPv4 and native IPv6? I was able to do it for IPv4, but not for IPv6 as the "remote-address" only accepts a single IP.

Chupaka · Thu Sep 03, 2020 12:25 am

Unfortunately, I don't have PPPoE with IPv6, so can't even test... You may try to add your checking IP directly to the interface and see if it helps.

DarkNate · Thu Sep 03, 2020 12:53 am

Unfortunately, I don't have PPPoE with IPv6, so can't even test... You may try to add your checking IP directly to the interface and see if it helps.

That can't be done, as mentioned before: "Remote Address" in a PPP profile only accepts a single address. The same applies to IPv4: it limits recursive routing to a single "Checking Host", since we can't use more than one IP in each PPP profile/interface's "Remote Address".

Any other possible workarounds?

Chupaka · Thu Sep 03, 2020 1:12 am

I mean, not via PPP Profile but directly, with /ipv6 address add

DarkNate · Thu Sep 03, 2020 5:08 pm

I mean, not via PPP Profile but directly, with /ipv6 address add

Yeah, I tried it via address. Does not work.

Regarding IPv4 PPP profile, how could we have more than one "checking host" when "remote address" per PPP profile is limited to one?

Chupaka · Thu Sep 03, 2020 5:54 pm

I see that you may add those routes manually via:

/ip address
add interface=PPP address=127.1.2.3 network=8.8.4.4

This (8.8.4.4) does work as gateway for recursive routes, according to my quick testing.

DarkNate · Fri Sep 04, 2020 12:16 am

I see that you may add those routes manually via:
/ip address
add interface=PPP address=127.1.2.3 network=8.8.4.4
This (8.8.4.4) does work as gateway for recursive routes, according to my quick testing.

But I get dynamic IPs from the PPPoE which my ISP changes randomly throughout the day. So that wouldn't work.

Chupaka · Fri Sep 04, 2020 2:09 am

Well, by 127.1.2.3 I meant exactly 127.1.2.3, i.e. any private/unused address.

DarkNate · Fri Sep 04, 2020 7:12 pm

Well, by 127.1.2.3 I meant exactly 127.1.2.3, i.e. any private/unused address.

Wait, I'm not following here; please help me understand this.

/ip address
add interface=PPP address=127.1.2.3 network=8.8.4.4

Network of course refers to the "test host".
Address given means any private/unused address? I don't understand its function/purpose, I don't see how it would help in recursive routing failover with pppoe interfaces.

I tried this, but it results in "reachable" only via a single PPPoE interface (2 or 1), even if it's destined for the other one:

###Workaround for interfaces###
/ip address
add address=127.0.0.1 comment="Host for Recursive Routing on ISP 1" interface=pppoe-out1 network=8.8.8.8
add address=127.0.0.1 comment="Host for Recursive Routing on ISP 1" interface=pppoe-out1 network=1.1.1.1

add address=127.0.0.1 comment="Host for Recursive Routing on ISP 2" interface=pppoe-out2 network=1.1.1.1
add address=127.0.0.1 comment="Host for Recursive Routing on ISP 2" interface=pppoe-out2 network=8.8.8.8

/ip route
add distance=1 gateway=8.8.8.8 routing-mark=to_ISP1 check-gateway=ping comment="Recursive Route for first test host to ISP 1"
add distance=2 gateway=1.1.1.1 routing-mark=to_ISP1 check-gateway=ping comment="Recursive Route for second test host to ISP 1"

/ip route
add distance=1 gateway=1.1.1.1 routing-mark=to_ISP2 check-gateway=ping comment="Recursive Route for first test host to ISP 2"
add distance=2 gateway=8.8.8.8 routing-mark=to_ISP2 check-gateway=ping comment="Recursive Route for second test host to ISP 2"

Chupaka · Sat Sep 05, 2020 1:38 pm

You cannot check different uplinks via the same test host. So you need different hosts per uplink (like 8.8.8.8 and 1.1.1.1 for ISP1 and 8.8.4.4 and 1.0.0.1 for ISP2)

DarkNate · Sat Sep 05, 2020 2:10 pm

You cannot check different uplinks via the same test host. So you need different hosts per uplink (like 8.8.8.8 and 1.1.1.1 for ISP1 and 8.8.4.4 and 1.0.0.1 for ISP2)

So I tried this:

###Workaround for interfaces###
/ip address
add address=127.0.0.1 comment="Host for Recursive Routing on ISP 1" interface=pppoe-out1 network=8.8.8.8
add address=127.0.0.1 comment="Host for Recursive Routing on ISP 1" interface=pppoe-out1 network=1.0.0.1
add address=127.0.0.1 comment="Host for Recursive Routing on ISP 2" interface=pppoe-out2 network=1.1.1.1
add address=127.0.0.1 comment="Host for Recursive Routing on ISP 2" interface=pppoe-out2 network=8.8.4.4

/ip route
add distance=1 gateway=8.8.8.8 routing-mark=to_ISP1 check-gateway=ping comment="Recursive Route for first test host to ISP 1"
add distance=2 gateway=1.0.0.1 routing-mark=to_ISP1 check-gateway=ping comment="Recursive Route for second test host to ISP 1"
add distance=1 gateway=1.1.1.1 routing-mark=to_ISP2 check-gateway=ping comment="Recursive Route for first test host to ISP 2"
add distance=2 gateway=8.8.4.4 routing-mark=to_ISP2 check-gateway=ping comment="Recursive Route for second test host to ISP 2"

So they are reachable

But the routes IP>Routes still shows "unreachable", if I toggle them on/off, they show reachable, 5 seconds later unreachable. Looks like a ROS bug. Any workarounds? Blackhole route workaround does not work for me.

At the moment, I'm using single test host via PPP profile of each PPPoE interface, now this works flawlessly, no problems except only single test host.

Chupaka · Mon Sep 07, 2020 4:27 pm

Yeah, something weird... Could you write to support@mikrotik.com with your problem?

DarkNate · Mon Sep 07, 2020 6:23 pm

Yeah, something weird... Could you write to support@mikrotik.com with your problem?

Ahahaha... MikroTik and "support" do not go together very well. At the moment I have an open ticket with them about a critical bug that effectively kills my internet access; it's similar to this: viewtopic.php?f=13&t=157989

MikroTik being MikroTik, they haven't really been quick to respond: almost 30 days for a single response. I will create a fresh ticket with them about this recursive routing problem later if possible.

Thanks for your contribution. Even if only single host is working for multiple ISPs, it's still more efficient than manual scripting/disabling/enabling etc, using Cloudflare and Google is good enough I guess. Do update your main thread about all these new techniques that you've discovered and tested.
And let us know if you have some new methods for IPv6, I have tried with the same IPv6>Address method, it does not work.

nevolex · Mon Sep 14, 2020 10:09 am

Hi Chupaka and other guys,

can you please help me figure out how I can make LTE failover work in my case?

I am trying to set up failover but I'm stuck, as the LTE modem is on a different router.

The main router has 1 WAN and 2 DHCP servers (VLAN 10 and VLAN 20):

[admin@MikroTik_RB4011] /ip route> pri
Flags: X - disabled, A - active, D - dynamic, C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme,
B - blackhole, U - unreachable, P - prohibit
 #      DST-ADDRESS        PREF-SRC        GATEWAY             DISTANCE
 0 ADS  0.0.0.0/0                          121.99.228.1        1
 1 ADC  10.10.0.0/24       10.10.0.1       bridge_vlan10_main  0
 2 ADC  10.20.0.0/24       10.20.0.1       bridge_vlan20_g...  0
 3 ADC  121.99.xxx.xxx/19  121.99.xxx.xxx  Orcon_ISP           0
[admin@MikroTik_RB4011] /ip route>

the second router has a lte wan connection

[admin@MikroTik_hap_ac2] /ip route> print
Flags: X - disabled, A - active, D - dynamic, C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme,
B - blackhole, U - unreachable, P - prohibit
 #      DST-ADDRESS        PREF-SRC        GATEWAY             DISTANCE
 0 ADS  0.0.0.0/0                          10.10.0.1           1
 1  DS  0.0.0.0/0                          10.20.0.1           1
 2  DS  0.0.0.0/0                          lte1                2
 3 ADC  10.10.0.0/24       10.10.0.3       bridge_vlan10_main  0
 4 ADC  10.20.0.0/24       10.20.0.3       bridge_vlan20_g...  0
 5 ADC  100.80.xxx.xxx/32  100.80.xxx.xxx  lte1                0

what would be the right routing here as at this moment it is all dynamic

thanks a lot

Chupaka · Mon Sep 14, 2020 1:54 pm

Did you try to use LTE Passthrough (via VLAN, for example) to set up the IP address directly on the main router?

nevolex · Mon Sep 14, 2020 2:57 pm

Did you try to use LTE Passthrough (via VLAN, for example) to set up the IP address directly on the main router?

Thank you. Does it mean I need to set up a DHCP server on the second router, with the main router as the client?

Chupaka · Thu Sep 17, 2020 1:30 am

Does it mean I need to set up a DHCP server on the second router, with the main router as the client?

LTE Passthrough on the second router, main router is dhcp client.
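
A rough sketch of that arrangement on RouterOS v6, under several assumptions: the VLAN name vlan-lte, VLAN ID 30, the parent interfaces ether1 and ether5, the APN profile name and the APN string internet are all placeholders, not values from this thread. Check the LTE passthrough documentation for your modem before relying on it:

```
# --- Second router (the one with the LTE modem) ---
# Dedicated VLAN toward the main router (name, ID and parent are assumptions)
/interface vlan
add name=vlan-lte vlan-id=30 interface=ether1

# APN profile that passes the LTE address through to that VLAN
# (apn=internet is a placeholder for the carrier's APN)
/interface lte apn
add name=passthrough-apn apn=internet passthrough-interface=vlan-lte passthrough-mac=auto
/interface lte
set lte1 apn-profiles=passthrough-apn

# --- Main router ---
/interface vlan
add name=vlan-lte vlan-id=30 interface=ether5

# Receives the LTE address via DHCP; distance 2 keeps it behind the
# existing distance=1 default route
/ip dhcp-client
add interface=vlan-lte add-default-route=yes default-route-distance=2 disabled=no
```

With the LTE address living on the main router, both uplinks are local to it, and the failover routes from this thread can be built there.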

rkrisi · Fri Sep 18, 2020 3:17 pm

So why did you use routing marks in the original post in the first place?
Because that config was for traffic balancing. Failover scenario can be greatly simplified, as you can see :)

However this way, I need to update at least GW1_IP in case it changes... What is the best way to do this? DHCP client script?
Exactly. Add a comment to your route (e.g. "GW1_IP") and then something like this in DHCP Client Script should be quite enough:
:if ($bound=1) do={
  :local iface $interface
  :local gw [ /ip dhcp-client get [ find interface=$"iface" ] gateway ]
  /ip route set [ find comment="GW1_IP" gateway!=$gw ] gateway=$gw
}

Is this a right approach to use multiple checkingHosts? So if I'm right, this will check 8.8.8.8 and 1.1.1.1 and if it is not reachable, the WAN1 will be unreachable and WAN2 will be the active route right?

 0 A S  0.0.0.0/0                          8.8.8.8                   1
 1   S  0.0.0.0/0                          1.1.1.1                   1
 2   S  ;;; WAN2
        0.0.0.0/0                          192.168.2.1              10
 3 A S  1.1.1.1/32                         188.142.192.254           1
 4 A S  ;;; WAN1
        8.8.8.8/32                         188.142.192.254           1
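
For reference, a table like the one above could be produced by routes along these lines. This is a sketch: the printout does not show scope or check-gateway, so scope=10 and check-gateway=ping are assumptions taken from the rest of the thread.

```
/ip route
# Host routes that pin both checking hosts to the WAN1 gateway
add dst-address=8.8.8.8/32 gateway=188.142.192.254 scope=10 comment="WAN1"
add dst-address=1.1.1.1/32 gateway=188.142.192.254 scope=10

# Two recursive default routes via WAN1, one per checking host;
# WAN1 stays in use as long as at least one host answers ping
add distance=1 gateway=8.8.8.8 check-gateway=ping
add distance=1 gateway=1.1.1.1 check-gateway=ping

# Plain backup default route via WAN2
add distance=10 gateway=192.168.2.1 comment="WAN2"
```

Only when both 8.8.8.8 and 1.1.1.1 stop answering do both distance=1 routes go inactive, leaving the distance=10 route via WAN2.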

Chupaka · Fri Sep 18, 2020 4:29 pm

Correct

rkrisi · Fri Sep 18, 2020 5:56 pm

Correct

Thanks for your help!

A9691 · Thu Oct 29, 2020 2:02 pm

I've been trying for some time to figure out how to access both checking hosts when one ISP is down.


/ip route

add check-gateway=ping comment="default route - recursive via working gateway" distance=2  gateway=8.8.4.4
add check-gateway=ping comment="default route - recursive via working gateway" distance=3  gateway=8.8.8.8

add comment="next hop search on ether 1" distance=1 dst-address=8.8.4.4/32 gateway=192.168.33.1 scope=10
add comment="next hop search on ether 2" distance=1 dst-address=8.8.8.8/32 gateway=192.168.4.1 scope=10

The failover part works fine, but if one path fails, the corresponding checking host will be unavailable.
If the 192.168.33.1 router stops forwarding then 8.8.4.4 becomes unavailable from inside.

Is there a way to tell the router to use the third route only for resolving the recursive route?

Chupaka · Thu Oct 29, 2020 5:23 pm

Good point. In balancing mode you don't face this because all traffic is marked, and there's no marked routes to the checking hosts...

Well, you need to add some kind of traffic marking then :) It will be a single routing table, but it's needed in that case...

A9691 · Fri Oct 30, 2020 9:59 am

Thank you for your answer.
I was hoping there was a neat and simple scope/target scope "magic" to solve the problem.

This one worked, and I think is still simple enough, one can see how recursive routes work:

/ip route
add check-gateway=ping comment="default route - recursive via working gateway" distance=2 gateway=8.8.4.4 routing-mark=exit
add check-gateway=ping comment="default route - recursive via working gateway" distance=3 gateway=8.8.8.8 routing-mark=exit
add distance=1 comment="ISP1 local" dst-address=192.168.4.0/24 gateway=192.168.4.1 routing-mark=exit
add distance=1 comment="ISP2 local" dst-address=192.168.33.0/24 gateway=192.168.33.1 routing-mark=exit

add comment="next hop search" distance=1 dst-address=8.8.4.4/32 gateway=192.168.33.1 scope=10
add comment="next hop search" distance=1 dst-address=8.8.8.8/32 gateway=192.168.4.1 scope=10

/ip route rule
add interface=lan table=exit

rvnet · Tue Nov 03, 2020 2:11 am

So, I've been following this thread, and I'm still a little lost. I've been working in telecom for over 25 years, but I'm having a little difficulty with this one.

I have an hAP AC ( RB962UiGS-5HacT2HnT) running 6.47.7.

I have two Internet sources. Both are dynamic IP, modems, one cable modem with ethernet out on ether1, and one LTE modem with ethernet out on ether2. Both receive DHCP from their providers, and pass through to the router.

I want to setup ether1 as primary, and ether2 as secondary with failover/failback.

Right now, the DHCP clients for each are setup to create default routes. Doing this, if I unplug ether1 I get failover to ether2, and back when ether1 is plugged back in. I want to go to the next step and failover/back when the connected modem is still up, but the provider is out. What do you recommend? I'm planning to start over at this point.

WeWiNet · Tue Nov 03, 2020 7:01 pm

Hi Chupaka,

can you give the "route" piece of your example for an ROS7 implementation?

As you probably know the "routing-mark" attribute is gone in ROS7.
Unfortunately the doc / wiki on R7 is really thin and I have no idea how to make rule/table in R7 to do "routing mark"
based routing...

Mangle and all that is clear and working fine... but the route is what kills me...

Chupaka · Tue Nov 03, 2020 10:49 pm

Right now, the DHCP clients for each are setup to create default routes. Doing this, if I unplug ether1 I get failover to ether2, and back when ether1 is plugged back in. I want to go to the next step and failover/back when the connected modem is still up, but the provider is out. What do you recommend? I'm planning to start over at this point.

As you use direct default route from DHCP, and for this type of checking you need to use recursive routes, you need to disable "add-default-route" in your DHCP Client for ether1 and either add static route (if you know that gateway will stay the same) or use DHCP Client script to create/update your routes dynamically.
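
A sketch of how this might look for the ether1/ether2 case on RouterOS v6. The checking host 8.8.8.8, the comment GW1_IP and the placeholder gateway 192.0.2.1 are assumptions; the script body is the one posted earlier in this thread.

```
# Primary: no automatic default route. Backup: keep the DHCP default route,
# but at a higher distance
/ip dhcp-client
set [ find interface=ether1 ] add-default-route=no
set [ find interface=ether2 ] add-default-route=yes default-route-distance=2

/ip route
# Host route to the checking host; 192.0.2.1 is only a placeholder that the
# DHCP script below overwrites with the real gateway
add dst-address=8.8.8.8/32 gateway=192.0.2.1 scope=10 comment="GW1_IP"
# Recursive default route, active only while 8.8.8.8 answers ping
add distance=1 gateway=8.8.8.8 check-gateway=ping
```

Then this goes into the Script field of the DHCP Client on ether1:

```
:if ($bound=1) do={
  :local iface $interface
  :local gw [ /ip dhcp-client get [ find interface=$"iface" ] gateway ]
  /ip route set [ find comment="GW1_IP" gateway!=$gw ] gateway=$gw
}
```

When the provider behind ether1 stops forwarding, the ping to 8.8.8.8 fails, the distance=1 route goes inactive and the distance=2 route learned on ether2 takes over.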

can you give the "route" piece of your example for an ROS7 implementation?

As you probably know the "routing-mark" attribute is gone in ROS7.
Unfortunately the doc / wiki on R7 is really thin and I have no idea how to make rule/table in R7 to do "routing mark"
based routing...

Well, I'm waiting for more or less usable ROS7 to start playing with this, but for now I see from the docs that "as per user requests v7.0beta9 adds back 'routing-table' parameter" :)
https://help.mikrotik.com/docs/display/ ... icyRouting

Chaosphere64 · Tue Nov 10, 2020 10:20 am

Thanks for providing this solution!

But I think I did not quite understand it completely. I am currently trying to implement the Basic Setup.

I have two ISPs (156.x.x.x is GW1, 82.x.x.x is GW2), and I just want to use ISP2/GW2 as a failover, not for load balancing.

My question is: do I need a second default route with distance=2 for each HostN?

My (reduced) routing config looks like this:

/ip route
add check-gateway=ping distance=1 gateway=8.8.8.8
add check-gateway=ping distance=2 gateway=8.8.4.4
add distance=1 dst-address=8.8.4.4/32 gateway=82.x.x.x scope=10
add distance=1 dst-address=8.8.8.8/32 gateway=156.x.x.x scope=10

My (reduced) routing table looks like this:

/ip route> print 
Flags: X - disabled, A - active, D - dynamic, C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, 
B - blackhole, U - unreachable, P - prohibit 
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 2 A S  0.0.0.0/0                          8.8.8.8                   1
 3   S  0.0.0.0/0                          8.8.4.4                   2
 4 A S  8.8.4.4/32                         82.x.x.x              1
 5 A S  8.8.8.8/32                         156.x.x.x             1

In my understanding, as soon as 8.8.8.8 stops responding to ping, the primary default route is no longer active and the secondary default route with distance=2 kicks in.

I can't help but think that I did not get it completely, so it would be great to get a little enlightenment.

Thanks!

Chupaka · Tue Nov 10, 2020 2:07 pm

I have two ISPs (156.x.x.x is GW1, 82.x.x.x is GW2), and I just want to use ISP2/GW2 as a failover, not for load balancing.

My question is: do I need a second default route with distance=2 for each HostN?
/ip route
add check-gateway=ping distance=1 gateway=8.8.8.8
add check-gateway=ping distance=2 gateway=8.8.4.4
add distance=1 dst-address=8.8.4.4/32 gateway=82.x.x.x scope=10
add distance=1 dst-address=8.8.8.8/32 gateway=156.x.x.x scope=10

Well, technically, for failover you need to check only GW1: if GW2 is unavailable, you have nothing to do with that in case of GW1 failure :)

So, something like this can do the trick:

/ip route
add check-gateway=ping distance=1 gateway=8.8.8.8
add check-gateway=ping distance=2 gateway=82.x.x.x
add dst-address=8.8.8.8/32 gateway=156.x.x.x scope=10

Also, please check the discussion above: in case of GW1 failure you won't be able to access 8.8.8.8. The workaround is finalized here: viewtopic.php?p=825704#p825704

Chaosphere64 · Tue Nov 10, 2020 4:37 pm

Ok, thanks for taking the time!

anav · Mon Dec 14, 2020 8:08 pm

Trying to follow all these inter threads, in this topic is quite a challenge!!
I for one would like to express my gratitude to Chupaka, (or as I normally write it Chewbacca) for the initial content and idea of the thread and the patience throughout to answer all queries.
Well done sir, and I hope there is more than a lump of coal in your Christmas stocking this year. ;-)
I will certainly be raising a glass of some fine ale in your honour over the holiday season!!

Chupaka · Mon Dec 14, 2020 10:57 pm

Thanks =)

I'm a bit out of networking/ISP for a couple of years already, but still doing my best to support the community :D

SiB · Tue Dec 15, 2020 12:44 pm

This failover article was published on the MikroTik Wiki; it has now been moved/rewritten here: https://help.mikrotik.com/docs/display/ ... upOverview

Chupaka · Thu Dec 17, 2020 3:12 pm

Great, now I know they reworked my article without even mentioning me... That's a bit depressing :)

CZFan · Thu Dec 17, 2020 3:33 pm

Great, now I know they reworked my article without even mentioning me... That's a bit depressing :)

Plagiarism much...

DarkNate · Thu Dec 17, 2020 5:01 pm

This FailOver was public at MikroTik WIKI, now it's moved/re-write here: https://help.mikrotik.com/docs/display/ ... upOverview

@SiB

Great, now I know they reworked my article without even mentioning me... That's a bit depressing :)

eldoncito2019 · Wed Dec 23, 2020 7:08 pm

Great, now I know they reworked my article without even mentioning me... That's a bit depressing :)

Greetings, friends. I have load balancing across 2 ISPs configured with policy-based routing plus failover, and it works very well. If I manually disconnect the interface of either ISP, the failover moves that ISP's IP addresses to the other ISP and the entire LAN keeps working; when I manually reconnect the interface, the failover returns the IP addresses to their own ISP and everything is back to normal.
But a few days ago ISP 2 had problems delivering a dynamic IP address. The problem was at the ISP's central office: the modem was working but had no Internet, and the failover did not kick in. ISP 2 is connected from the modem to a router, and from that router to my RB, so I have a fixed (static) IP on that side. The failover for that ISP already pings the router's gateway address, 192.168.0.1, and since the router was on but had no Internet, the ping kept succeeding and the failover did nothing.
Is there any way to ping 8.8.8.8 via ISP 2, so that the failover is triggered when that ping fails and what happened days ago does not happen again? I hope that makes sense.
Thanks for your help.

/ip address
add address=192.168.9.1/24 interface="ETHER 4" network=192.168.9.0
add address=192.168.6.3/24 interface="ETHER 1" network=192.168.6.0
add address=192.168.1.2/24 interface="ETHER 2" network=192.168.1.0

/interface list
add name=WAN
/interface list member
add interface="ETHER 1" list=WAN
add interface="ETHER 2" list=WAN

/ip firewall nat
add action=masquerade chain=srcnat out-interface-list=WAN

/ip firewall address-list
add address=192.168.9.46 list=ISP 2
add address=192.168.9.56 list=ISP 2
add address=192.168.9.60 list=ISP 2
add address=192.168.9.78 list=ISP 1
add address=192.168.9.85 list=ISP 2
add address=192.168.9.21 list=ISP 1
add address=192.168.9.22 list=ISP 1

/ip firewall mangle
add action=mark-routing chain=prerouting comment="TO ISP 1" \
new-routing-mark=ISP 1 passthrough=no src-address-list=ISP 1
add action=mark-routing chain=prerouting comment="TO ISP 2" \
new-routing-mark=ISP 2 passthrough=no src-address-list=ISP 2

/ip route
add check-gateway=ping distance=1 gateway=192.168.6.1 routing-mark=ISP 1
add check-gateway=ping distance=2 gateway=192.168.1.1 routing-mark=ISP 1
add check-gateway=ping distance=1 gateway=192.168.1.1 routing-mark=ISP 2
add check-gateway=ping distance=2 gateway=192.168.6.1 routing-mark=ISP 2

This is my load balancing configuration, is there a way to ping DNS 8.8.8.8 from ETHER 2 so that when ISP2 doesn't work the failover works?

According to this thread this could be the solution to my problem:
/ip route
add dst-address=8.8.8.8 scope=10 gateway=192.168.6.1
add dst-address=8.8.4.4 scope=10 gateway=192.168.0.1

add distance=1 gateway=8.8.8.8 routing-mark=ISP 1 check-gateway=ping
add distance=2 gateway=8.8.4.4 routing-mark=ISP 1 check-gateway=ping

add distance=1 gateway=8.8.4.4 routing-mark=ISP 2 check-gateway=ping
add distance=2 gateway=8.8.8.8 routing-mark=ISP 2 check-gateway=ping

msatter · Wed Dec 23, 2020 8:35 pm

Try using "ISP 1" and "ISP 2"

Nice to see you back....so soon.

eldoncito2019 · Wed Dec 23, 2020 9:07 pm

Try using "ISP 1" and "ISP 2"

Nice to see you back....so soon.

Greetings msatter, I tried to do it with the scripts you suggested, disabling the interface, but it did not work for me. I think that with recursive routes I can get the failover working properly; I will tell you later how it went.

eldoncito2019 · Wed Dec 23, 2020 9:12 pm

Thank you for your answer.
I was hoping there was a neat and simple scope/target scope "magic" to solve the problem.

This one worked, and I think is still simple enough, one can see how recursive routes work:

/ip route
add check-gateway=ping comment="default route - recursive via working gateway" distance=2 gateway=8.8.4.4 routing-mark=exit
add check-gateway=ping comment="default route - recursive via working gateway" distance=3 gateway=8.8.8.8 routing-mark=exit
add distance=1 comment="ISP1 local" dst-address=192.168.4.0/24 gateway=192.168.4.1 routing-mark=exit
add distance=1 comment="ISP2 local" dst-address=192.168.33.0/24 gateway=192.168.33.1 routing-mark=exit

add comment="next hop search" distance=1 dst-address=8.8.4.4/32 gateway=192.168.33.1 scope=10
add comment="next hop search" distance=1 dst-address=8.8.8.8/32 gateway=192.168.4.1 scope=10

/ip route rule
add interface=lan table=exit

Taking advantage of the fact that it's Christmas: how do I paste a configuration so that it keeps this formatting? I copied my own configuration and it didn't come out like that.
Thank you.

msatter · Wed Dec 23, 2020 10:15 pm

Try using "ISP 1" and "ISP 2"

Nice to see you back....so soon.
Greetings matter, I tried to do it with the scripts that you told me by disabling the interface, but it did not work for me, I think that with the use of recursive ways I can achieve that the failover is done and everything works well, later I will tell you how it went.

Do NOT disable the interface, because how would you then detect that it is working again?

SiB · Thu Dec 24, 2020 3:50 am

eldoncito2019

But days ago the ISP 2 had problems delivering a dynamic IP address and the problem was in the central, the modem was working but it had no internet and
The failover did not work, I have ISP 2 connected from the modem to a router and from the router to my RB and thus I have a fixed IP (static),
Is there any way to ping 8.8.8.8 on ISP 2 and when that ping fails, the failover can be activated and what happened days ago will not happen again?

Yes, you can find the answer in post #1 of this thread. A recursive route can "lock" the path to a target via the proper ISP. It's exactly what you want.

The ISP's failover is already pinging the router's gateway, which is 192.168.0.1, and since the router was on but had no internet,
That's where the failover didn't work because the ping was working fine. I don't know if you understand me.

We understand; you have not read post #1 of this thread.
Next time, please use the code button and paste your code between the tags.

/ip route
add check-gateway=ping distance=1 gateway=192.168.6.1 routing-mark=ISP 1
add check-gateway=ping distance=2 gateway=192.168.1.1 routing-mark=ISP 1
add check-gateway=ping distance=1 gateway=192.168.1.1 routing-mark=ISP 2
add check-gateway=ping distance=2 gateway=192.168.6.1 routing-mark=ISP 2
This is my load balancing configuration, is there a way to ping DNS 8.8.8.8 from ETHER 2 so that when ISP2 doesn't work the failover works?

Yes, this can be done. Whenever a value contains a space, you must enclose it in double quotes, e.g. variable="some text".
Also, check-gateway is applied once per gateway and further occurrences are ignored, which means you only need it on one route for the same gateway.

According to this thread this could be the solution to my problem:
/ip route
add dst-address=8.8.8.8 scope=10 gateway=192.168.6.1
add dst-address=8.8.4.4 scope=10 gateway=192.168.0.1

Yes, but... earlier you gave the WAN2 default gateway as 192.168.1.1, not 192.168.0.1!

add distance=1 gateway=8.8.8.8 routing-mark=ISP 1 check-gateway=ping
add distance=2 gateway=8.8.4.4 routing-mark=ISP 1 check-gateway=ping
add distance=1 gateway=8.8.4.4 routing-mark=ISP 2 check-gateway=ping
add distance=2 gateway=8.8.8.8 routing-mark=ISP 2 check-gateway=ping

If you replace every =ISP 1 with ="ISP 1" (and likewise for ISP 2), then this should start working.
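
Put together, the corrected routes might look like the sketch below. It assumes the WAN2 gateway is 192.168.1.1, as in the original /ip route section of that configuration; if the second uplink's router really is 192.168.0.1, use that address instead.

```
/ip route
# Pin each checking host to one uplink
add dst-address=8.8.8.8/32 scope=10 gateway=192.168.6.1
add dst-address=8.8.4.4/32 scope=10 gateway=192.168.1.1

# Names containing a space must be quoted
add distance=1 gateway=8.8.8.8 routing-mark="ISP 1" check-gateway=ping
add distance=2 gateway=8.8.4.4 routing-mark="ISP 1" check-gateway=ping
add distance=1 gateway=8.8.4.4 routing-mark="ISP 2" check-gateway=ping
add distance=2 gateway=8.8.8.8 routing-mark="ISP 2" check-gateway=ping
```

The same quoting applies wherever those names appear, for example list="ISP 1" in the address list and new-routing-mark="ISP 1" in the mangle rules.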

eldoncito2019 · Thu Dec 24, 2020 3:33 pm

/ip route
add check-gateway=ping distance=1 gateway=8.8.8.8 routing-mark=ISP 1
add check-gateway=ping distance=2 gateway=8.8.4.4 routing-mark=ISP 1
add check-gateway=ping distance=1 gateway=8.8.4.4 routing-mark=ISP 2
add check-gateway=ping distance=2 gateway=8.8.8.8 routing-mark=ISP 2
add distance=1 dst-address=8.8.4.4/32 gateway=192.168.0.1 scope=10
add distance=1 dst-address=8.8.8.8/32 gateway=192.168.6.1 scope=10

It works perfectly like this.

Thanks to msatter and Chupaka.

ELDONCITO2019.

Guscht · Sun Dec 27, 2020 1:45 pm

Implemented for our 3 WAN-Connections.
Works great! Thanks Chupaka!

But I have to admit, it's very weak of MikroTik not to implement such a basic function more directly:
a "Gateway Check" and a "WAN-Connectivity Check", where you can specify N IPs behind the gateway.

No, they implement a "Detect Internet" feature, which doesn't really work (mine never comes up on 2 WAN connections over "LAN") and is completely designed around the routing tables. Sometimes MikroTik is so overly complicated, and I don't understand their way of designing features. Why not a clear approach and a simple feature that doesn't require you to script around it...

tdussa · Fri Jan 08, 2021 5:41 pm

Hi everyone,

I've been looking at how to implement a simple failover without any load balancing, so I came across this thread.

I understand the failover part, but I cannot get my head wrapped around this workaround:

/ip route
add check-gateway=ping comment="default route - recursive via working gateway" distance=2 gateway=8.8.4.4 routing-mark=exit
add check-gateway=ping comment="default route - recursive via working gateway" distance=3 gateway=8.8.8.8 routing-mark=exit
add distance=1 comment="ISP1 local" dst-address=192.168.4.0/24 gateway=192.168.4.1 routing-mark=exit
add distance=1 comment="ISP2 local" dst-address=192.168.33.0/24 gateway=192.168.33.1 routing-mark=exit

add comment="next hop search" distance=1 dst-address=8.8.4.4/32 gateway=192.168.33.1 scope=10
add comment="next hop search" distance=1 dst-address=8.8.8.8/32 gateway=192.168.4.1 scope=10

/ip route rule
add interface=lan table=exit

As I understand it, the idea is to make 8.8.4.4 available to hosts on the LAN even if the 192.168.33.1 gateway fails, right?
So the IP route rule makes all traffic that comes in from a LAN interface go through the "exit" routing table, which contains the first four routing entries, correct?
And traffic that originates on the router itself will just see the last two routing entries then?

If the above is correct, then I don't really understand how the additional two rules plus the routing marks make a difference. Could someone walk me through this, please?

Thanks a lot & Cheers,
Toby.

tdussa · Fri Jan 08, 2021 7:19 pm

Hi again,

aight, of course, half an hour after this post, I think I got it. :)

Here's my working theory:

/ip route
add check-gateway=ping comment="default route - recursive via working gateway" distance=2 gateway=8.8.4.4 routing-mark=exit
add check-gateway=ping comment="default route - recursive via working gateway" distance=3 gateway=8.8.8.8 routing-mark=exit

These two routes in the table `exit` mostly take care of the failover. Nothing special to see.

add distance=1 comment="ISP1 local" dst-address=192.168.4.0/24 gateway=192.168.4.1 routing-mark=exit
add distance=1 comment="ISP2 local" dst-address=192.168.33.0/24 gateway=192.168.33.1 routing-mark=exit

These two routes are unnecessary, I think---at least, in my case, everything worked totally as expected without them.
They were what threw me off in the first place---I couldn't figure out what they would contribute. As it turns out, nothing? :)

add comment="next hop search" distance=1 dst-address=8.8.4.4/32 gateway=192.168.33.1 scope=10
add comment="next hop search" distance=1 dst-address=8.8.8.8/32 gateway=192.168.4.1 scope=10

Again, these routes manage the failover. They're not in the `exit` routing table but in the main table
because otherwise the ping checks the router sends out would be unroutable.

/ip route rule
add interface=lan table=exit

And finally, this rule makes all traffic inbound from the interface `lan` to use the `exit` routing table for lookups first.

So, to put it all together, here's my understanding. The thing I didn't realize, and that I believe is crucial to understanding this scheme, is that the `exit` routing table is used first, and the main table is only consulted because the next hop that results from the `exit` table lookup is not directly reachable. Since there is no direct route for either 8.8.4.4 or 8.8.8.8 in the `exit` table, this means that if one of the routes is inactive, then the _other_ route gets pulled (instead of the direct route as found in the main table).

Thus, if, say, 8.8.4.4 is not reachable, then the route via 8.8.4.4 is marked inactive, and if a client pings 8.8.4.4, then the only route left in the `exit` table is the default route to the _other_ gateway, 8.8.8.8. Thus, the next hop is 8.8.8.8, but that is not directly reachable, so the router drops out of the `exit` table into the `main` table and finds a direct route for 8.8.8.8. *Phew* Makes sense now.

As expected, traffic originating from the router itself is not marked because the `/ip route rule` entry does not match, and is still routed directly to the interfaces in case of 8.8.4.4 and 8.8.8.8, irrespective of whether those IPs are reachable.

Two more things turned out to be irritating:
1. The command `/ip route rule add interface=lan table=exit` does not work (at least not in ROS 6.48 with default config on my devices), because there is no interface called `lan`. I originally thought that the interface *list* was meant, but you have to give an *interface*, not a list. So `interface=bridge`, for instance, works well instead.
2. Given the above routes, the router itself has *no* connectivity at all except for the direct routes to its gateways and to 8.8.4.4 and 8.8.8.8. If the router itself should be able to, say, contact an NTP server, then additional routes are necessary.

In particular, I had assumed that the mitigation does not kill the router connectivity, but it actually does, doesn't it?

Is my working theory accurate? If not, where did I take a wrong turn?

THX & Cheers,
Toby.

Chupaka · Sat Jan 09, 2021 4:07 pm

Yep, generally that's how it works. If you need the router connectivity - you should either create a rule for its traffic to go to 'exit' table, or add default route(s) to 'main' table.
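
A minimal sketch of the second option, assuming the canary addresses and distances of the `exit` table routes discussed above (8.8.4.4 and 8.8.8.8) and RouterOS v6 syntax. The comments are illustrative only, and this is untested:

```routeros
/ip route
# Main-table defaults for the router's own traffic (no routing-mark).
# They resolve recursively through the existing scope=10 routes to the canaries.
add check-gateway=ping distance=2 gateway=8.8.4.4 comment="router default via canary 1"
add check-gateway=ping distance=3 gateway=8.8.8.8 comment="router default via canary 2"
```

Traffic from the LAN is unaffected by these routes as long as the `/ip route rule` entry still sends it to the `exit` table first.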

tdussa · Sat Jan 09, 2021 5:08 pm

Hi,

Yep, generally that's how it works. If you need the router connectivity - you should either create a rule for its traffic to go to 'exit' table, or add default route(s) to 'main' table.

Cool, THX. So then for the record this is my minimal working example that does failover with high-availability for the external "canary" host for traffic that comes from the `bridge` interface (the canary host being 1.1.1.1, the primary gateway being 192.168.1.254, the secondary gateway being 192.168.2.254):

/ip route
add check-gateway=ping comment="Primary virtual route (HA)" distance=1 gateway=1.1.1.1 routing-mark=HA
add comment="Secondary route (HA)" distance=2 gateway=192.168.2.254 routing-mark=HA

add check-gateway=ping comment="Primary virtual route (main)" distance=1 gateway=1.1.1.1
add comment="Primary route (main)" distance=1 dst-address=1.1.1.1/32 gateway=192.168.1.254 scope=10
add comment="Secondary route (main)" distance=2 gateway=192.168.2.254

/ip route rule
add interface=bridge table=HA

No load balancing, and the secondary route is not checked because it is a fallback route only.

Cheers,
Toby.

DarkNate · Sat Jan 09, 2021 5:20 pm

You want just simple recursive failover without load balancing? You only need two rules in the main routing table with "default route" disabled in the WAN client (PPPoE etc), like this:

add check-gateway=ping comment="Default Route for ISP1 (Recursive 1)" distance=1 gateway=8.8.8.8
add check-gateway=ping comment="Default Route for ISP2 (Recursive 1)" distance=2 gateway=1.1.1.1

#Then create a separate route for each of the "gateway" themselves, in my case using the PPP profile hack

tdussa · Sat Jan 09, 2021 7:13 pm

Hi,

You want just simple recursive failover without load balancing? You only need two rules in the main routing table with "default route" disabled in the WAN client (PPPoE etc), like this:
add check-gateway=ping comment="Default Route for ISP1 (Recursive 1)" distance=1 gateway=8.8.8.8
add check-gateway=ping comment="Default Route for ISP2 (Recursive 1)" distance=2 gateway=1.1.1.1

#Then create a separate route for each of the "gateway" themselves, in my case using the PPP profile hack

Well. This has the problem that the "canary" hosts (8.8.8.8 and 1.1.1.1 in the example) are not reachable for clients if the corresponding gateway goes down. That's the entire reason for the shenanigans with multiple routing tables. If you can live with that effect, then yes, obviously, things can be simpler.

Cheers,
Toby.

tdussa · Sat Jan 09, 2021 8:04 pm

Hi again,

also... A different question here...

#Then create a separate route for each of the "gateway" themselves, in my case using the PPP profile hack

I just realized I don't understand this hack. How do you "copy" a PPPoE profile?

Cheers,
Toby.

DarkNate · Sat Jan 09, 2021 9:04 pm

viewtopic.php?p=814682#p814682

PPP>Profile, create new.

tdussa · Sat Jan 09, 2021 9:18 pm

I just realized I don't understand this hack. How do you "copy" a PPPoE profile?
viewtopic.php?p=814682#p814682

PPP>Profile, create new.

Yes, I had read that post, but I still don't understand how to "copy" the "PPP profile" used for my PPPoE connection. I can create a new PPP profile alright, but then what?

Cheers,
Toby.

Chupaka · Sat Jan 09, 2021 11:10 pm

And then you set remote-address to the host you want to check. On connection establishment, a route to the remote-address will be automagically added to the 'main' routing table.

tdussa · Sat Jan 09, 2021 11:18 pm

Hi,

And then you set remote-address to the host you want to check. On connection establishment, a route to the remote-address will be automagically added to the 'main' routing table.

O_o I suspect I still don't understand. So you are saying that, for instance, if I use 9.9.9.9 as the canary host, then this:

/ppp profile
add local-address=127.1.1.1 name=Primary_A remote-address=9.9.9.9

should produce a route automagically? Because on my system it doesn't. How would the router be supposed to know what interface to use?

Cheers,
Toby.

Chupaka · Sat Jan 09, 2021 11:29 pm

It will use your PPPoE Interface. You should select your new profile in PPPoE Client properties.

P.S. Not sure if setting local-address won't break anything.

tdussa · Sat Jan 09, 2021 11:33 pm

Hi there,

It will use your PPPoE Interface. You should select your new profile in PPPoE Client properties.

Ah! That's the information I am looking for. So you are saying I should tweak the existing PPPoE config to use the new profile (so as to override the automagically-assigned remote address)?

P.S. Not sure if setting local-address won't break anything.

So you're saying just to leave the local-address setting empty?

Cheers,
Toby.

tdussa · Sat Jan 09, 2021 11:45 pm

Hi again,

Alright, so this seems to work:

/ppp profile
add name=Primary_A remote-address=9.9.9.9

/interface pppoe-client
add comment="Uplink" disabled=no interface=vlan-up-fiber keepalive-timeout=disabled name=pppoe-uplink password=*removed* profile=Primary_A user=*removed*

However, the thread above suggests that it should be possible to configure multiple canary hosts with this as well, but how? The posts above always seem to leave out the interesting bits of information somehow.

Cheers,
Toby.

tdussa · Sun Jan 10, 2021 1:10 am

Aaand hello again,

I have been able to piece the puzzle together.

So if I understand correctly (and at the very least, things seem to work on my router with this setup), this is what needs to be done to be able to have multiple canary hosts AND PPPoE. I'll use 1.1.1.1 and 9.9.9.9 as canaries here, as above. The general routing is as in my MWE above, but we'll assume that the primary uplink is done via a PPPoE link. What's different then is that you need a fake static remote address that is stable across reconnects of the PPPoE connection; we'll use 127.1.1.1 in this example. This can be done with a PPP profile:

/ppp profile
add name=pppoe-static-profile remote-address=127.1.1.1

And we need to tell our PPPoE client to use this profile as well:

/interface pppoe-client
add comment="Primary uplink" disabled=no interface=uplink keepalive-timeout=disabled name=pppoe-uplink password=*redacted* profile=pppoe-static-profile user=*redacted*

This then means that the IP address 127.1.1.1 is available all the time as a gateway address, so we can just use that (instead of the 192.168.1.254 we used in the MWE above):

add comment="Primary route" distance=1 dst-address=1.1.1.1/32 gateway=127.1.1.1 scope=10

Neat.

Thanks for your help!

Cheers,
Toby.

Just for the record again, here's what I believe to be my complete MWE with two uplinks (primary using PPPoE via ether1, secondary using DHCP via ether2), two canaries (1.1.1.1 and 9.9.9.9), and high availability for the canary hosts for clients attached via bridge0, including a rudimentary update script for the secondary uplink. The direct route for the secondary uplink is initially set to something bogus but will be overwritten once the DHCP lease on the secondary uplink is acquired:

# Static routes
/ip route
add gateway=1.1.1.1 distance=1 check-gateway=ping routing-mark=HA comment="Primary virtual route A (HA)"
add gateway=9.9.9.9 distance=1 check-gateway=ping routing-mark=HA comment="Primary virtual route B (HA)"
add gateway=127.1.1.2 distance=2 routing-mark=HA comment="Secondary virtual route (HA)"

add gateway=1.1.1.1 distance=1 check-gateway=ping comment="Primary virtual route A"
add gateway=9.9.9.9 distance=1 check-gateway=ping comment="Primary virtual route B"
add gateway=127.1.1.2 distance=2 comment="Secondary virtual route"

add dst-address=1.1.1.1/32 gateway=127.1.1.1 scope=10 comment="Primary route A"
add dst-address=9.9.9.9/32 gateway=127.1.1.1 scope=10 comment="Primary route B"
add dst-address=127.1.1.2/32 gateway=127.1.1.2 scope=10 comment="Secondary route"

# Clients attached via bridge0 get to use the HA routing table
/ip route rule
add interface=bridge0 table=HA

# Primary uplink via PPPoE (the profile must exist before the client references it)
/ppp profile
add name=pppoe-static-profile remote-address=127.1.1.1
/interface pppoe-client
add comment="Primary uplink" disabled=no interface=ether1 keepalive-timeout=disabled name=pppoe-primary password=*password* profile=pppoe-static-profile user=*user*

# Secondary uplink via DHCP
/ip dhcp-client
add add-default-route=no disabled=no interface=ether2 script=\
    "# Update secondary route\
    \n:if (\$bound=1) do={\
    \n  /ip route set [/ip route find where gateway!=\$\"gateway-address\" and comment~\"Secondary route\"] gateway=\$\"gateway-address\"\
    \n}" use-peer-dns=no use-peer-ntp=no

DarkNate · Sun Jan 10, 2021 11:22 am

So I did something like this with multiple hosts:

PPP Profile 1 for ISP1: Remote Address: 127.0.0.2
PPP Profile 2 for ISP2: Remote Address: 127.0.0.3

Finally:

add comment="Route for reaching ISP1's Recursive 1" distance=1 dst-address=8.8.8.8/32 gateway=127.0.0.2 scope=10
add comment="Route for reaching ISP1's Recursive 2" distance=1 dst-address=1.0.0.1/32 gateway=127.0.0.2 scope=10

add comment="Route for reaching ISP2's Recursive 1" distance=1 dst-address=1.1.1.1/32 gateway=127.0.0.3 scope=10
add comment="Route for reaching ISP2's Recursive 2" distance=1 dst-address=8.8.4.4/32 gateway=127.0.0.3 scope=10

add check-gateway=ping comment="Default Route for ISP1 (Recursive 1)" distance=1 gateway=8.8.8.8
add check-gateway=ping comment="Default Route for ISP1 (Recursive 2)" distance=2 gateway=1.0.0.1
add check-gateway=ping comment="Default Route for ISP2 (Recursive 1)" distance=3 gateway=1.1.1.1
add check-gateway=ping comment="Default Route for ISP2 (Recursive 2)" distance=4 gateway=8.8.4.4

The objective is to keep it simple, to achieve efficiency and reduce the chances of failures or issues later on if you decide to add more complicated policy routing, load balancing, or whatever.

tdussa · Sun Jan 10, 2021 12:06 pm

Hi,

So I did something like this with multiple hosts:
PPP Profile 1 for ISP1: Remote Address: 127.0.0.2
PPP Profile 2 for ISP2: Remote Address: 127.0.0.3

Finally:

add comment="Route for reaching ISP1's Recursive 1" distance=1 dst-address=8.8.8.8/32 gateway=127.0.0.2 scope=10
add comment="Route for reaching ISP1's Recursive 2" distance=1 dst-address=1.0.0.1/32 gateway=127.0.0.2 scope=10
add comment="Route for reaching ISP2's Recursive 1" distance=1 dst-address=1.1.1.1/32 gateway=127.0.0.3 scope=10
add comment="Route for reaching ISP2's Recursive 2" distance=1 dst-address=8.8.4.4/32 gateway=127.0.0.3 scope=10

add check-gateway=ping comment="Default Route for ISP1 (Recursive 1)" distance=1 gateway=8.8.8.8
add check-gateway=ping comment="Default Route for ISP1 (Recursive 2)" distance=2 gateway=1.0.0.1
add check-gateway=ping comment="Default Route for ISP2 (Recursive 1)" distance=3 gateway=1.1.1.1
add check-gateway=ping comment="Default Route for ISP2 (Recursive 2)" distance=4 gateway=8.8.4.4

Well, that's exactly what I have done, isn't it? Except in my MWE ISP2 does not use PPPoE and thus has a DHCP startup script.

The objective is to keep it simple, to achieve efficiency and reduce the chances of failures or issues later on if you decide to add more complicated policy routing, load balancing, or whatever.

Obviously. But again, what you are doing is not equivalent to what I am doing, is it? I totally agree that this is all a delicate trade-off. In your solution, if ISP1 fails, attached clients have no way to reach 8.8.8.8 or 1.0.0.1. This is avoided in my solution at the price of slightly more complex routing. Obviously, whether the trade-off is worth it is something that everybody has to decide individually (and, in fact, I am not certain myself, but I wanted to get it to work).

Arguably, your code is more complex than really necessary (provided that ISP 2 is just a backup link anyway and not meant to do load-balancing, as I have assumed and explicitly stated in my example). In that case, I don't see any benefit to be gained from actively checking whether the ISP 2 uplink is actually up and running because there is nothing you can do about it anyway, so you could just as well just leave out all the checking via 8.8.4.4 and 1.1.1.1. That would simplify your setup as well.
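
To illustrate, here is a sketch of what such a trimmed-down variant could look like, reusing the addresses from the config quoted above (127.0.0.2 and 127.0.0.3 being the PPP profile remote addresses for ISP1 and ISP2). It is untested and assumes that ISP2 is a pure backup link; the comment on the last route is my own wording:

```routeros
/ip route
add comment="Route for reaching ISP1's Recursive 1" distance=1 dst-address=8.8.8.8/32 gateway=127.0.0.2 scope=10
add comment="Route for reaching ISP1's Recursive 2" distance=1 dst-address=1.0.0.1/32 gateway=127.0.0.2 scope=10
add check-gateway=ping comment="Default Route for ISP1 (Recursive 1)" distance=1 gateway=8.8.8.8
add check-gateway=ping comment="Default Route for ISP1 (Recursive 2)" distance=2 gateway=1.0.0.1
# Unchecked fallback straight via ISP2's PPP remote address
add comment="Default Route for ISP2 (fallback, unchecked)" distance=3 gateway=127.0.0.3
```

The fallback route becomes active only when both checked ISP1 routes are inactive, so nothing is lost by not pinging through ISP2.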

(As a side note, my MWE is somewhat longer because I actually included all relevant configuration aspects, not just parts. These missing but crucial config lines cost me the better part of two days to figure out. I think it is much better to include everything that is necessary to actually reproduce something and, also, state assumptions explicitly.)

Cheers,
Toby.

DarkNate · Sun Jan 10, 2021 3:05 pm

Obviously. But again, what you are doing is not equivalent to what I am doing, is it? I totally agree that this is all a delicate trade-off. In your solution, if ISP1 fails, attached clients have no way to reach 8.8.8.8 or 1.0.0.1. This is avoided in my solution at the price of slightly more complex routing. Obviously, whether the trade-off is worth it is something that everybody has to decide individually (and, in fact, I am not certain myself, but I wanted to get it to work).

Wait, what? If ISP1 fails, all clients can still reach 8.8.8.8 & 1.0.0.1 via ISP2. The routing table will automatically drop those dead ISP1 routes including the custom route for 8.8.8.8 & 1.0.0.1 which is routed via ISP1's interface, the routing table would fall back to the "main" routing table which now still has backup recursive routes for ISP2 and hence re-direct traffic through it including traffic destined towards 8.8.8.8 & 1.0.0.1. Don't believe me? Use my config, ping the IP with -n flag, kill ISP1, see how long it takes for RouterOS to re-route to ISP2, it would be well below 1ms.

Arguably, your code is more complex than really necessary (provided that ISP 2 is just a backup link anyway and not meant to do load-balancing, as I have assumed and explicitly stated in my example). In that case, I don't see any benefit to be gained from actively checking whether the ISP 2 uplink is actually up and running because there is nothing you can do about it anyway, so you could just as well just leave out all the checking via 8.8.4.4 and 1.1.1.1. That would simplify your setup as well.

You clearly asked for "recursive routing failover without load balancing", that is what I based my solution on, what makes you think I don't load balance? I have a fairly complex policy routing setup for load balancing with a combination of Nth and PCC while accounting for HTTPS traffic via TCP and QUIC protocol as to ensure they don't break with having multiple source IPs such as banking sites/PayPal etc while still ensuring all multi-threaded and SCTP traffic are able to achieve bandwidth aggregation through both ISPs simultaneously.

tdussa · Sun Jan 10, 2021 10:41 pm

Hi there,

Wait, what? If ISP1 fails, all clients can still reach 8.8.8.8 & 1.0.0.1 via ISP2. The routing table will automatically drop those dead ISP1 routes including the custom route for 8.8.8.8 & 1.0.0.1 which is routed via ISP1's interface, the routing table would fall back to the "main" routing table which now still has backup recursive routes for ISP2 and hence re-direct traffic through it including traffic destined towards 8.8.8.8 & 1.0.0.1. Don't believe me? Use my config, ping the IP with -n flag, kill ISP1, see how long it takes for RouterOS to re-route to ISP2, it would be well below 1ms.

I cannot test this, to be honest, because I have no way to make my ISP drop traffic at their router, but does the pppoe/ppp interface behave differently from a plain ethernet interface in this regard? I mean, obviously, if the pppoe connection fails, then I would expect the route to disappear, but what I was talking about was what happens if the direct connection to ISP1 (i. e., the pppoe connection) is still valid (and the direct gateway of ISP1 reachable), but then packets are dropped somewhere farther upstream at ISP1. Then, of course, the canary hosts will not be pingable, so the corresponding recursive routes will be disabled. But will the direct routes also disappear? For a regular ether interface, those routes will stay and effectively blackhole the canary hosts. This is discussed in viewtopic.php?p=825704#p825539 and the following posts as well.

Arguably, your code is more complex than really necessary (provided that ISP 2 is just a backup link anyway and not meant to do load-balancing, as I have assumed and explicitly stated in my example). In that case, I don't see any benefit to be gained from actively checking whether the ISP 2 uplink is actually up and running because there is nothing you can do about it anyway, so you could just as well just leave out all the checking via 8.8.4.4 and 1.1.1.1. That would simplify your setup as well.
You clearly asked for "recursive routing failover without load balancing", that is what I based my solution on, what makes you think I don't load balance? I have a fairly complex policy routing setup for load balancing with a combination of Nth and PCC while accounting for HTTPS traffic via TCP and QUIC protocol as to ensure they don't break with having multiple source IPs such as banking sites/PayPal etc while still ensuring all multi-threaded and SCTP traffic are able to achieve bandwidth aggregation through both ISPs simultaneously.

Well, no. I disagree. Without load balancing, it makes no sense to monitor the backup ISP uplink, so your solution can be simplified because the monitoring of the backup uplink can be dropped for all practical intents and purposes. I never said that you don't use failover in your setup, so your code might be totally appropriate (and minimal) for your use case, but for my use case, I still think it is simplifiable.

Cheers,
Toby.

DarkNate · Mon Jan 11, 2021 3:00 pm

The whole point of @Chupaka making this thread/guide was to bypass the ISP gateway completely. My setup and his setup rely on external hosts, which, if unreachable, mean a dead ISP gateway.

Whatever man, enjoy your setup. Thanks to @Chupaka for this guide and thanks for your suggestion of using a null-address on the "remote address" field on the PPP profile to enable N number of hosts for the recursive routing failover.

tdussa · Mon Jan 11, 2021 4:28 pm

Hi,

I cannot test this, to be honest, because I have no way to make my ISP drop traffic at their router, but does the pppoe/ppp interface behave differently from a plain ethernet interface in this regard? I mean, obviously, if the pppoe connection fails, then I would expect the route to disappear, but what I was talking about was what happens if the direct connection to ISP1 (i. e., the pppoe connection) is still valid (and the direct gateway of ISP1 reachable), but then packets are dropped somewhere farther upstream at ISP1. Then, of course, the canary hosts will not be pingable, so the corresponding recursive routes will be disabled. But will the direct routes also disappear? For a regular ether interface, those routes will stay and effectively blackhole the canary hosts. This is discussed in viewtopic.php?p=825704#p825539 and the following posts as well.
The whole point of @Chupaka making this thread/guide was to bypass the ISP gateway completely. My setup and his setup rely on external hosts, which, if unreachable, mean a dead ISP gateway.

What? No, of course the ISP gateway cannot be bypassed. Originally, this thread was about having a check for routing failures that are further upstream than the direct link without having to resort to scripting. But this method of checking also means that unless you do some routing sub-table voodoo, the canary hosts used for checking are unreachable if the corresponding uplink fails. That's all I'm saying (and, again, that's discussed and mitigated in the posts I have linked to).

Whatever man, enjoy your setup. Thanks to @Chupaka for this guide and thanks for your suggestion of using a null-address on the "remote address" field on the PPP profile to enable N number of hosts for the recursive routing failover.

Indeed, thanks to all the contributors to this discussion! :)

Cheers,
Toby.

haris013 · Sat Jan 23, 2021 12:14 am

Hello, I have 2 WANs with static private IPs (192.168.1.100 on WAN 1 and 192.168.0.100 on WAN 2).

I am using the first post's routing rules, but I have a strange problem: the Cloud DDNS does not update when these rules are applied. The balancing and failover seem to work fine. I cannot figure out why my router can't reach MikroTik's cloud and update my public address.

Do I need to configure anything else?

These are my routes:

add check-gateway=ping distance=1 gateway=8.8.8.8 routing-mark=to_ISP1
add check-gateway=ping distance=2 gateway=8.8.4.4 routing-mark=to_ISP1
add check-gateway=ping distance=1 gateway=8.8.4.4 routing-mark=to_ISP2
add check-gateway=ping distance=2 gateway=8.8.8.8 routing-mark=to_ISP2
add distance=1 dst-address=8.8.4.4/32 gateway=192.168.0.1 scope=10
add distance=20 dst-address=8.8.4.4/32 type=blackhole
add distance=1 dst-address=8.8.8.8/32 gateway=192.168.1.1 scope=10
add distance=20 dst-address=8.8.8.8/32 type=blackhole

and these are my mangle rules

/ip firewall mangle
add action=accept chain=prerouting dst-address=192.168.1.0/24 in-interface=\
    bridge
add action=accept chain=prerouting dst-address=192.168.0.0/24 in-interface=\
    bridge
add action=mark-connection chain=prerouting connection-mark=no-mark \
    in-interface=ether1 new-connection-mark=ISP1_conn passthrough=yes
add action=mark-connection chain=prerouting connection-mark=no-mark \
    in-interface=ether2 new-connection-mark=ISP2_conn passthrough=yes
add action=mark-connection chain=prerouting connection-mark=no-mark \
    dst-address-type=!local in-interface=bridge new-connection-mark=ISP1_conn \
    passthrough=yes per-connection-classifier=both-addresses:2/0
add action=mark-connection chain=prerouting connection-mark=no-mark \
    dst-address-type=!local in-interface=bridge new-connection-mark=ISP2_conn \
    passthrough=yes per-connection-classifier=both-addresses:2/1
add action=mark-routing chain=prerouting connection-mark=ISP1_conn \
    in-interface=bridge new-routing-mark=to_ISP1 passthrough=yes
add action=mark-routing chain=prerouting connection-mark=ISP2_conn \
    in-interface=bridge new-routing-mark=to_ISP2 passthrough=yes
add action=mark-routing chain=output connection-mark=ISP1_conn \
    new-routing-mark=to_ISP1 passthrough=yes
add action=mark-routing chain=output connection-mark=ISP2_conn \
    new-routing-mark=to_ISP2 passthrough=yes

Do I have to set anything else for proper load balancing with failover? The WAN connections are 2 ADSL lines (I don't have the option of a PPPoE setup due to an ISP limitation on their modem/router).

Is there anything wrong with the above setup?

Also, how can I figure out the weight ratio of the above setup? How are connections distributed between the 2 WANs? How can I "push" more connections to WAN 1, for example?

DarkNate · Sat Jan 23, 2021 9:42 am

You're in a double NAT situation. Ask the ISP to bridge the CPE. Then establish PPPoE at the router level. That is the right way to do it.

Double NAT will create all sorts of weird issues for obvious reasons.

chatravin · Sat Jan 23, 2021 9:59 am

As far as I and other people in this topic have checked, recursive routes for failover do not work together with PCC load balancing if two or more routing marks are used.

DarkNate · Sat Jan 23, 2021 1:09 pm

I use PCC+Nth mangle load balancing + recursive routes just fine. I've shared the config in previous posts.

haris013 · Sat Jan 23, 2021 8:27 pm

You're in a double NAT situation. Ask the ISP to bridge the CPE. Then establish PPPoE at the router level. That's is the right way to do it.

Double NAT will create all sorts of weird issues for obvious reasons.

I know I am in a double NAT state. A PPPoE connection is not possible from the ISP side. Also, the CPE manages the VoIP telephony, so my only option is connecting with just an IP.

Is there any way to manage or handle the double NAT problems?

By the way, how does the above config distribute the connections? Is it round-robin load balancing? Is there any way to create "weight" ratios between these 2 WANs?

DarkNate · Mon Jan 25, 2021 8:31 am

Send an email to your ISP's ASN's NOC team (that's their L3-and-above team) and ask them to enable bridge mode functionality for you; they will do it. The VoIP and VLAN tagging (if any) can easily be replicated on RouterOS once your ISP releases any L2.5 MAC binding and creates fresh ones for RouterOS.

Never work with double NAT shit. Chances are you're actually behind triple NAT: one CGNAT at the ISP level, one NAT at the CPE, and one NAT at the RouterOS level.

MikroTik's load balancing is complex and flexible. You can use PCC for weighted distribution, or even Nth. Many combos are possible. I use PCC for 80/443 traffic and Nth for all other ports. The end result of my method? You will get aggregated bandwidth from both ISPs for multi-threaded traffic or SCTP traffic (which works in RouterOS by default).

I have looked at Cisco, Juniper, VyOS, pfSense etc. No vendor apart from MikroTik offers L3/L4 load balancing combo features.

Chupaka · Wed Jan 27, 2021 2:58 pm

I cannot figure why my router can't reach mikrotik's cloud and update my public address.

these are my routes

I don't see 0.0.0.0/0 route in your 'main' table (the one that is used by the router's processes like Cloud for initial route lookup) - that can be the reason.

haris013 · Thu Jan 28, 2021 4:04 pm

Can you tell me specifically which route I should add in order to test it? 0.0.0.0/0 via which gateway?

thanks

Chupaka · Thu Jan 28, 2021 4:46 pm

0.0.0.0/0 via 8.8.8.8 and 8.8.4.4 (the same you already have but without routing marks) should be good enough
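
As a rough sketch of that suggestion, assuming the 8.8.8.8/32 and 8.8.4.4/32 host routes from the config above stay in place, the unmarked copies for the 'main' table could look like this (the distances simply mirror the to_ISP1 pair; swap them if WAN 2 should be preferred for the router's own traffic):

/ip route
add check-gateway=ping distance=1 gateway=8.8.8.8
add check-gateway=ping distance=2 gateway=8.8.4.4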

haris013 · Wed Feb 17, 2021 10:09 am

I have a strange problem. I want to distribute more traffic to WAN2, so I have created one more PCC rule.

So we have 3 PCC rules: 2/0 ISP1, 2/1 ISP2, 2/2 ISP2. But it seems there is no traffic on my last rule (2/2 ISP2). Also, connections seem to be distributed evenly, 50-50, and there are no more connections on ISP2.

Is something wrong with the setup?

these are the connections and the routes:

Chupaka · Wed Feb 17, 2021 9:46 pm

Yep, something is wrong with the setup. Your three rules should be 3/0, 3/1 and 3/2 instead of 2/*.

haris013 · Wed Feb 17, 2021 10:36 pm

Can you explain to me why 3? I thought, after several readings, that the first part of PCC is the number of WANs (2 means 2 WANs). Is that wrong? Please help me better understand the classifier algorithm.

Do I need more or other mangle rules? Is there any document to read more about PCC load balancing?

If I have a setup with 3 WANs, will the PCC be 4/0, 4/1, 4/2, 4/3?

thanks!

Chupaka · Wed Feb 17, 2021 11:41 pm

PCC divides all connections into groups, it has nothing to do with WANs. In your case, you need 3 groups, one of them to be sent to WAN1 and another two - to WAN2 (so that it gets more traffic).

If you have 3 WANs and want traffic equally distributed, you still use 3/0, 3/1 and 3/2.

If you need to send 20% to WAN1, 30% to WAN2 and 50% to WAN3 - you need 10 groups:

10/0, 10/1 -> WAN1
10/2, 10/3, 10/4 -> WAN2
10/5, 10/6, 10/7, 10/8, 10/9 -> WAN3
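
For the two-WAN case above (one group to WAN1, two groups to WAN2), a sketch of the mark-connection rules might look like this. It reuses the ISP1_conn/ISP2_conn marks, the bridge interface and the both-addresses classifier from the mangle config posted earlier; those names come from that post and are not requirements:

/ip firewall mangle
add action=mark-connection chain=prerouting connection-mark=no-mark \
    dst-address-type=!local in-interface=bridge new-connection-mark=ISP1_conn \
    passthrough=yes per-connection-classifier=both-addresses:3/0
add action=mark-connection chain=prerouting connection-mark=no-mark \
    dst-address-type=!local in-interface=bridge new-connection-mark=ISP2_conn \
    passthrough=yes per-connection-classifier=both-addresses:3/1
add action=mark-connection chain=prerouting connection-mark=no-mark \
    dst-address-type=!local in-interface=bridge new-connection-mark=ISP2_conn \
    passthrough=yes per-connection-classifier=both-addresses:3/2

These would take the place of the existing 2/0 and 2/1 rules; the mark-routing rules can stay as they are, since they match on the connection marks only.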

haris013 · Thu Feb 18, 2021 2:24 am

You are a legend! Thank you very much for the info!

anav · Tue Mar 09, 2021 3:37 pm

Hi folks, I am very much interested in this tidbit of script.
:if ($bound=1) do={
:local iface $interface
:local gw [ /ip dhcp-client get [ find interface=$"iface" ] gateway ]
/ip route set [ find comment="GW1_IP" gateway!=$gw ] gateway=$gw
}

My issue is not the recursive bits, which work fine for some time. However, whenever my ISP gives me a new gateway for whatever reason (power off/on, etc.), the recursive route gateway is NOT updated automatically and I have to do it manually. I would like the DHCP client script to do this.
I have tried two scripts thus far that failed.

The first one I got from somewhere, but quite frankly I don't understand it at all; too elegant for me!
(1) script="\":if (\\\$bound=1) do={ /ip route set [find commen\\\r\
\nt=\\\"BellFibre\\\"] gateway=(\\\$\\\"gateway-address\\\") disabled=no; :lo\
g warning\\\r\
\n\\_(\\\"New ISP1 gateway: \\\".(\\\$\\\"gateway-address\\\")) }\"" \

The second one looks more like yours but didn't work either (tested by changing recursive gateway IPs to wrong ones, going to dhcp client hit release, renew)
(2) :if (\$bound=1) do={
:local newgw $"gateway-address";
:local routegw [/ip route get [find comment="PrimaryRecursive"] gateway ];
:if ($newgw != $routegw) do={
/ip route set [find comment="PrimaryRecursive"] gateway=$newgw;
/ip route set [find comment="SecondaryRecursive"] gateway=$newgw;
/tool e-mail send to="myaddress@email.ca" subject=([/system identity
get name]) body=" This is your new gateway IP: $newIgw" start-tls=yes;
}
}

Any pointers appreciated, and I will use whatever script works LOL. However, I do need to substitute the gateway for both recursive routes:
:if ($bound=1) do={
:local iface $interface
:local gw [ /ip dhcp-client get [ find interface=$"iface" ] gateway ]
/ip route set [ find comment="GW1_IP" gateway!=$gw ] gateway=$gw
/ip route set [ find comment="GW2_IP" gateway!=$gw ] gateway=$gw
}

anav · Tue Mar 09, 2021 5:23 pm

This script works....... :-)

if ($bound=1) do={
:local iface $interface
:local gw [ /ip dhcp-client get [ find interface=$"iface" ] gateway ]
/ip route set [ find comment="PrimaryRecursive" gateway!=$gw ] gateway=$gw
/ip route set [ find comment="SecondaryRecursive" gateway!=$gw ] gateway=$gw
}

mzzzn · Sat Mar 13, 2021 10:40 am

Hi - thanks for all the info, but I'm still unclear on how to merge static/dhcp for failover.

Background / goal:

wan1: DHCP, 100mb/s; ISP changes address every few days even w/o link going down

wan2: static, 1mb/s - failure use only

masquerade on

No load balancing needed (or even wanted with such disparate links).

A bit of hysteresis would be good (eg: 4 pings in a row would fail before failover).

Usually the links themselves do not fail, but some place further upstream - which is why I've been trying the recursive routing methods listed above (and in other threads).

The PPP methods look like they should work for the DHCP link, but do not seem to.

Thanks!

epedersen · Fri Apr 16, 2021 3:07 am

Hi everyone!

Great post... I tried this config on my setup and for the most part it's working, but I'm facing the named-interface next-hop issue.
My Internet backup uses an onboard LTE interface. Does anyone know a workaround for this?

Thanks!

epedersen · Fri Apr 16, 2021 8:57 pm

Hi everyone,

I'm sharing my workaround for adding LTE to the failover. It's not so fancy, but it's functional.
In my case, my primary Internet connection is through a cable modem that is bridged to my ether1 port, and my secondary Internet connection is through an internal LTE interface.

On my cable modem connection I don't have a static IP address, so I have to configure a dhcp-client script in order to install the right routes to make the recursive route feasible...

/ip dhcp-client
add add-default-route=no disabled=no interface=WAN1 script="{\r\
    \n  :local count [/ip route print count-only where comment=\"IP 1/2 used to monitor WAN1\"]\r\
    \n  :if (\$bound=1) do={\r\
    \n    :if (\$count = 0) do={\r\
    \n      /ip route add distance=1 dst-address=8.8.8.8/32 gateway=\$\"gateway-address\" scope=10 comment=\"IP 1/2 used to monitor WAN1\"\r\
    \n    } else={\r\
    \n      :if (\$count = 1) do={\r\
    \n        :local test [/ip route find where comment=\"IP 1/2 used to monitor WAN1\"]\r\
    \n        :if ([/ip route get \$test gateway] != \$\"gateway-address\") do={\r\
    \n          /ip route set \$test gateway=\$\"gateway-address\"\r\
    \n        }\r\
    \n      } else={\r\
    \n        :error \"Multiple routes found\"\r\
    \n      }\r\
    \n    }\r\
    \n  } else={\r\
    \n    /ip route remove [find comment=\"IP 1/2 used to monitor WAN1\"]\r\
    \n  }\r\
    \n}\r\
    \n{\r\
    \n  :local count [/ip route print count-only where comment=\"IP 2/2 used to monitor WAN1\"]\r\
    \n  :if (\$bound=1) do={\r\
    \n    :if (\$count = 0) do={\r\
    \n      /ip route add distance=1 dst-address=208.67.220.220/32 gateway=\$\"gateway-address\" scope=10 comment=\"IP 2/2 used to monitor WAN1\"\r\
    \n    } else={\r\
    \n      :if (\$count = 1) do={\r\
    \n        :local test [/ip route find where comment=\"IP 2/2 used to monitor WAN1\"]\r\
    \n        :if ([/ip route get \$test gateway] != \$\"gateway-address\") do={\r\
    \n          /ip route set \$test gateway=\$\"gateway-address\"\r\
    \n        }\r\
    \n      } else={\r\
    \n        :error \"Multiple routes found\"\r\
    \n      }\r\
    \n    }\r\
    \n  } else={\r\
    \n    /ip route remove [find comment=\"IP 2/2 used to monitor WAN1\"]\r\
    \n  }\r\
    \n}" use-peer-dns=no use-peer-ntp=no

The installed routes look like this:

/ip route 
add comment="IP 1/2 used to monitor WAN1" distance=1 dst-address=8.8.8.8/32 gateway=X.X.X.X scope=10
add comment="IP 2/2 used to monitor WAN1" distance=1 dst-address=208.67.220.220/32 gateway=X.X.X.X scope=10

Then, I add the routes pointing to the virtual IP address and the new default through the virtual IP:

/ip route
add dst-address=10.100.100.1 gateway=8.8.8.8 scope=10 check-gateway=ping
add dst-address=10.100.100.1 gateway=208.67.220.220 scope=10 check-gateway=ping
add distance=1 gateway=10.100.100.1

Finally, when 8.8.8.8 and 208.67.220.220 are both unreachable, my default route changes from the one via 10.100.100.1 (distance 1) to the lte1 interface default (distance 2):

 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 A S  0.0.0.0/0                          10.100.100.1              1
 1  DS  0.0.0.0/0                          lte1                      2
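
The distance of 2 on the dynamic lte1 route is not configured under /ip route. One possible way to get it, sketched here under the assumption that the LTE interface uses the default APN profile and that the installed RouterOS version offers the default-route-distance property there:

/interface lte apn
set [find default=yes] add-default-route=yes default-route-distance=2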

So far it is working just fine. Any thoughts on this setup?

batarang · Wed Jun 16, 2021 5:36 am

Adding here my working config for dual and triple DHCP WAN. Inspired by this thread.

https://gist.github.com/marfillaster/63 ... 8a8e5986fe

Dual

/interface bridge add name=bridge

/interface bridge port
add bridge=bridge interface=ether3
add bridge=bridge interface=ether2
add bridge=bridge interface=ether1

/interface list
add comment=defconf name=WAN
add comment=defconf name=LAN

/interface list member
add interface=bridge list=LAN
add interface=ether5 list=WAN
add interface=ether4 list=WAN


/interface detect-internet set internet-interface-list=WAN lan-interface-list=LAN wan-interface-list=WAN

/ip upnp
set enabled=yes
/ip upnp interfaces
add interface=bridge type=internal
add interface=ether5 type=external
add interface=ether4 type=external


/ip settings set allow-fast-path=no

/ip address add address=192.168.88.1/24 interface=bridge network=192.168.88.0

/ip firewall address-list add address=192.168.88.0/24 list=local

/ip firewall nat
add action=masquerade chain=srcnat ipsec-policy=out,none out-interface-list=WAN

/ip firewall mangle
add action=accept chain=prerouting comment="bridge access" dst-address-list=local in-interface=bridge
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=ether4 new-connection-mark=CONN2 passthrough=yes
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=ether5 new-connection-mark=CONN1 passthrough=yes
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-type=!local in-interface=bridge new-connection-mark=CONN1 passthrough=yes per-connection-classifier=both-addresses-and-ports:2/0
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-type=!local in-interface=bridge new-connection-mark=CONN2 passthrough=yes per-connection-classifier=both-addresses-and-ports:2/1
add action=mark-routing chain=prerouting connection-mark=CONN1 in-interface=bridge new-routing-mark=ISP1 passthrough=yes
add action=mark-routing chain=prerouting connection-mark=CONN2 in-interface=bridge new-routing-mark=ISP2 passthrough=yes
add action=mark-routing chain=output connection-mark=CONN1 new-routing-mark=ISP1 passthrough=yes
add action=mark-routing chain=output connection-mark=CONN2 new-routing-mark=ISP2 passthrough=yes


/routing filter
add chain=dynamic-in distance=33 set-distance=2 set-route-comment=ISP2 set-scope=10
add chain=dynamic-in distance=34 set-distance=3 set-route-comment=ISP1 set-scope=10

/ip dhcp-client
add default-route-distance=33 disabled=no interface=ether4 script="{\
    \n    :if (\$bound=1) do={\
    \n        /ip route set [/ip route find where comment=\"ISP2_VALIDATE\"] gateway=\$\"gateway-address\"\
    \n    } \
    \n    /ip firewall connection remove [find connection-mark=\"CONN1\"]\
    \n    /ip firewall connection remove [find connection-mark=\"CONN2\"]\
    \n}" use-peer-dns=no use-peer-ntp=no
add default-route-distance=32 disabled=no interface=ether5 script="{\
    \n    :if (\$bound=1) do={\
    \n        /ip route set [/ip route find where comment=\"ISP1_VALIDATE\"] gateway=\$\"gateway-address\"\
    \n    } \
    \n    /ip firewall connection remove [find connection-mark=\"CONN1\"]\
    \n    /ip firewall connection remove [find connection-mark=\"CONN2\"]\
    \n}" use-peer-dns=no use-peer-ntp=no

/ip route
add comment=ISP1_VALIDATE distance=1 dst-address=185.228.168.9/32 gateway=127.0.0.1 scope=10
add comment=ISP1_VALIDATE distance=1 dst-address=208.67.220.220/32 gateway=127.0.0.1 scope=10
add comment=ISP1_VALIDATE distance=1 dst-address=208.67.222.222/32 gateway=127.0.0.1 scope=10
add comment=ISP2_VALIDATE distance=1 dst-address=94.140.14.14/32 gateway=127.0.0.1 scope=10
add comment=ISP2_VALIDATE distance=1 dst-address=94.140.15.15/32 gateway=127.0.0.1 scope=10
add comment=ISP2_VALIDATE distance=1 dst-address=8.20.247.20/32 gateway=127.0.0.1 scope=10
add check-gateway=ping distance=1 dst-address=10.1.1.1/32 gateway=185.228.168.9 scope=10
add check-gateway=ping distance=1 dst-address=10.1.1.1/32 gateway=208.67.220.220 scope=10
add check-gateway=ping distance=1 dst-address=10.1.1.1/32 gateway=208.67.222.222 scope=10
add check-gateway=ping distance=1 dst-address=10.2.2.1/32 gateway=94.140.14.14 scope=10
add check-gateway=ping distance=1 dst-address=10.2.2.1/32 gateway=94.140.15.15 scope=10
add check-gateway=ping distance=1 dst-address=10.2.2.1/32 gateway=8.20.247.20 scope=10
add distance=1 gateway=10.1.1.1 routing-mark=ISP1 
add distance=2 gateway=10.2.2.1 routing-mark=ISP1
add distance=1 gateway=10.2.2.1 routing-mark=ISP2
add distance=2 gateway=10.1.1.1 routing-mark=ISP2
add distance=20 dst-address=185.228.168.9/32 type=blackhole
add distance=20 dst-address=208.67.220.220/32 type=blackhole
add distance=20 dst-address=208.67.222.222/32 type=blackhole
add distance=20 dst-address=94.140.14.14/32 type=blackhole
add distance=20 dst-address=94.140.15.15/32 type=blackhole
add distance=20 dst-address=8.20.247.20/32 type=blackhole

Triple

/interface bridge add name=bridge

/interface bridge port
add bridge=bridge interface=ether2
add bridge=bridge interface=ether1

/interface list
add comment=defconf name=WAN
add comment=defconf name=LAN

/interface list member
add interface=bridge list=LAN
add interface=ether5 list=WAN
add interface=ether4 list=WAN
add interface=ether3 list=WAN

/interface detect-internet set internet-interface-list=WAN lan-interface-list=LAN wan-interface-list=WAN

/ip upnp
set enabled=yes
/ip upnp interfaces
add interface=bridge type=internal
add interface=ether5 type=external
add interface=ether4 type=external
add interface=ether3 type=external

/ip settings set allow-fast-path=no

/ip address add address=192.168.88.1/24 interface=bridge network=192.168.88.0

/ip firewall address-list add address=192.168.88.0/24 list=local

/ip firewall nat
add action=masquerade chain=srcnat ipsec-policy=out,none out-interface-list=WAN

/ip firewall mangle
add action=accept chain=prerouting comment="bridge access" dst-address-list=local in-interface=bridge
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=ether3 new-connection-mark=CONN3 passthrough=yes
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=ether4 new-connection-mark=CONN2 passthrough=yes
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=ether5 new-connection-mark=CONN1 passthrough=yes
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-type=!local in-interface=bridge new-connection-mark=CONN1 passthrough=yes per-connection-classifier=both-addresses-and-ports:3/0
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-type=!local in-interface=bridge new-connection-mark=CONN2 passthrough=yes per-connection-classifier=both-addresses-and-ports:3/1
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-type=!local in-interface=bridge new-connection-mark=CONN3 passthrough=yes per-connection-classifier=both-addresses-and-ports:3/2
add action=mark-routing chain=prerouting connection-mark=CONN1 in-interface=bridge new-routing-mark=ISP1 passthrough=yes
add action=mark-routing chain=prerouting connection-mark=CONN2 in-interface=bridge new-routing-mark=ISP2 passthrough=yes
add action=mark-routing chain=prerouting connection-mark=CONN3 in-interface=bridge new-routing-mark=ISP3 passthrough=yes
add action=mark-routing chain=output connection-mark=CONN1 new-routing-mark=ISP1 passthrough=yes
add action=mark-routing chain=output connection-mark=CONN2 new-routing-mark=ISP2 passthrough=yes
add action=mark-routing chain=output connection-mark=CONN3 new-routing-mark=ISP3 passthrough=yes

/routing filter
add chain=dynamic-in distance=32 set-distance=1 set-route-comment=ISP3 set-scope=10
add chain=dynamic-in distance=33 set-distance=2 set-route-comment=ISP2 set-scope=10
add chain=dynamic-in distance=34 set-distance=3 set-route-comment=ISP1 set-scope=10

/ip dhcp-client
add default-route-distance=34 disabled=no interface=ether3 script="{\
    \n    :if (\$bound=1) do={\
    \n        /ip route set [/ip route find where comment=\"ISP3_VALIDATE\"] gateway=\$\"gateway-address\"\
    \n    } \
    \n    /ip firewall connection remove [find connection-mark=\"CONN1\"]\
    \n    /ip firewall connection remove [find connection-mark=\"CONN2\"]\
    \n    /ip firewall connection remove [find connection-mark=\"CONN3\"]\
    \n}" use-peer-dns=no use-peer-ntp=no
add default-route-distance=33 disabled=no interface=ether4 script="{\
    \n    :if (\$bound=1) do={\
    \n        /ip route set [/ip route find where comment=\"ISP2_VALIDATE\"] gateway=\$\"gateway-address\"\
    \n    } \
    \n    /ip firewall connection remove [find connection-mark=\"CONN1\"]\
    \n    /ip firewall connection remove [find connection-mark=\"CONN2\"]\
    \n    /ip firewall connection remove [find connection-mark=\"CONN3\"]\
    \n}" use-peer-dns=no use-peer-ntp=no
add default-route-distance=32 disabled=no interface=ether5 script="{\
    \n    :if (\$bound=1) do={\
    \n        /ip route set [/ip route find where comment=\"ISP1_VALIDATE\"] gateway=\$\"gateway-address\"\
    \n    } \
    \n    /ip firewall connection remove [find connection-mark=\"CONN1\"]\
    \n    /ip firewall connection remove [find connection-mark=\"CONN2\"]\
    \n    /ip firewall connection remove [find connection-mark=\"CONN3\"]\
    \n}" use-peer-dns=no use-peer-ntp=no

/ip route
add comment=ISP1_VALIDATE distance=1 dst-address=185.228.168.9/32 gateway=127.0.0.1 scope=10
add comment=ISP1_VALIDATE distance=1 dst-address=208.67.220.220/32 gateway=127.0.0.1 scope=10
add comment=ISP1_VALIDATE distance=1 dst-address=208.67.222.222/32 gateway=127.0.0.1 scope=10
add comment=ISP2_VALIDATE distance=1 dst-address=94.140.14.14/32 gateway=127.0.0.1 scope=10
add comment=ISP2_VALIDATE distance=1 dst-address=94.140.15.15/32 gateway=127.0.0.1 scope=10
add comment=ISP2_VALIDATE distance=1 dst-address=8.20.247.20/32 gateway=127.0.0.1 scope=10
add comment=ISP3_VALIDATE distance=1 dst-address=9.9.9.9/32 gateway=127.0.0.1 scope=10
add comment=ISP3_VALIDATE distance=1 dst-address=9.9.9.10/32 gateway=127.0.0.1 scope=10
add comment=ISP3_VALIDATE distance=1 dst-address=8.26.56.26/32 gateway=127.0.0.1 scope=10
add check-gateway=ping distance=1 dst-address=10.1.1.1/32 gateway=185.228.168.9 scope=10
add check-gateway=ping distance=1 dst-address=10.1.1.1/32 gateway=208.67.220.220 scope=10
add check-gateway=ping distance=1 dst-address=10.1.1.1/32 gateway=208.67.222.222 scope=10
add check-gateway=ping distance=1 dst-address=10.2.2.1/32 gateway=94.140.14.14 scope=10
add check-gateway=ping distance=1 dst-address=10.2.2.1/32 gateway=94.140.15.15 scope=10
add check-gateway=ping distance=1 dst-address=10.2.2.1/32 gateway=8.20.247.20 scope=10
add check-gateway=ping distance=1 dst-address=10.3.3.1/32 gateway=9.9.9.9 scope=10
add check-gateway=ping distance=1 dst-address=10.3.3.1/32 gateway=9.9.9.10 scope=10
add check-gateway=ping distance=1 dst-address=10.3.3.1/32 gateway=8.26.56.26 scope=10
add distance=1 gateway=10.1.1.1 routing-mark=ISP1 
add distance=2 gateway=10.2.2.1 routing-mark=ISP1
add distance=3 gateway=10.3.3.1 routing-mark=ISP1
add distance=1 gateway=10.2.2.1 routing-mark=ISP2
add distance=2 gateway=10.1.1.1 routing-mark=ISP2
add distance=3 gateway=10.3.3.1 routing-mark=ISP2
add distance=1 gateway=10.3.3.1 routing-mark=ISP3
add distance=2 gateway=10.2.2.1 routing-mark=ISP3
add distance=3 gateway=10.1.1.1 routing-mark=ISP3
add distance=20 dst-address=185.228.168.9/32 type=blackhole
add distance=20 dst-address=208.67.220.220/32 type=blackhole
add distance=20 dst-address=208.67.222.222/32 type=blackhole
add distance=20 dst-address=94.140.14.14/32 type=blackhole
add distance=20 dst-address=94.140.15.15/32 type=blackhole
add distance=20 dst-address=8.20.247.20/32 type=blackhole
add distance=20 dst-address=9.9.9.9/32 type=blackhole
add distance=20 dst-address=9.9.9.10/32 type=blackhole
add distance=20 dst-address=8.26.56.26/32 type=blackhole

Does anyone here have a workaround for route-cache=no? It's causing high CPU usage, which cuts throughput significantly. Setting route-cache=yes breaks routing.

Edit: as it turns out, it's allow-fast-path that conflicts.

txfz · Fri Jun 18, 2021 3:03 pm

I can't get this to work. I have set up a lab environment with 10.[1/2].0.1/30 as two different ISPs, and using Google DNS to determine Internet connectivity. I used this guide, which seems to be the same as OP plus the mangling. Alas, I don't have any Internet connectivity at all. Trying to ping something on the Internet from the router returns "no route to host".

/interface list
add name=wan
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/ip pool
add name=pool1 ranges=172.30.0.10
/ip dhcp-server
add address-pool=pool1 disabled=no interface=ether3 name=server1
/ip neighbor discovery-settings
set discover-interface-list=!dynamic
/interface list member
add interface=ether1 list=wan
add interface=ether2 list=wan
/ip address
add address=10.1.0.2/30 interface=ether1 network=10.1.0.0
add address=10.2.0.2/30 interface=ether2 network=10.2.0.0
add address=172.30.0.1/24 interface=ether3 network=172.30.0.0
/ip dhcp-server network
add address=172.30.0.0/24 gateway=172.30.0.1
/ip dns
set allow-remote-requests=yes servers=9.9.9.9
/ip firewall filter
add action=log chain=output routing-mark=to_ISP1
add action=log chain=output routing-mark=to_ISP2
/ip firewall mangle
add action=mark-connection chain=output connection-mark=no-mark connection-state=new new-connection-mark=ISP1_conn out-interface=ether1 passthrough=yes
add action=mark-routing chain=output connection-mark=ISP1_conn new-routing-mark=to_ISP1 out-interface=ether1 passthrough=yes
add action=mark-connection chain=output connection-mark=no-mark connection-state=new new-connection-mark=ISP2_conn out-interface=ether2 passthrough=yes
add action=mark-routing chain=output connection-mark=ISP2_conn new-routing-mark=to_ISP2 out-interface=ether2 passthrough=yes
/ip firewall nat
add action=src-nat chain=srcnat out-interface=ether1 to-addresses=10.1.0.2
add action=src-nat chain=srcnat out-interface=ether2 to-addresses=10.2.0.2
/ip route
add check-gateway=ping distance=1 gateway=8.8.8.8 routing-mark=to_ISP1
add check-gateway=ping distance=2 gateway=8.8.4.4 routing-mark=to_ISP1
add check-gateway=ping distance=1 gateway=8.8.4.4 routing-mark=to_ISP2
add check-gateway=ping distance=2 gateway=8.8.8.8 routing-mark=to_ISP2
add distance=1 dst-address=8.8.4.4/32 gateway=10.2.0.1 scope=10
add distance=1 dst-address=8.8.8.8/32 gateway=10.1.0.1 scope=10
/system clock
set time-zone-name=Europe/Stockholm

Chupaka · Tue Jun 22, 2021 6:48 pm

Trying to ping something on the Internet from the router returns "no route to host".

/ip route print detail

txfz · Wed Jun 23, 2021 11:48 am

# jun/23/2021 10:47:16 by RouterOS 6.46.8
# software id = FWIF-LI4F
#
Flags: X - disabled, A - active, D - dynamic, 
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, 
B - blackhole, U - unreachable, P - prohibit 
 0 A S  dst-address=0.0.0.0/0 gateway=8.8.8.8 
        gateway-status=8.8.8.8 recursive via 10.1.0.1 ether1 
        check-gateway=ping distance=1 scope=30 target-scope=10 
        routing-mark=to_ISP1 

 1   S  dst-address=0.0.0.0/0 gateway=8.8.4.4 
        gateway-status=8.8.4.4 recursive via 10.2.0.1 ether2 
        check-gateway=ping distance=2 scope=30 target-scope=10 
        routing-mark=to_ISP1 

 2 A S  dst-address=0.0.0.0/0 gateway=8.8.4.4 
        gateway-status=8.8.4.4 recursive via 10.2.0.1 ether2 
        check-gateway=ping distance=1 scope=30 target-scope=10 
        routing-mark=to_ISP2 

 3   S  dst-address=0.0.0.0/0 gateway=8.8.8.8 
        gateway-status=8.8.8.8 recursive via 10.1.0.1 ether1 
        check-gateway=ping distance=2 scope=30 target-scope=10 
        routing-mark=to_ISP2 

 4 A S  dst-address=8.8.4.4/32 gateway=10.2.0.1 
        gateway-status=10.2.0.1 reachable via  ether2 distance=1 scope=10 
        target-scope=10 

 5 A S  dst-address=8.8.8.8/32 gateway=10.1.0.1 
        gateway-status=10.1.0.1 reachable via  ether1 distance=1 scope=10 
        target-scope=10 

 6 ADC  dst-address=10.1.0.0/30 pref-src=10.1.0.2 gateway=ether1 
        gateway-status=ether1 reachable distance=0 scope=10 

 7 ADC  dst-address=10.2.0.0/30 pref-src=10.2.0.2 gateway=ether2 
        gateway-status=ether2 reachable distance=0 scope=10 

 8 ADC  dst-address=172.30.0.0/24 pref-src=172.30.0.1 gateway=bridge 
        gateway-status=bridge reachable distance=0 scope=10

Just for the record, I have tried adding a default route statically pointing to 10.1.0.1, and everything works then.

Chupaka · Wed Jun 23, 2021 12:39 pm

Ah, sorry, missed this part.

Yes, you always need to have a default route in your 'main' table, because policy routing for router-originated traffic works like this:

* router tries to connect to google.com
* it looks up routing table 'main' looking for a route to google.com; if it can't find any - the connection fails
* Firewall Mangle Output adjusts routing-mark for those outgoing packets
* routing process looks up corresponding routing table to find "final" route (this step is called "Routing Adjustment" in Packet Flow diagram)

Probably, that also can be solved by adding two routing rules to lookup packets from 'main' in your to_ISP1 and to_ISP2 tables, but that's less obvious. And needs checking :)
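
As a rough sketch of the first point for RouterOS v6, unmarked default routes in 'main' could reuse the same recursive gateways. This assumes the addresses from the listing above (8.8.8.8 reached via 10.1.0.1, 8.8.4.4 via 10.2.0.1) and relies on the existing /32 host routes; the distances are only illustrative:

/ip route
# default routes in 'main' (no routing-mark), used for the router's initial lookup
add dst-address=0.0.0.0/0 gateway=8.8.8.8 distance=1 check-gateway=ping
add dst-address=0.0.0.0/0 gateway=8.8.4.4 distance=2 check-gateway=ping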

txfz · Fri Jul 02, 2021 1:17 pm

Hi,

I said "everything" works, but what I meant was "something" works. The actual failover still doesn't.

How are the "main" default routes supposed to look? I tried adding one for each WAN interface, with distances 3 and 4, respectively, but it doesn't help with the failover, and the router can only escape via primary ISP.

Can you verify that the route marking is done correctly? I've seen people do that in a few different ways, so I'm not sure about that.

I must be doing something fundamentally wrong, but I have no idea what.

Thanks for your help!

Chupaka · Fri Jul 02, 2021 6:22 pm

We are talking about router's traffic failover, right? Not clients' traffic?

How do you check that? What does "/ip route print detail" show when you expect something to go in a different direction from what you observe?

txfz · Mon Jul 05, 2021 4:25 pm

Sorry, I should clarify. Client traffic not working is my primary concern. I just tried pinging from the router for troubleshooting, but it seems like that is mostly working, actually... it just doesn't seem to be able to switch in the middle of a ping command?

I've been testing by disconnecting the uplink of ISP 1. Here are the routing details for those conditions:

 0   S  dst-address=0.0.0.0/0 gateway=8.8.8.8
        gateway-status=8.8.8.8 recursive via 10.1.0.1 ether1 check-gateway=ping
        distance=1 scope=30 target-scope=10 routing-mark=to_ISP1

 1 A S  dst-address=0.0.0.0/0 gateway=8.8.4.4
        gateway-status=8.8.4.4 recursive via 10.2.0.1 ether2 check-gateway=ping
        distance=2 scope=30 target-scope=10 routing-mark=to_ISP1

 2 A S  dst-address=0.0.0.0/0 gateway=8.8.4.4
        gateway-status=8.8.4.4 recursive via 10.2.0.1 ether2 check-gateway=ping
        distance=1 scope=30 target-scope=10 routing-mark=to_ISP2

 3   S  dst-address=0.0.0.0/0 gateway=8.8.8.8
        gateway-status=8.8.8.8 recursive via 10.1.0.1 ether1 check-gateway=ping
        distance=2 scope=30 target-scope=10 routing-mark=to_ISP2

 4 A S  ;;; hack 1
        dst-address=0.0.0.0/0 gateway=10.1.0.1
        gateway-status=10.1.0.1 reachable via  ether1 distance=3 scope=30
        target-scope=10

 5   S  ;;; hack 2
        dst-address=0.0.0.0/0 gateway=10.2.0.1
        gateway-status=10.2.0.1 reachable via  ether2 distance=4 scope=30
        target-scope=10

 6 A S  dst-address=8.8.4.4/32 gateway=10.2.0.1
        gateway-status=10.2.0.1 reachable via  ether2 distance=1 scope=10
        target-scope=10

 7 A S  dst-address=8.8.8.8/32 gateway=10.1.0.1
        gateway-status=10.1.0.1 reachable via  ether1 distance=1 scope=10
        target-scope=10

 8 ADC  dst-address=10.1.0.0/30 pref-src=10.1.0.2 gateway=ether1
        gateway-status=ether1 reachable distance=0 scope=10

 9 ADC  dst-address=10.2.0.0/30 pref-src=10.2.0.2 gateway=ether2
        gateway-status=ether2 reachable distance=0 scope=10

10 ADC  dst-address=172.30.0.0/24 pref-src=172.30.0.1 gateway=bridge
        gateway-status=bridge reachable distance=0 scope=10

The router seems to find its way out, but not the clients. I'm pinging 9.9.9.9. Doing a trace from the client reveals that it still seems to try to go via ISP 1.

Chupaka · Mon Jul 05, 2021 6:42 pm

Your Firewall Mangle rules only mark router's traffic (chain=output). For clients, you need to mark in chain=prerouting. You can see an example in the manual: https://wiki.mikrotik.com/wiki/Manual:P ... cy_routing
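
A minimal sketch of such a rule in RouterOS v6 syntax, assuming the LAN interface is the "bridge" from the listing above and that client traffic should use the to_ISP1 table (both are assumptions to adapt):

/ip firewall mangle
# mark forwarded client traffic arriving from the LAN; skip packets addressed to the router itself
add chain=prerouting in-interface=bridge dst-address-type=!local action=mark-routing new-routing-mark=to_ISP1 passthrough=no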

PackElend · Thu Jul 29, 2021 6:04 pm

thx for doing:

aight, of course, half an hour after this post, I think I got it. :)
Here's my working theory:
...

but I still have a little understanding challenge. It is this here:

, recursive routes are not recalculated (or something) and all traffic still goes via another uplink

or when traffic is swapped back to GW1.

In detail, my remaining questions are:

When GW1 comes back up again, new connections will go through GW1 as it has the lower distance, but what happens to the established connections? Are these moved from GW2 to GW1 automatically, or only if GW2 goes down?
When are routes recalculated? Each time an interface comes up?
How often is the check-gateway=ping check carried out?

thx
Stefan

-----------------
@Chupaka, thx for sharing all this with us :) :)

rextended · Thu Jul 29, 2021 7:05 pm

1) All connections in connection tracking (and the others) are broken. I made a script to clear all "EX" connections, useful for SIP and others.
2) Yes and no: it is not the only reason; "ping" on an external IP is another.
3) 10 seconds

PackElend · Thu Jul 29, 2021 7:17 pm

1) All connections in connection tracking (and the others) are broken. I made a script to clear all "EX" connections, useful for SIP and others.

I'm really sorry, but I don't get it.
Does it mean that if a connection was once tracked via GW1 and has not been cleared yet, it will be swapped back from GW2 to GW1, as GW1 has the lower distance?

2) Yes and no: it is not the only reason; "ping" on an external IP is another.

so anytime the routing table is used?

3) 10 seconds

this is clear?
Is this fixed, or can it be modified?

rextended · Thu Jul 29, 2021 7:39 pm

1) To be clearer: ALL IS BROKEN. All connections stored in connection tracking that relate to the inactive gateway are invalid, but the system does not clear them until each connection's own timeout is reached.

2) NO. I simply can't give a complete list of "when", but they are not recalculated on each use.

3) What does "this is clear?" mean?

3) I never found a way to change that; it is hardcoded somewhere to 10 seconds.

PackElend · Thu Jul 29, 2021 7:54 pm

1a) ok, all cleared at timeout.
1b) Any automatic switch back from GW2 to GW1 for established connections?
2) So some cases that trigger "recalculation" are known, but not all of them. You can expect it to happen on the related interface when pinging an external IP or when the link goes down/up.
3) sorry, forgot to remove "?"

rextended · Thu Jul 29, 2021 8:01 pm

1b) I'm not sure whether the disrupted connections work again when the faulty gateway comes back...

PackElend · Wed Aug 18, 2021 10:40 pm

1b) I'm not sure whether the disrupted connections work again when the faulty gateway comes back...

I would say this is answered in #26 of NAT: Masquerade can leak private IP, why&how? - MikroTik.

I still can't formulate a reasonable question that would get support to answer in detail, as what happens in the background is still unclear.

rextended · Wed Aug 18, 2021 10:44 pm

For peace of mind,
destroy all connections tied to the gateway that is no longer available
and you will never have regrets.

PackElend · Wed Aug 18, 2021 11:04 pm

that would be done how?

rextended · Thu Aug 19, 2021 1:23 am

that would be done how?

Some examples
here
viewtopic.php?f=9&t=154606&p=853803#p853803
here
viewtopic.php?f=13&t=176956&p=868082#p870786
and here
viewtopic.php?f=13&t=176956&p=868082#p870952
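
The linked posts are not reproduced here. As a loose illustration only of what "destroying" tracked connections can look like on RouterOS v6, the line below removes entries whose reply destination starts with the router's own address on the failed uplink. The address 10.1.0.2 is a placeholder (borrowed from the ether1 listing earlier in this thread), and the match is a plain regular expression, so treat it as an assumption to adapt rather than as the method from those links:

# remove tracked connections that were NATed to the address of the failed uplink (placeholder address)
/ip firewall connection remove [find where reply-dst-address~"^10.1.0.2"]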

PackElend · Thu Aug 19, 2021 10:46 am

Some examples

great, thx

anav · Sun Oct 31, 2021 2:14 am

FROM MRZ: The target scope must be larger than the scope of the route over which you want to resolve the gateway (by at least 1).
You could get a resolve loop if you put the target in the same scope where you are resolving.

Note: this post is from the Version 7 beta topic, and it will eventually affect this thread.

rdtech · Fri Nov 12, 2021 3:50 am

Hi Chupaka, great write-up. Will this work with the same default gateway that uses the interface variable "%"?

Chupaka · Tue Nov 16, 2021 5:44 pm

Unfortunately, no: routes with interface specified do not participate in recursive route lookup, at least in RouterOS v6

mducharme · Wed Nov 17, 2021 12:37 pm

Unfortunately, no: routes with interface specified do not participate in recursive route lookup, at least in RouterOS v6

Hi Chupaka,

As @anav points out above, you might want to update your tutorial to increase target-scope to be one more than scope, as that is necessary on RouterOS v7. This is not due to a bug, but due instead to an intentional change in the behavior of the feature.
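
A minimal sketch of that adjustment, using placeholder addresses (8.8.8.8 as the monitored host, 10.1.0.1 as the ISP gateway). The scope values 10 and 11 are one possible choice, not the values from the tutorial:

/ip route
# host route to the monitored address, resolved through the directly connected gateway
add dst-address=8.8.8.8/32 gateway=10.1.0.1 scope=10
# recursive default route: target-scope (11) is one higher than the host route's scope (10)
add dst-address=0.0.0.0/0 gateway=8.8.8.8 distance=1 target-scope=11 check-gateway=ping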

Chupaka · Fri Nov 19, 2021 11:03 am

Thanks, I've commented in that topic (viewtopic.php?p=891975#p891975) and updated the tutorial to use scopes 11 and 12 for resolving routes. Now that looks even more complex

ghostzero · Fri Dec 10, 2021 7:15 pm

Hi,

first: Thanks for all the information. It is a lot of helpful information so far.

However, it is actually a bit much, and therefore I am a bit confused about which parts are really relevant for me and what exactly each part does.

I have a failover WAN already working using recursive routing. However, with my current setup, the gateway IPs I use to check that the WAN (WAN1) is reachable (8.8.8.8 and 8.8.4.4) will not be reachable, once it falls back to the failover WAN (WAN2) which is a bit of an issue.

Do I understand this correctly, and is that why the example in the first post uses routing-marks for the WAN1 and WAN2 gateway checks, in addition to the normal routing table?

If so, I guess I have to mark the traffic accordingly using mangle rules? Or is it somehow automatically marked by the corresponding final gateway - which I doubt? If I use mangle, I guess I need prerouting and output (clients and router)? Also why are the hosts inverted for ISP2, shouldn't the checks be the same order?
I guess I only need output and even then only if it goes to my hosts "8.8.8.8"/"8.8.4.4" using the corresponding interface to mark it accordingly as I really only need the marks for the gateway checks?

Can someone provide a full working sample, including the necessary mangle rules to get this working, and explain what each section does exactly, so I know how best to modify it for my scenario?

I guess optionally I could check against hosts, that I do not need, e.g. cloudflare DNS servers, if I use Google DNS servers myself but not really that nice I think.

Also, I only want failover WAN, no load balancing, as the secondary WAN is way slower and will not improve performance but hinder it if WAN1 is up. It is just there in case WAN1 is down, so people can still access the necessary applications that require an Internet connection.

And is it just me, or is it relatively complicated to set up this feature? I know it is a ROS limitation, but wouldn't it be easier to just add hosts to check for a route, and if none are reachable, the route is down? Then all of this could be condensed into two routing entries.

Thanks for any feedback in advance.

bpwl · Fri Dec 10, 2021 7:44 pm

For just failover, I don't see the need to test the 2nd (backup) route with recursive routes, if returning to the first is OK whenever it becomes available again.

Testing the first as usual with recursive routes. If the first fails the test, it will be down, then the second is used (based on larger distance value than first).
No need to test. If it is not working it was the last fallback anyway. If the first is operational it will be chosen by its lower distance.
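
Sketched as routes, with placeholder addresses (10.1.0.1 and 10.2.0.1 as the two ISP gateways, 8.8.8.8 as the check host), this could look roughly like the following. Only the primary default route is recursive and checked:

/ip route
# host route that pins the check host to the primary uplink
add dst-address=8.8.8.8/32 gateway=10.1.0.1 scope=10
# primary default route: recursive, goes inactive when the check host stops answering pings
add dst-address=0.0.0.0/0 gateway=8.8.8.8 distance=1 target-scope=11 check-gateway=ping
# backup default route: plain, untested, used only while the primary is inactive
add dst-address=0.0.0.0/0 gateway=10.2.0.1 distance=2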

ghostzero · Fri Dec 10, 2021 8:38 pm

@bpwl: Thanks for the feedback. As mentioned, it mostly works fine this way. However, if the main WAN is down, then the host used to check it (e.g. "8.8.8.8") is not reachable either: the default route with gateway=8.8.8.8 goes down, but the route with dst-address=8.8.8.8 stays active. That is nearly always the case for me, because the ISP modem still provides an internal gateway IP even when the ISP itself does not provide a correct IP and the Internet itself does not work.

My solution for now is to use DNS servers I don't use myself, so I use CloudFlare DNS and OpenDNS servers, that way the Google DNS is still reachable, once the failover is in use but I am not sure this is the best solution? Though it is probably the easiest solution.

I might change the DNS servers I use to CloudFlare DNS but then I will just check against Google DNS and OpenDNS instead.

Or is there a workaround for this issue that can be easily implemented?

bpwl · Fri Dec 10, 2021 8:57 pm

Yes, that's how it works.

The IP used for checking is bound to the interface for the check, and as such is NOT reachable if that interface is down. It should not be used as an available resource!

Here a little story : viewtopic.php?t=45482#p895822

There are enough IP addresses that can be used for checking (only) and still have that function from elsewhere. https://www.lifewire.com/free-and-publi ... rs-2626062
Some also set multiple check-IP's for the same recursive route (see #130)

ghostzero · Fri Dec 10, 2021 9:42 pm

@bpwl Thanks again. I expected this to be the case (since yesterday).

I guess there is no easy way around this? It isn't that problematic because, as you mentioned, there are multiple safe IP addresses to check against, but I still think it is a weird limitation.

It is also something that is very seldom explained, which is why yesterday, when I needed the failover Internet for the first time, it wasn't working as expected. Sure, I tested it as far as possible before, but that was basically disabling the interface altogether or removing the network cable, which of course also took the interface itself down. So the failover worked, but not because of the gateway check. Testing with an invalid gateway IP also worked, because the DNS server was not blocked then.

What might have worked to test it is disconnecting the cable after the modem, but to be honest I didn't think about this, and it might not have worked either, as it kind of depends on how the modem reacts to that scenario.

In hindsight it also makes total sense, as the route for e.g. "8.8.8.8" would still be active, as it is outside of the gateway check.

But now that I know about this, it is easy to avoid this issue.

anav · Sat Dec 11, 2021 4:20 am

Most use two different normally available sites, some use three even LOL: Google, Cloudflare, OpenDNS, Quad9, etc.

ghostzero · Sat Dec 11, 2021 8:35 am

Thanks for the information. As mentioned, it is fine in my case now; I just find it an odd limit, but I guess if one really needs it, there is always scripting to do this.

I now use two DNS servers to check (in case one is down, however unlikely that is) and one of the remaining providers for my real DNS queries, so it is fine now.

Sob · Sat Dec 11, 2021 12:11 pm

Most use two different normally available sites, some use three even LOL: Google, Cloudflare, OpenDNS, Quad9, etc.

I don't know if I'm the only one, but to me it always seemed kind of rude to use them for this. They provide free DNS, but they (AFAIK) don't invite the whole world to constantly ping their servers. I'm sure they can handle it, but still... Also, since they didn't make any promises regarding pings, what if they decide that they've had enough and block it? It's probably not very likely, but that would be fun.

That said, I don't have a better alternative, i.e. some always-on servers that welcome people to use them for this.

bpwl · Sat Dec 11, 2021 12:24 pm

It's probably not very likely, but that would be fun

We do take a lot for granted ... .

For failover (and load balancing), my route list contains the same paths again, at a larger distance, but without the "recursive based ping check".
Just in case the "well known and trusted" servers stop responding. Test will fail, but traffic will still flow if the path is usable. There is some time now to find another "to Shanghai".
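
A sketch of such extra routes, assuming the checked recursive routes already exist at distances 1 and 2 and that 10.1.0.1 and 10.2.0.1 are the two ISP gateways (placeholder addresses and distances):

/ip route
# same next hops as the checked routes, but with no check-gateway and a larger distance
add dst-address=0.0.0.0/0 gateway=10.1.0.1 distance=10
add dst-address=0.0.0.0/0 gateway=10.2.0.1 distance=11

These only become active when every checked route is down, for example because the monitored hosts stopped answering pings.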

ghostzero · Sat Dec 11, 2021 2:58 pm

It's probably not very likely, but that would be fun.

This is why I didn't add any checks for my failover Internet.

So if those all stop working, then the failover Internet will still work until I can adjust the configuration accordingly (e.g. remove the checks from the main Internet).

One solution would be to use a script that actually does a DNS lookup and checks whether the server provides a response, in which case the Internet is working. It is more complicated to implement, but it would still work, and it would probably produce more traffic/load for their DNS servers than just ping.

Sob · Sat Dec 11, 2021 3:15 pm

Scripts can do many things; they are even more powerful than this. For example, if you wanted to switch back from the backup to the main connection only after it has been stable for some given time, that would be possible with a script. The beauty of this solution is that it's all a built-in function of the router. So you don't have to write any script (I find writing RouterOS scripts very much not admin-friendly), and it's less likely to break.

Chupaka · Sat Dec 11, 2021 8:08 pm

So, ghostzero, yes, the solution is to put your clients' traffic to a separate routing table. You create additional default routes in that table, then mark all traffic-to-be-routed-outside (like "in-interface=LAN-Bridge dst-address-type=!local" to exclude traffic destined to the router itself, like DNS requests) with that mark. Voila, your clients don't use a route with dst=8.8.8.8, only your default routes (primary and failover in case of troubles).
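
A hedged sketch of that idea in RouterOS v6 syntax. The routing mark name "clients" is hypothetical, LAN-Bridge is the interface name from the description above, and the addresses are placeholders (8.8.8.8 as the check host, already pinned to the primary uplink by a /32 route in 'main', and 10.2.0.1 as the backup gateway):

/ip firewall mangle
# mark everything clients send through the router; traffic to the router itself (e.g. DNS requests) is skipped
add chain=prerouting in-interface=LAN-Bridge dst-address-type=!local action=mark-routing new-routing-mark=clients passthrough=no
/ip route
# default routes in the clients' own table: checked primary and plain backup
add dst-address=0.0.0.0/0 gateway=8.8.8.8 distance=1 check-gateway=ping routing-mark=clients
add dst-address=0.0.0.0/0 gateway=10.2.0.1 distance=2 routing-mark=clients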

ghostzero · Sun Dec 12, 2021 12:46 am

@Chupaka thanks for the feedback. Makes sense, though I think it is indeed unnecessary for me and I will just check against hosts I do not use anyway.

Though I have to admit I had hoped MikroTik would provide an easier solution to this, e.g. just adding a list of IP addresses to check for a route to be active.

Though it is a nice solution as it avoids scripting, which can be tricky though powerful and might break upon migrating to a different hardware or upgrading RouterOS as Sob mentioned, so using recursive routing is preferred.

IntLDaniel · Wed Jan 05, 2022 12:17 pm

Unfortunately, no: routes with interface specified do not participate in recursive route lookup, at least in RouterOS v6

Hi, is this still valid for routerOS v7.x ?

Chupaka · Wed Jan 05, 2022 4:40 pm

OMG, looks like this limitation is removed in v7! Where's my Champagne?!?

IntLDaniel · Wed Jan 05, 2022 4:44 pm

Wow... and could you please show me a screenshot from WinBox of where to specify the interface in a recursive route lookup? The route properties window is different from the one in ROS v6. Thanks

Chupaka · Wed Jan 05, 2022 5:58 pm

where to specify interface in recursive route lookup?

Sorry?..

You set the interface right in the "Gateway" parameter

andyhenckel · Wed Jan 05, 2022 7:17 pm

After spending the last hour trying to figure out why my dual-WAN failover from version 6 didn't work in v7, reading multiple posts and watching a YouTube video, I tried this, and it seems to behave as expected. This is on v7.2rc1 with a Cube60AC, which has wlan1 and wlan60. Both the 60 interface and wlan1 use DHCP. There is a single Ethernet interface, 88.1/24. I can disable either interface, one at a time, and traffic shifts immediately. On the barn end of this setup, I've got one AP with 5G and a different AP with 60G. I rebooted the 60G unit, and it took 3 seconds for traffic to transfer to the 5G. Once it was back on, I dropped one ping during the switch back to the 60G system. 10.2.4.5 is the GW on the 60G interface (the primary desired link); 10.3.44.1 is the GW on the backup link.

Here is the new and working basic config:

/ip firewall nat
add action=masquerade chain=srcnat out-interface=wlan1
add action=masquerade chain=srcnat out-interface=wlan60-1
/ip route
add check-gateway=ping disabled=no distance=1 dst-address=0.0.0.0/0 gateway=10.2.4.5 pref-src="" routing-table=main scope=30 suppress-hw-offload=no target-scope=10 vrf-interface=wlan60-1
add check-gateway=ping disabled=no distance=2 dst-address=0.0.0.0/0 gateway=10.3.44.1 pref-src="" routing-table=main scope=30 suppress-hw-offload=no target-scope=10 vrf-interface=wlan1
/ip address
add address=192.168.1.253/24 interface=ether1 network=192.168.1.0
/ip dhcp-client
add add-default-route=no interface=wlan1
add add-default-route=no interface=wlan60-1

Works for me on this version....Hope it works for you all.

Just fyi - here is the old version that worked in 6.x - less the firewall rules, and dhcp-client commands that are identical as the new commands.
In this V6 working version example - the primary desired path gw is 10.3.127.1

/ip route
add check-gateway=ping distance=1 gateway=1.1.1.1
add check-gateway=ping distance=2 gateway=8.8.8.8
add check-gateway=ping distance=1 dst-address=1.1.1.1/32 gateway=10.3.127.1 scope=10
add check-gateway=ping distance=1 dst-address=8.8.8.8/32 gateway=10.3.121.1 scope=10

Chupaka · Wed Jan 05, 2022 7:28 pm

Your config has nothing to do with either recursive routes or checking whether the Internet is available behind your gateways. It's useless in this topic.

PackElend · Wed Jan 05, 2022 7:45 pm

It's useless in this topic.

Come on, at least it is not wasted effort. A bit more kindness would be appreciated.

IntLDaniel · Wed Jan 05, 2022 7:49 pm

where to specify interface in recursive route lookup?
Sorry?..

You set the interface right in the "Gateway" parameter

There is no classic dropdown option to select the interface like in v6, so should I type the name of the interface manually there?

Chupaka · Wed Jan 05, 2022 8:54 pm

Correct
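
For reference, a sketch of how this might look from the CLI in RouterOS v7, where the interface is appended to the gateway address with "%". The addresses and the interface name ether1 are placeholders, and whether such a route takes part in recursive lookup rests on the statement earlier in this thread, so verify it on your own version:

/ip route
# host route with the interface typed directly into the gateway value
add dst-address=8.8.8.8/32 gateway=10.1.0.1%ether1 scope=10
# recursive default route resolved through the host route above
add dst-address=0.0.0.0/0 gateway=8.8.8.8 distance=1 target-scope=11 check-gateway=ping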