Filo
newbie
Topic Author
Posts: 42
Joined: Thu Jan 13, 2022 2:37 pm
Location: Germany

Simpler Failover for two Gateways I found working

Sun Aug 27, 2023 1:54 pm

Hey,

like many others, I was wondering how to set up a simple failover between two gateways (here: DSL and LTE) with a MikroTik involved.
Searching the Internet and this board, all I could find were "recursive routes" that check e.g. 8.8.8.8 as a "gateway".
This did not work for me at first, and I wasn't happy with recursion in the routes, so I got the task done another way that I could not find anywhere while searching. I'm sharing it here:

I did this on an RB5009 yesterday, in Winbox:

1. Prerequisites:
- Network with DHCP done by the MikroTik (in this case: 192.168.1.0/24)
- Default gateway handed out by DHCP is the MikroTik (here: 192.168.1.2)
- Internet available at (for example) 192.168.1.1 (in this case DSL)
- Internet available at (for example) 192.168.1.250 (LTE modem)

2. Routing:
- Default route 0.0.0.0/0 via 192.168.1.250 with distance 1 and comment=LTE-Failover -> (keep it DISABLED)
- Default route 0.0.0.0/0 via 192.168.1.1 with distance 2

3. Go to ROUTING -> TABLES
- Create a routing table named (for example) "DSL" and check FIB

4. Go to IP -> ROUTES -> Click +
- Dst. Address: 0.0.0.0/0
- Gateway: 192.168.1.1 (your Primary Gateway)
- Routing Table: Select above created ROUTING TABLE (here: "DSL")

5. Go to IP -> FIREWALL -> Tab MANGLE
Create a MANGLE-Rule:
- Tab -> GENERAL
-- Chain: output
-- Dst.Address: 8.8.8.8
-- Protocol: 1 (icmp)
- Tab -> ACTION
-- Action: mark routing
-- New Routing Mark: Select above created ROUTING TABLE (here: "DSL")

6. Go to TOOLS -> NETWATCH
-Tab -> HOST
-- Create a Netwatch Host:
--- Host: 8.8.8.8
--- Type: icmp
--- Interval: 00:00:30
--- Timeout: 5.00

-Tab -> Down
/ip route enable [find comment=LTE-Failover]

-Tab -> Up
/ip route disable [find comment=LTE-Failover]
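For readers who prefer the terminal, the Winbox steps above translate roughly into the following RouterOS v7 commands. This is a sketch, not taken from the original post: it assumes RouterOS v7 syntax and the example addresses from the prerequisites (DSL at 192.168.1.1, LTE at 192.168.1.250), so check every line against your own setup before pasting it.

```routeros
# Step 2: default routes; the LTE one (distance 1) stays disabled until netwatch enables it
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.1.250 distance=1 comment=LTE-Failover disabled=yes
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2

# Step 3: extra routing table "DSL" with FIB checked
/routing table
add name=DSL fib

# Step 4: default route via DSL inside the "DSL" table
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.1.1 routing-table=DSL

# Step 5: the router's own pings to 8.8.8.8 are looked up in the "DSL" table
/ip firewall mangle
add chain=output dst-address=8.8.8.8 protocol=icmp action=mark-routing new-routing-mark=DSL

# Step 6: netwatch probe; down enables the LTE route, up disables it again
/tool netwatch
add host=8.8.8.8 type=icmp interval=30s timeout=5s \
    down-script="/ip route enable [find comment=LTE-Failover]" \
    up-script="/ip route disable [find comment=LTE-Failover]"
```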

What does this do?

We are creating TWO DEFAULT ROUTES for traffic leaving the local network to the Internet.
The secondary route (in this case LTE) has the higher priority (i.e. the lower distance) but is kept disabled.
By creating a second routing table and a firewall mangle rule, we force the ICMP requests to 8.8.8.8 through the primary gateway (in this case: DSL).
Netwatch can run scripts when the host becomes unreachable through the primary route.
The DOWN script enables the secondary route, which becomes active immediately because of its higher priority (i.e. lower distance).
All traffic to the Internet now goes through the secondary route.
Netwatch keeps pinging 8.8.8.8 every 30 seconds, and because of our mangle rule these pings are still forced through the primary gateway.
When 8.8.8.8 is reachable again through the primary gateway, the UP script disables the secondary route again.
All traffic goes through the primary route again.

Please note that you will not be able to use the probe host (in this case 8.8.8.8) as an upstream DNS server, since it won't work when LTE kicks in.

I'm by no means a MikroTik expert and am still learning, but I found this way a bit more straightforward and understandable than the "recursive routes" many tutorials come up with. You can also extend the scripts to send e-mails (configure TOOLS -> EMAIL first) by adding, for example:

:delay 10
/tool e-mail send to=youremail@host.com subject="DSL is DOWN!!" body="DSL inactive - LTE active"


at the end of the script.
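As a sketch of that extension, the complete down script from step 6 with the e-mail added could be set from the CLI as below. It assumes the netwatch entry probes 8.8.8.8 and that TOOLS -> EMAIL is already configured; youremail@host.com remains a placeholder. The inner quotes are escaped with a backslash because the whole script is itself a quoted string.

```routeros
# Replace the down script of the existing 8.8.8.8 probe (selected by host, not by print number)
/tool netwatch
set [find host=8.8.8.8] down-script="/ip route enable [find comment=LTE-Failover]; :delay 10; /tool e-mail send to=youremail@host.com subject=\"DSL is DOWN!!\" body=\"DSL inactive - LTE active\""
```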

Still, I was wondering whether this is already documented somewhere; that's why I posted it here. Please disregard or close if this is "too obvious" or "already well documented" :)

Have a great day, everyone, many greetings,
Martin!

*EDIT*: I chose this variant over the "recursive routes" because I'd like to keep more control over the failover.
The script can be extended, and getting an e-mail WHEN failover happens is quite nice. We could also add even MORE netwatch hosts. For example: the FIRST netwatch checks 8.8.8.8 and, if this fails, a script may ENABLE the SECOND netwatch host to verify, and only after BOTH fail does the secondary route kick in. I think this offers more possibilities overall :)
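A rough sketch of that two-stage idea, meant to replace the single netwatch entry from step 6. The second probe host 1.1.1.1 is an assumption (not from the original post), and it needs its own mangle rule so that its pings also stay on the DSL table. Stage 1 only arms stage 2; only stage 2 switches to LTE.

```routeros
# Second probe host (1.1.1.1 is an assumption) must also be forced through the DSL table
/ip firewall mangle
add chain=output dst-address=1.1.1.1 protocol=icmp action=mark-routing new-routing-mark=DSL

/tool netwatch
# Stage 1: a failure only arms stage 2; recovery disables LTE and disarms stage 2
add host=8.8.8.8 type=icmp interval=30s timeout=5s \
    down-script="/tool netwatch enable [find host=1.1.1.1]" \
    up-script="/ip route disable [find comment=LTE-Failover]; /tool netwatch disable [find host=1.1.1.1]"
# Stage 2: disabled until stage 1 reports down; switches to LTE only if this host fails too
add host=1.1.1.1 type=icmp interval=30s timeout=5s disabled=yes \
    down-script="/ip route enable [find comment=LTE-Failover]" \
    up-script="/ip route disable [find comment=LTE-Failover]"
```

When 8.8.8.8 answers again, stage 1 takes the LTE route down and disarms stage 2, so the normal state is a single active probe.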
Last edited by Filo on Sun Aug 27, 2023 9:59 pm, edited 3 times in total.
 
anav
Forum Guru
Posts: 21226
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Simpler Failover for two Gateways I found working

Sun Aug 27, 2023 3:19 pm

Thanks FILO, nice explanation.
 
Amm0
Forum Guru
Posts: 4089
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: Simpler Failover for two Gateways I found working

Sun Aug 27, 2023 3:45 pm

You should not use the numbers from "/ip/route/print" to disable a route (step 6 in OP). The numbers are transitory, so you need to either use the .id of the route or use [find something=that] to select what to disable/enable. That's why most other examples use [find comment="WAN1"] or something like that to find the route to enable/disable.

Also, without firewall marking, incoming connections are not possible with this approach, so VPNs can be tricky.
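To illustrate the difference between print numbers and find: the index 0 below is only an example of a number shown by the last print, and the comment is the one used in the first post.

```routeros
# Fragile: "0" is just the position from the last /ip route print and can point elsewhere later
/ip route disable 0

# Robust: select the route by one of its properties
/ip route disable [find comment=LTE-Failover]
```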
 
Filo
newbie
Topic Author
Posts: 42
Joined: Thu Jan 13, 2022 2:37 pm
Location: Germany

Re: Simpler Failover for two Gateways I found working

Sun Aug 27, 2023 4:02 pm

You should not use the numbers from "/ip/route/print" to disable an interface (step 6 in OP)…

Also, without firewall marking, incoming connection are not possible using this approach. So VPN's be tricky with this approach.
THIS is important, thanks for reminding me; I will edit the first post accordingly today. When the board gets rebooted, the IDs may change.

VPN is okay in this case: I'm using a dynamic DNS service that the RouterBOARD itself updates quite quickly. There are also two different dynamic DNS hosts, one per connection, as a backup, so it's possible to VPN into either of them.

Thanks for the reply and correction!!

*Edit*: Script changed to use the "find" command and a comment on the LTE-Failover route.
 
derolf
just joined
Posts: 7
Joined: Sat Apr 13, 2024 6:29 pm

Re: Simpler Failover for two Gateways I found working

Sat Apr 13, 2024 8:56 pm

Will this also work with only a single subnet?
 
anav
Forum Guru
Posts: 21226
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Simpler Failover for two Gateways I found working

Sun Apr 14, 2024 4:30 pm

Yes.
 
anav
Forum Guru
Posts: 21226
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Simpler Failover for two Gateways I found working

Sun Apr 14, 2024 4:46 pm

The main advantage of netwatch is that you can vary several parameters to ascertain connectivity with more fidelity!!
For example, check-gateway=ping checks every 10 seconds, and only after two consecutive failed responses is the gateway deemed not active.
For many that is too long, so netwatch set to 10 seconds halves that reaction time, etc. Why the OP went with 30 seconds, I don't understand?????

A sample of the other parameters one can use for fidelity (ICMP probe options): thr-avg, thr-jitter, thr-max, thr-stdev
https://help.mikrotik.com/docs/display/ROS/Netwatch

Finally, one has to be careful with ICMP probes from netwatch, as they will leak and try to go out any available route.
This should be prevented in IP routes. Assume you have two WANs and run netwatch on both: 1.1.1.1 is the netwatch host for WAN1 and 1.0.0.1 is the host for WAN2.

/ip route
add comment=WAN1 distance=1 dst-address=0.0.0.0/0 gateway=XX.XX.XX.1 routing-table=main
add comment=WAN1-dns distance=1 dst-address=1.1.1.1/32 gateway=XX.XX.XX.1 routing-table=main
add comment="Stop Leak" distance=2 dst-address=1.1.1.1/32 blackhole routing-table=main
# WAN2
add comment=WAN2 distance=2 dst-address=0.0.0.0/0 gateway=XX.XX.XX.2 routing-table=main
add comment=WAN2-dns distance=1 dst-address=1.0.0.1/32 gateway=XX.XX.XX.2 routing-table=main
add comment="Stop Leak" distance=2 dst-address=1.0.0.1/32 blackhole routing-table=main
Last edited by anav on Tue Jul 16, 2024 2:39 am, edited 1 time in total.
 
derolf
just joined
Posts: 7
Joined: Sat Apr 13, 2024 6:29 pm

Re: Simpler Failover for two Gateways I found working

Tue Apr 16, 2024 7:14 pm

I want to do the same (5G + DSL-Failover).

- Do you have your box in bridge or router mode?
- What cable do you connect on what port on your box?
 
cyayon
Frequent Visitor
Posts: 76
Joined: Wed Aug 24, 2022 9:39 am

Re: Simpler Failover for two Gateways I found working

Wed Jul 17, 2024 6:15 pm

Hi,

I think the mangle routing mark is no longer necessary in recent versions of RouterOS.
Netwatch can now define a src-address, so you can simply use it to ping from the DSL interface.
Moreover, I think that using mangle marks can cause issues with FastTrack firewall rules.
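A cautious sketch of the src-address variant. It assumes a RouterOS version whose netwatch already supports src-address and a static address on the DSL side; 203.0.113.2 is a documentation placeholder, and table DSL is the one from steps 3 and 4 of the first post. This only applies where the DSL side has its own address; in the single-bridge setup of the first post both gateways share one subnet. Because route selection normally looks only at the destination, the sketch adds a routing rule (an assumption, not part of the post above) so that probes from that source can only use the DSL table and cannot fall back to LTE.

```routeros
# Hypothetical static DSL-side address; pin traffic from it to the DSL table only
/routing rule
add src-address=203.0.113.2/32 action=lookup-only-in-table table=DSL

# Probe with that source address instead of a mangle routing mark
/tool netwatch
add host=8.8.8.8 type=icmp src-address=203.0.113.2 interval=30s timeout=5s \
    down-script="/ip route enable [find comment=LTE-Failover]" \
    up-script="/ip route disable [find comment=LTE-Failover]"
```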
 
anav
Forum Guru
Posts: 21226
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Simpler Failover for two Gateways I found working

Fri Jul 19, 2024 5:24 pm

Depends upon requirements. Netwatch can potentially leak, in the sense of using any route it can find for its ping. Thus if WAN1 checks 1.1.1.1 DNS, netwatch will try to use WAN2 to reach 1.1.1.1, and when it does, it will not report WAN1 as unavailable. At least that was my understanding; I may be wrong, though.
 
Amm0
Forum Guru
Posts: 4089
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: Simpler Failover for two Gateways I found working

Fri Jul 19, 2024 7:06 pm

I think @cyayon is right that using the newer "src-address" in netwatch SHOULD work. But you'd need to know the src-address to set, which means having a static IP... so that somewhat limits the approach, while mangle just sets the routing table, which could contain an interface route without an IP.

Perhaps there is some effect with FastTrack+mangle..., but that traffic is already going to take the CPU path since it is routed to the Internet.

Anyway, I think @Filo's approach using mangle seems "safer" since it's pretty explicit about what's happening. But I do think "src-address" might allow skipping the mangle step; you'd just want to test it thoroughly, since src-address in netwatch is relatively new.
 
cyayon
Frequent Visitor
Posts: 76
Joined: Wed Aug 24, 2022 9:39 am

Re: Simpler Failover for two Gateways I found working

Tue Jul 23, 2024 11:32 pm

It would be great to be able to ping from an interface (like ping -I … on Linux).
We could also use a DHCP client script that updates the netwatch src-address (if the WAN address is not fixed).
Another enhancement would be the ability to ping multiple IPs before declaring the WAN interface down, like nested recursive routing…
Nested netwatch should be possible too… but it would be complicated.
Personally, I use a script that runs every minute, pings multiple IPs from the interface and, if all pings fail, disables the primary default route (while keeping another primary route with a longer distance).
 
cyayon
Frequent Visitor
Posts: 76
Joined: Wed Aug 24, 2022 9:39 am

Re: Simpler Failover for two Gateways I found working

Wed Jul 24, 2024 11:00 am

Here is the script, which is scheduled to run every minute.

It's far from perfect, but it worked. I do not use my CCR2116 for dual-WAN/failover; I moved my WAN2 to another router (pure Linux).
Do not hesitate to propose enhancements and corrections.
# check wan
#
# use this with netwatch or scheduler
# prefer netwatch with src-address
#
# TODO : 
# use netwatch src-address
# disable dhclient update recursive route 

# version 20230803

:global checkWanStatus
:global checkWanRun

#
# define vars
#
# wan1
:local iface "wan1"
:local tableRoute "route-wan1"
:local gateway "xx.xx.xx.xx"
:local srcAddress "xx.xx.xx.xx"
:local distanceDefault 1
:local distancePersist 101
:local dstAddress "0.0.0.0/0"

# wan2
#:local iface "wan2"
#:local tableRoute "route-wan2"
#:local gateway "192.168.6.1"
#:local srcAddress "192.168.6.254"
#:local distanceDefault 9
#:local distancePersist 109
#:local dstAddress "0.0.0.0/0"

:local resetConn 1
:local addrTest1 "1.0.0.1"
:local addrTest2 "9.9.9.10"
:local addrTest3 "8.8.4.4"
:local countTest 3
:local LogHeader "check-wan"
:local email "xxx@xxx.com"
:local Date [/system clock get date];
:local Time [/system clock get time];
:local routeStatus
:local pingStatus
:local prouteStatus
:local trouteStatus

# init 
:if ( [:tostr $checkWanStatus]  = "" ) do={
      :set checkWanStatus ($Time . " " . $Date);
}
:set checkWanRun ($Time . " " . $Date);


:local msg
:local addr


#
# define dynamic vars
#
#:set dstAddress ""
#:set gateway ""
#:set srcAddress ""
if ( [:tostr $distanceDefault] = "" ) do={
           :set distanceDefault "1"
           :set msg "$LogHeader : defined distance=$distanceDefault"
           :put "$msg"
}

if ( [:tostr $dstAddress] = "" ) do={
    :set dstAddress "0.0.0.0/0"
    :set msg "$LogHeader : defined dst-address=$dstAddress"
    :put "$msg"
}
    
if ( [:tostr $gateway] = "" ) do={
    :set gateway ([/ip route print detail as-value where distance=$distanceDefault routing-table=main dst-address=$dstAddress ]->0->"gateway")
    :set msg "$LogHeader : defined gateway=$gateway"
    :put "$msg"
     if ( [:tostr $gateway] = "" ) do={
          :set msg "$LogHeader : null gateway=$gateway"
          #/tool e-mail send to=$email subject="$msg"
          :set checkWanStatus "ERROR"
          :error "$msg"
     }
}

if ( [:tostr $srcAddress] = "" ) do={
    :set srcAddress ([/ip/address print detail as-value where interface=$iface ]->0->"address") 
    :local delim [:find $srcAddress "/" 0]; :set srcAddress [ :pick $srcAddress 0 $delim ]
    :set msg "$LogHeader : defined src-address=$srcAddress"
    :put "$msg"
     if ( [:tostr $srcAddress] = "" ) do={
          :set msg "$LogHeader : null src-address=$srcAddress"
          #/tool e-mail send to=$email subject="$msg"
          :set checkWanStatus "ERROR"
          :error "$msg"
     }
}


#
# test table route
#
:set trouteStatus ([/ip route print detail as-value where gateway="$gateway" routing-table=$tableRoute dst-address=$dstAddress disabled=no ]->0->"distance")
if ( [:tostr $trouteStatus] = "" ) do={
     :set trouteStatus "FAILED"
     :set msg "$LogHeader : current table route dst-address=$dstAddress gateway=$gateway routing-table=$tableRoute FAILED !"
     :put "$msg"; 
     :log warning "$msg"
} else={
     :set msg "$LogHeader : current table route dst-address=$dstAddress gateway=$gateway routing-table=$tableRoute distance=$trouteStatus alive"
     :put "$msg"; 
     :set trouteStatus "OK"
}


#
# test current persistent route
#
:set prouteStatus ([/ip route print detail as-value where gateway="$gateway" distance=$distancePersist routing-table=main dst-address=$dstAddress disabled=no ]->0->"distance")
if ( [:tostr $prouteStatus]  != "$distancePersist" ) do={
     :set prouteStatus "FAILED"
     :set msg "$LogHeader : current persistent route dst-address=$dstAddress gateway=$gateway distance=$distancePersist routing-table=main FAILED !"
     :put "$msg"; 
     :log warning "$msg"
} else={
     :set msg "$LogHeader : current persistent route dst-address=$dstAddress gateway=$gateway distance=$distancePersist routing-table=main alive"
     :put "$msg"; 
     :set prouteStatus "OK"
}


#
# test current route
#
:set routeStatus ([/ip route print detail as-value where gateway="$gateway" distance=$distanceDefault routing-table=main dst-address=$dstAddress disabled=no ]->0->"distance")
if ( [:tostr $routeStatus]  != "$distanceDefault" ) do={
     :set routeStatus "FAILED"
     :set msg "$LogHeader : current route dst-address=$dstAddress gateway=$gateway distance=$distanceDefault routing-table=main FAILED !"
     :put "$msg"; 
     :log warning "$msg"
} else={
     :set msg "$LogHeader : current route dst-address=$dstAddress gateway=$gateway distance=$distanceDefault routing-table=main alive"
     :put "$msg"; 
     :set routeStatus "OK"
}


#
# test ping
#
:set pingStatus "OK"
:set addr "$addrTest1"
if ([/ping $addr src-address=$srcAddress count=$countTest]=0) do={
      :set msg "$LogHeader : ping src-address=$srcAddress $addr FAILED !"
      :put "$msg"; :log warning "$msg"
      :set pingStatus "WARNING"

      :set addr "$addrTest2" ;
      if ([/ping $addr src-address=$srcAddress count=$countTest]=0) do={
             :set msg "$LogHeader : ping src-address=$srcAddress $addr FAILED !"
             :put "$msg"; :log warning "$msg"
             :set pingStatus "WARNING"

             :set addr "$addrTest3" ;
             if ([/ping $addr src-address=$srcAddress count=$countTest]=0) do={
                    :set msg "$LogHeader : ping src-address=$srcAddress $addr FAILED !"
                    :put "$msg"; :log error "$msg"
                    :set pingStatus "FAILED"
             } else={
                    :set msg "$LogHeader : ping src-address=$srcAddress $addr alive"
                    :put "$msg"; 
                    #:log info "$msg"
             }
      } else={
             :set msg "$LogHeader : ping src-address=$srcAddress $addr alive"
             :put "$msg"; 
             #:log info "$msg"
      }
} else={
      :set msg "$LogHeader : ping src-address=$srcAddress $addr alive"
      :put "$msg"; 
      #:log info "$msg"
}


# status
:set msg "$LogHeader : routeStatus:$routeStatus trouteStatus:$trouteStatus prouteStatus:$prouteStatus pingStatus:$pingStatus"
:put "$msg"; 
#:log info "$msg"


#
# final decision
#
if ( $pingStatus = "FAILED") do={
         :set checkWanStatus "$iface FAILED"
         :set msg "$LogHeader : interface $iface FAILED !"
         :put "$msg"; :log warning "$msg"
         if ($routeStatus = "OK") do={
             /ip route set [find gateway=$gateway distance=$distancePersist routing-table=main dst-address=$dstAddress disabled=yes ] disabled=no comment="$iface persistdef - $LogHeader $Time enabled"; 
             /ip route set [find gateway=$gateway distance=$distanceDefault routing-table=main dst-address=$dstAddress disabled=no ] disabled=yes comment="$iface def - $LogHeader $Time disabled";
             if ( $resetConn = "1" ) do={
                     /ip/firewall/connection remove [find]
             }
             :set checkWanStatus "$iface DISABLED"
             :set msg "$LogHeader : route dst-address=$dstAddress gateway=$gateway distance=$distanceDefault routing-table=main reset-conn:$resetConn DISABLED !"
             :put "$msg"; :log warning "$msg"
             :delay 3;
             /tool e-mail send to=$email subject="$msg" } else={
                  :set msg "$LogHeader : route dst-address=$dstAddress gateway=$gateway distance=$distanceDefault routing-table=main already disabled"
                  :put "$msg"; :log info "$msg" }
} else={
         if ($routeStatus = "FAILED") do={
              :set checkWanStatus "$iface RESTORED"
              /ip route set [find gateway=$gateway distance=$distancePersist routing-table=main dst-address=$dstAddress disabled=yes ] disabled=no comment="$iface persistdef - $LogHeader $Time restored"; 
              /ip route set [find gateway=$gateway distance=$distanceDefault routing-table=main dst-address=$dstAddress disabled=yes ] disabled=no comment="$iface def - $LogHeader $Time restored";
              if ( $resetConn = "1" ) do={
                     /ip/firewall/connection remove [find]
              }
              :set msg "$LogHeader : route dst-address=$dstAddress gateway=$gateway distance=$distanceDefault routing-table=main reset-conn:$resetConn RESTORED !"
              :put "$msg"; :log warning "$msg"
              :delay 3;
              /tool e-mail send to=$email subject="$msg"} else={
                   :set msg "$LogHeader : interface $iface alive"
                   :put "$msg"; 
                   :log info "$msg"
                   :set checkWanStatus "$iface OK"
             }
}
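The post mentions that the script runs every minute. One way to do that, assuming the script above is saved under /system script with the hypothetical name check-wan, is a scheduler entry:

```routeros
# Run the hypothetical "check-wan" script once per minute
/system scheduler
add name=check-wan-every-minute interval=1m on-event="/system script run check-wan"
```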


 
rextended
Forum Guru
Posts: 12438
Joined: Tue Feb 25, 2014 12:49 pm
Location: Italy

Re: Simpler Failover for two Gateways I found working

Wed Jul 24, 2024 11:20 am

It is always wrong to use the firewall to decide the routes (except in exceptional cases).

For routes must be used... routes...

v6 wrong example code

/ip firewall mangle
add action=mark-routing chain=output dst-address=3.3.3.3 routing-mark=!mytable new-routing-mark=mytable

/ip route
add dst-address=3.3.3.3/32 gateway=3.3.3.3 routing-mark=mytable

v6 correct example code

/ip route
add dst-address=3.3.3.3/32 gateway=3.3.3.3 routing-mark=mytable

/ip route rule
add dst-address=3.3.3.3/32 table=mytable

v7 wrong example code

/routing table
add fib name=mytable

/ip firewall mangle
add action=mark-routing chain=output dst-address=3.3.3.3 routing-mark=!mytable new-routing-mark=mytable 

/ip route
add dst-address=3.3.3.3/32 gateway=3.3.3.3 routing-table=mytable

v7 correct example code

/routing table
add fib name=mytable

/ip route
add dst-address=3.3.3.3/32 gateway=3.3.3.3 routing-table=mytable

/routing rule
add dst-address=3.3.3.3/32 table=mytable
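Applied to the first post, the v7 pattern above would replace the mangle rule from step 5 with a routing rule for the probe host. A sketch, assuming table DSL and its default route from steps 3 and 4 already exist. Here action=lookup-only-in-table is my choice (the example above uses the default lookup action) so that the probe cannot fall back to the main table, and thus to LTE, if the DSL route is inactive. Note that the rule applies to all traffic to 8.8.8.8, not just the netwatch pings.

```routeros
# Drop the output mangle rule from step 5
/ip firewall mangle
remove [find where chain=output and dst-address=8.8.8.8 and action=mark-routing]

# Route everything destined to the probe host via table DSL only
/routing rule
add dst-address=8.8.8.8/32 action=lookup-only-in-table table=DSL
```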
 
Filo
newbie
Topic Author
Posts: 42
Joined: Thu Jan 13, 2022 2:37 pm
Location: Germany

Re: Simpler Failover for two Gateways I found working

Wed Jul 24, 2024 11:46 am

It is always wrong to use the firewall to decide the routes (except in exceptional cases).

For routes must be used... routes...

Hi, probably.
The "mangle" rule in my initial approach at the top of this thread was designed for netwatch, and yes, it does its job the way a routing entry would. When I created the rule I had zero experience with routing, and this was my first attempt in my home environment.

Surprisingly, all other approaches to create a backup/failover were much more complex at the time (you can see it in the threads from that period here on the board); that's why I came up with the idea of combining netwatch with this mangle rule. I had not seen this approach documented before, so I posted it here.

I will change the config for the routing part since you are of course right with your statement, although the first idea has worked very well since I posted it.

Greetings!

*Edit*: And SORRY if I missed this thread; I saw the other answers but had been absent for a while...
 
Filo
newbie
Topic Author
Posts: 42
Joined: Thu Jan 13, 2022 2:37 pm
Location: Germany

Re: Simpler Failover for two Gateways I found working

Wed Jul 24, 2024 12:08 pm

I want to do the same (5G + DSL-Failover).

- Do you have your box in bridge or router mode?
- What cable do you connect on what port on your box?
Sorry for the late reply, @derolf - here's the answer:

My RB5009 is in BRIDGE mode. All ports are bridged together; no other IP segment is used.
So since ALL ports are on the same bridge, if you'd like to rebuild this you are free to use any port of your MikroTik device.

In my case I have a mixed setup with AVM hardware ("FritzBox"). The "FritzBox" provides DSL/landline, and the LTE modem (192.168.1.250 as a normal LAN address) is attached to another "FritzBox" in my upstairs location (mesh wireless).

As you can see, every device in the network sees every other device (this is isolated from the guest network and IoT, which the "FritzBox" provides), so there is no need for additional internal firewall rules, and BRIDGE is fine here.

If you set everything up like this (one subnet, one bridge), you'll be fine with the rest of the settings I mentioned in the first post, and your failover is "good to go" :)
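A rough CLI sketch of such a single-bridge setup, under assumptions: the ether1 to ether3 port names, the pool range and the names bridge1, lan-pool and lan-dhcp are hypothetical, the DHCP server of the FritzBox is assumed to be switched off, and DNS settings are left out on purpose. Only 192.168.1.2 and the subnet come from the first post.

```routeros
# One bridge holding all used ports (hypothetical port names)
/interface bridge
add name=bridge1
/interface bridge port
add bridge=bridge1 interface=ether1
add bridge=bridge1 interface=ether2
add bridge=bridge1 interface=ether3

# MikroTik address inside the single subnet
/ip address
add address=192.168.1.2/24 interface=bridge1

# DHCP hands out the MikroTik as default gateway
/ip pool
add name=lan-pool ranges=192.168.1.100-192.168.1.199
/ip dhcp-server
add name=lan-dhcp interface=bridge1 address-pool=lan-pool disabled=no
/ip dhcp-server network
add address=192.168.1.0/24 gateway=192.168.1.2
```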

Hope this helps!
 
rextended
Forum Guru
Posts: 12438
Joined: Tue Feb 25, 2014 12:49 pm
Location: Italy

Re: Simpler Failover for two Gateways I found working

Wed Jul 24, 2024 1:30 pm

(Reply to #15)

In fact, I didn't comment on the rest, because there was nothing to add.
I'm usually very critical (not by chance, but always explaining the reasons), and if I haven't added anything else, it means you did a good job (and I thank you for putting it on the forum).

I just explained how things should be done, that's the purpose of the forum.
 
Filo
newbie
Topic Author
Posts: 42
Joined: Thu Jan 13, 2022 2:37 pm
Location: Germany

Re: Simpler Failover for two Gateways I found working

Wed Jul 24, 2024 1:48 pm

I just explained how things should be done, that's the purpose of the forum.
I didn't feel offended; we're all here to learn. And usually this board is a good example of respecting every level of knowledge and diving into each other's problems.
If this thread ends up with a wholesome solution, the job is perfectly done 8)

Cheers!
 
jaclaz
Forum Guru
Posts: 1724
Joined: Tue Oct 03, 2023 4:21 pm

Re: Simpler Failover for two Gateways I found working

Tue Oct 08, 2024 3:25 pm

Just to keep things together as much as possible: I just "sold" Filo's approach to a new user, with a few changes.
I got rid of the separate routing table and of the mangle rule by adding a "narrow" /32 route to the "canary" IP address in the "main" table.
And I didn't use the "comment" as selector in the netwatch script (a pet peeve of mine: comments may be changed accidentally six months or a year later, the setup would stop working, and finding out what happened would be more difficult).
For *some reason* (I suspect the address on ether1 coming from a DHCP server instead of being static), when ether1 is physically disconnected from the ISP router (think of the Ethernet cable going bad, or just the ISP router or its power supply failing), the "main" route becomes inactive and the whole setup starts flapping each time the netwatch script runs.
So I added a blackhole route to the same /32 address with distance 2.

The thread is here:
viewtopic.php?t=211432

As it is a bit difficult to follow due to all the tests made, here is the overall setup using the SAME IP addresses and structure as Filo's original post:
1. Prerequisites:
- Network with DHCP done by the MikroTik (in this case: 192.188.1.0/24)
- Default gateway handed out by DHCP is the MikroTik (here: 192.188.1.1)
- Internet available at (for example) 192.168.1.1 (in this case DSL)
- Internet available at (for example) 192.168.1.250 (LTE modem)
- Both interfaces connected to the two devices above are members of the WAN interface list and masqueraded in /ip firewall nat

2. Routing:
- Default route 0.0.0.0/0 via 192.168.1.250 with distance 1 and comment=LTE-Failover -> (keep it DISABLED)
- Default route 0.0.0.0/0 via 192.168.1.1 with distance 2
- Narrow route 8.8.4.4/32 via 192.168.1.1 with distance 1
- Narrow blackhole route 8.8.4.4/32 with distance 2


(Steps 3, 4 and 5 of the original post, i.e. the "DSL" routing table, its default route and the mangle rule, are not needed anymore.)

3. Go to TOOLS -> NETWATCH
-Tab -> HOST
-- Create a Netwatch Host:
--- Host: 8.8.4.4
--- Type: icmp
--- Interval: 00:00:30
--- Timeout: 5.00

-Tab -> Down
/ip route enable [find dst-address=0.0.0.0/0 and gateway=192.168.1.250]

-Tab -> Up
/ip route disable [find dst-address=0.0.0.0/0 and gateway=192.168.1.250]

It seems like it works nicely and it is simpler to implement.
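The same setup as RouterOS v7 CLI commands, as a sketch using the example addresses above. An interface list named WAN containing both WAN interfaces is assumed to exist. The find expressions use the explicit "where ... and ..." form to select the LTE default route by destination and gateway, as the scripts in the steps do.

```routeros
/ip route
# LTE default route, disabled until netwatch enables it (distance 1 wins when active)
add dst-address=0.0.0.0/0 gateway=192.168.1.250 distance=1 comment=LTE-Failover disabled=yes
# DSL default route
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2
# "Canary" host pinned to DSL, with a blackhole fallback so the probe never leaks via LTE
add dst-address=8.8.4.4/32 gateway=192.168.1.1 distance=1
add dst-address=8.8.4.4/32 blackhole distance=2

# Masquerade everything leaving through the WAN interface list (assumed to exist)
/ip firewall nat
add chain=srcnat out-interface-list=WAN action=masquerade

/tool netwatch
add host=8.8.4.4 type=icmp interval=30s timeout=5s \
    down-script="/ip route enable [find where dst-address=0.0.0.0/0 and gateway=192.168.1.250]" \
    up-script="/ip route disable [find where dst-address=0.0.0.0/0 and gateway=192.168.1.250]"
```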

EDIT: added the detail that the interfaces should be in the WAN list and masqueraded
Last edited by jaclaz on Thu Oct 10, 2024 11:07 am, edited 1 time in total.
 
anav
Forum Guru
Posts: 21226
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Simpler Failover for two Gateways I found working

Tue Oct 08, 2024 10:20 pm

jaclaz wrote:

-Tab -> Down
/ip route enable [find dst-address=0.0.0.0/0 and gateway=192.168.1.250]

-Tab -> Up
/ip route disable [find dst-address=0.0.0.0/0 and gateway=192.168.1.250]
Just to be sure: on both the Up and Down tabs, the router ends up pointing to the same gateway????

My bad, I see you differentiate by enable and disable..

The problem I am having is: how do you associate netwatch with the correct ROUTE????
Is just identifying the gateway good enough? Surely you mean for static gateways or even a PPPoE name, but what about dynamic gateways??
 
jaclaz
Forum Guru
Posts: 1724
Joined: Tue Oct 03, 2023 4:21 pm

Re: Simpler Failover for two Gateways I found working

Wed Oct 09, 2024 12:27 am

I am not sure I understand: the netwatch is associated with a ping address (8.8.4.4 in this example).
To get there, there is one /32 route (going through the "main" DSL connection).
If 8.8.4.4 is reachable, the LTE route is disabled; if it is not, the LTE route is enabled and takes precedence because of its lower distance.
For the case where the above /32 route becomes invalid, there is the blackhole one preventing the ping to 8.8.4.4 from going through the LTE connection.
The assumption (original requirements) is that the gateways are static or, as in the linked practical case, dynamic (in the sense of coming from DHCP) but known.
 
anav
Forum Guru
Posts: 21226
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Simpler Failover for two Gateways I found working

Wed Oct 09, 2024 12:53 am

Well, to help me understand, I have created a recursive ruleset and a netwatch ruleset for the basic setup of TWO WANs.
Most LAN users (single subnet /23) should use WAN1.
Rest of users identified by firewall address list should use WAN2
Each WAN should be used as backup of the other.

See result recursive: viewtopic.php?t=211555#p1102221
see result netwatch: viewtopic.php?p=1102233#p1102233

You will note I have also avoided using comments approach on this example!
 
jaclaz
Forum Guru
Posts: 1724
Joined: Tue Oct 03, 2023 4:21 pm

Re: Simpler Failover for two Gateways I found working

Wed Oct 09, 2024 11:47 am

Interesting approach :), I like the idea of the "cross backup" for this case of two different sources (home and business traffic).

The next step, IMHO, would be to see whether, in such or similar setups, it makes sense to add the "reset existing connections" script, kindly provided by rextended, to the netwatch up and down scripts:
viewtopic.php?t=103812
viewtopic.php?t=103812#p977354
/ip firewall connection
:foreach idc in=[find where timeout>60] do={
 remove [find where .id=$idc]
}
This should make the transition from one connection to the other faster, but I wonder if it has any drawbacks or causes any collateral damage.
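A sketch of how that loop could be appended to the down script of the 8.8.4.4 netwatch entry from the setup described earlier in this thread (the up script would get the same addition). Inside the quoted script the dollar sign is escaped as \$ so the variable is not expanded when the command is entered. Whether the timeout>60 filter suits a given network is left open here.

```routeros
# Down script: enable the LTE route, then remove long-lived tracked connections one by one
/tool netwatch
set [find host=8.8.4.4] down-script="/ip route enable [find where dst-address=0.0.0.0/0 and gateway=192.168.1.250]; :foreach idc in=[/ip firewall connection find where timeout>60] do={/ip firewall connection remove [find where .id=\$idc]}"
```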
 
anav
Forum Guru
Posts: 21226
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Simpler Failover for two Gateways I found working

Wed Oct 09, 2024 11:22 pm

So is your thinking that when one connection goes down for whatever reason, the netwatch setup is very good at detecting it and switching the users to the backup,
but the problem is that user sessions interrupted mid-stream are left hanging?

Isn't that partially taken care of by using action=masquerade for source NAT instead of action=src-nat (for a fixed static IP)?

In any case, yes, if part of the goal is to get rid of interrupted sessions, then I see where you are coming from with rextended's script.
 
jaclaz
Forum Guru
Posts: 1724
Joined: Tue Oct 03, 2023 4:21 pm

Re: Simpler Failover for two Gateways I found working

Wed Oct 09, 2024 11:55 pm

I have not yet fully grasped the practical implications that would make one prefer masquerade over src-nat or vice versa, but yes, masquerade should be able to take care, at least partially, of the existing/interrupted connections. For static setups, though, I understood that src-nat is preferred, and that existing connections then have to be taken care of by *something else*.
 
Amm0
Forum Guru
Posts: 4089
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: Simpler Failover for two Gateways I found working

Thu Oct 10, 2024 12:22 am

anav wrote: "Isn't that partially taken care of by using masquerade for source NAT instead of action=src-nat (for a fixed static IP)?"
@anav's point is right: masquerade does a lot of heavy lifting without any extra config. IMO, if you want simple... don't mess with connections or try to optimize failover for ALL traffic.

At the end of the day... what matters is how the server and client protocols behave (L3/L4, but ALSO the app-level logic). Nearly all apps have some "timeout", after which most try to reconnect, and that is when failover actually happens for that particular app/protocol. Not all apps do the same thing, but generally you see timeouts of around 30 seconds, so that's how long it takes to recover. Each protocol/app/etc. will be slightly different.

If you notice some app/protocol that does not respond well to failover... you can apply the right tricks to fix it. There are NO magic 3 lines that recover every protocol's sessions in all cases when the ISP/gateway changes....

jaclaz wrote:
I like the idea of the "cross backup" for this case of two different sources (home and business traffic)
[...]
/ip firewall connection
:foreach idc in=[find where timeout>60] do={
 remove [find where .id=$idc]
}
this should make the transition from one connection to the other faster, but I wonder if it has any drawback or causes any collateral damage.
If one does want to go down the road of attempting to speed up recovery times, you need packet/connection marks plus action=reject reject-with= filter rules that fire when a connection's WAN_X mark does not match the WAN it is now leaving through.

And you certainly want to send a "reject-with=tcp-reset" for TCP in /ip/firewall/filter if you want to speed up TCP recovery, which is most web traffic. UDP does not really have sessions, so some UDP-based protocols really don't care... other UDP traffic may be part of more complex protocols like SIP (specifically ALG considerations) or "games" that employ their own connection schemes and may care about the gateway IP inside their protocols.
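A rough sketch of that idea, assuming each WAN has its own interface (hypothetically ether1 and ether2; in the first post both gateways sit on the LAN, so it would not apply there as-is) and hypothetical connection marks via-wan1/via-wan2. Each connection is marked with the WAN it first leaves through; if its packets later try to leave through the other WAN, TCP gets a reset so the client reconnects at once instead of waiting for a timeout.

```routeros
/ip firewall mangle
# remember which WAN each new connection first leaves through
add chain=postrouting connection-state=new out-interface=ether1 action=mark-connection new-connection-mark=via-wan1 passthrough=yes
add chain=postrouting connection-state=new out-interface=ether2 action=mark-connection new-connection-mark=via-wan2 passthrough=yes
/ip firewall filter
# after a route change, a connection marked for one WAN now exits via the other:
# answer with a TCP reset so the client reconnects over the new WAN.
# These must sit above the "accept established,related" rule, and
# FastTracked connections bypass the filter, so they would not be caught.
add chain=forward connection-mark=via-wan1 out-interface=ether2 protocol=tcp action=reject reject-with=tcp-reset
add chain=forward connection-mark=via-wan2 out-interface=ether1 protocol=tcp action=reject reject-with=tcp-reset
```

This mostly matters when the WAN interface stays up (recursive or Netwatch-triggered failover), since masquerade already clears its entries on a real disconnect.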

But the idea here was "Simpler Failover", and complex filter and mangle rules kinda go against that. Even if you employ every trick you could, you're not going to solve all cases. Some stuff just requires re-auth or whatever if traffic is using a new IP, and no RouterOS script can fix that (without even more complexity, like tunneling to the cloud to maintain a common public IP).

That's why I'd just recommend, as a first step, testing failover with some common apps you/your family/your company actually use, to see how they respond.
 
anav
Forum Guru
Posts: 21226
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Simpler Failover for two Gateways I found working

Thu Oct 10, 2024 12:27 am

Nice feedback, Amm0. Conceptually, I like the idea.
- Basic failover: the first step when learning how to use two WANs.
- Recursive failover: if concerned that the ISP is flaky and you want to confirm connectivity to the WWW (seems many do).
- Netwatch failover: if not happy with 20 seconds before any action is taken; Netwatch allows much more granularity in speed and signal degradation before pulling the plug on a connection.
- rextended's script: if you absolutely need to get rid of HUNG connections as fast as possible (although its advantage over masquerade is unknown/unquantified)...

Does that sound about right??
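For comparison with the Netwatch approach, the classic recursive variant in RouterOS v7 syntax, as a sketch reusing the first post's gateways (DSL 192.168.1.1, LTE 192.168.1.250) and 8.8.8.8 as the probe host. It would replace the two default routes from the first post rather than be added to them:

```routeros
/ip route
# host route: the probe address 8.8.8.8 is only ever reached through DSL
add dst-address=8.8.8.8/32 gateway=192.168.1.1 scope=10 comment="probe via DSL"
# default route resolved recursively through 8.8.8.8; check-gateway=ping
# deactivates it when 8.8.8.8 stops answering
add dst-address=0.0.0.0/0 gateway=8.8.8.8 target-scope=11 check-gateway=ping distance=1 comment="DSL recursive"
# backup default route, used only while the one above is inactive
add dst-address=0.0.0.0/0 gateway=192.168.1.250 distance=2 comment="LTE backup"
```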
 
Amm0
Forum Guru
Posts: 4089
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: Simpler Failover for two Gateways I found working

Thu Oct 10, 2024 1:19 am

There is an "option 0 or ♾️"... you can just unplug a WAN cable to cause failover, if the default routes have distance=1 on the 1st WAN and distance=2 on the 2nd WAN. This is actually the default on LTE devices (the dhcp-client on the WAN uses default-route-distance=1 and the LTE APN uses default-route-distance=2). I mention it because it's often a quick way to induce failover if any of the above approaches fail.
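Spelled out explicitly, a sketch of those defaults on an LTE-equipped device, assuming the wired WAN's DHCP client is on ether1 and the LTE APN profile is named default (both are assumptions; check the names on your own device):

```routeros
# wired WAN: DHCP-installed default route with distance 1 (preferred)
/ip dhcp-client set [find interface=ether1] add-default-route=yes default-route-distance=1
# LTE: APN-installed default route with distance 2 (backup)
/interface lte apn set [find name=default] add-default-route=yes default-route-distance=2
```

Unplugging ether1 makes its distance 1 route inactive, so traffic falls back to the LTE route without any script.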

And back to @anav's rules: we come back to... what problem are you trying to solve?

So I'd add app/protocol-specific things to that list, done semi-outside the MikroTik. To provide context, a quick example: a "mission-critical" Zoom call.

Zoom does a lot internally to try to keep the connection (ICE/STUN/TURN plus its own mechanisms). But Zoom's view of the potential WAN options is hidden behind the failover, since the Zoom client cannot possibly know about that backup route on the router. And regardless of what you do in RouterOS, ANY failover scheme on the MikroTik is going to be noticeable in this "mission-critical" call. Now seconds may matter... so if this is a common case... your problems go beyond failover, as you'd want to look at connection/packet marking/QoS/BGP/etc.

One simple solution to this problem, for VLAN aficionados, is a hybrid port to the client (or server, NVR, etc.) as the multi-WAN scheme. It's relatively simple to create a new client VLAN that always routes to the 2nd/backup internet via routing rules. So if /interface/bridge vlan-filtering=yes does not scare you, it's easy to create a hybrid port for a desktop/laptop. Windows is a bit trickier with a hybrid port, but on Mac/Linux it's easier; Apple even documents "Set up a VLAN on Mac". Basically, that allows something like Zoom to "see" both networks, and it will likely use BOTH to keep the link up from the start of the call. If the primary did switch over to the 2nd, well, the Zoom app was already connected to it well before the failover. Other apps/devices with network interface selection can do something similar.
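A rough sketch of the router side, assuming a bridge named bridge with vlan-filtering=yes already working, a hypothetical VLAN 20 using 192.168.20.0/24, the laptop on hypothetical port ether5, and a hypothetical 2nd-WAN gateway 192.0.2.1 that already has its own masquerade rule. DHCP for the new subnet and the VLAN interface on the client are left out:

```routeros
# routing table that only knows the 2nd WAN (like the "DSL" table in the first post)
/routing table add name=via-wan2 fib
/ip route add dst-address=0.0.0.0/0 gateway=192.0.2.1 routing-table=via-wan2
# VLAN 20 on the bridge, with the router as gateway for that subnet
/interface vlan add name=vlan20-wan2 vlan-id=20 interface=bridge
/ip address add address=192.168.20.1/24 interface=vlan20-wan2
# hybrid port: ether5 keeps its untagged PVID and also carries VLAN 20 tagged
/interface bridge vlan add bridge=bridge vlan-ids=20 tagged=bridge,ether5
# everything sourced from VLAN 20 is looked up only in via-wan2
/routing rule add src-address=192.168.20.0/24 action=lookup-only-in-table table=via-wan2
```

With lookup-only-in-table, VLAN 20 cannot reach the other local subnets either; rules for those would have to be placed above this one if needed.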

So a 2nd VLAN that goes to the 2nd WAN may be a BETTER approach than any complex failover scheme. For any app that can take advantage of the desktop/laptop/PC having TWO internet connections, you can offload these complex things to the app. Stuff like camera NVRs often comes with two network ports, so in some cases it may just mean running a 2nd cable.

So having an additional client VLAN that always goes to the 2nd WAN is a handy option to have pre-configured, even if not used... it may also be useful for testing each WAN independently.
Last edited by Amm0 on Thu Oct 10, 2024 1:29 am, edited 1 time in total.
 
jaclaz
Forum Guru
Posts: 1724
Joined: Tue Oct 03, 2023 4:21 pm

Re: Simpler Failover for two Gateways I found working

Thu Oct 10, 2024 1:29 am

@Amm0

I guess my only doubts are:
1) is masquerade preferable also on a static setup (because of the way it inherently handles existing connections in case of failover)?
2) or is src-nat "better", and can it be supplemented by a simple script (like rextended's, a few lines) added to the Netwatch up and down scripts?

As I see it, #2 is not at all "complex" and, should it actually be "better" than #1, it wouldn't change the "simple" or "simpler" characterization of this approach.
 
Amm0
Forum Guru
Posts: 4089
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: Simpler Failover for two Gateways I found working

Thu Oct 10, 2024 2:19 am

jaclaz wrote: "1) is masquerade preferable also on a static setup (because of the way it inherently handles existing connections in case of failover)?"
The topic is well-described these days: https://help.mikrotik.com/docs/display/ ... Masquerade
From the docs: "Every time when interface disconnects and/or its IP address changes, the router will clear all masqueraded connection tracking entries related to the interface, this way improving system recovery time after public IP change. If srcnat is used instead of masquerade, connection tracking entries remain and connections can simply resume after a link failure."
If you want "unplug for failover", you need masquerade. Stuff like LTE will generally always disconnect, so you'd want masquerade there too. Now, failover induced by recursive routes is not a "disconnect", so masquerade won't help clear connections. And src-nat never clears connections. That's why I do not think recursive routes should be involved in "simpler failover".

I still prefer masquerade: it avoids making two config changes if an IP address changes, so there's less chance of a config error causing an outage. I do create a masquerade rule per WAN, and then leave the default out-interface-list=WAN rule below the WAN-specific masquerade rules. Also, as the docs note, masquerade handles the unplug/disable situation, so that can also be used to flush connections, which makes that operation easy if needed, without a script: just "/interface ethernet { disable ether1; enable ether1 }", and that flush would be specific to that WAN if the NAT rule was WAN-specific (AFAIK).
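Sketched as config, assuming two WANs on hypothetical ether1/ether2 that are both members of the WAN interface list, as in the default configuration. If the default rule already exists, the two WAN-specific rules would have to be moved above it (for example with place-before), since rules are evaluated top-down:

```routeros
/ip firewall nat
# one masquerade rule per WAN, evaluated first
add chain=srcnat out-interface=ether1 action=masquerade comment="masq WAN1"
add chain=srcnat out-interface=ether2 action=masquerade comment="masq WAN2"
# default list-based rule, kept below the WAN-specific ones as a catch-all
add chain=srcnat out-interface-list=WAN action=masquerade comment="defconf: masquerade"
```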

Now, src-nat has its place. Again, as the docs note, "connections can simply resume after a link failure". So if the "failover case" is measured in something like 15-30 seconds, masquerade's clearing would be bad, which is why the docs have several paragraphs explaining how this works. A good example would be Starlink, which uses DHCP in all cases, so masquerade would be the default/automatic choice. But, for example, the Starlink app's charts often show very short outages (a few seconds). Most would be well under check-gateway's thresholds, so no failover would be triggered. But... with some impairment (trees) between Dishy and its birds... you will very likely see short outages in the range of seconds to minutes, every few hours. With these predictable short outages causing failover (only to recover shortly after), a more complex /ip/dhcp-client script that updates an /ip/firewall/nat src-nat entry might certainly be preferable to masquerade. Similar to the point on looking outside the MikroTik to fix these things: you can also move the Starlink in this example, or cut down some trees...
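A minimal sketch of such a DHCP client script, assuming a src-nat rule for that WAN already exists with the hypothetical comment "src-nat WAN1". This is the text for the Script field of that WAN's DHCP client; RouterOS passes the lease details to it as variables such as $bound and $"lease-address":

```routeros
# DHCP client script (sketch): runs whenever the lease changes.
# $bound is 1 when a lease was obtained/renewed, 0 when it was lost.
:if ($bound = 1) do={
  # point the static src-nat rule at the address from the current lease
  /ip firewall nat set [find comment="src-nat WAN1"] to-addresses=$"lease-address"
}
```

Unlike masquerade, nothing here clears connection tracking when the link drops for a few seconds, which is the point for the Starlink-style short outages described above.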
Last edited by Amm0 on Thu Oct 10, 2024 2:21 am, edited 1 time in total.
 
anav
Forum Guru
Posts: 21226
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Simpler Failover for two Gateways I found working

Thu Oct 10, 2024 2:21 am

Separate from jaclaz's question, why not consider VRRP as a way to create a seamless connection from the client (LAN) perspective, or will it not work in your Zoom example?
 
Amm0
Forum Guru
Posts: 4089
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: Simpler Failover for two Gateways I found working

Thu Oct 10, 2024 2:35 am

anav wrote: "Separate from jaclaz's question, why not consider VRRP as a way to create a seamless connection from the client (LAN) perspective, or will it not work in your Zoom example?"
VRRP only helps if you have TWO routers, for example when someone trips over one router's power/other cable, or you're doing an upgrade/config change. But it does not really help with a WAN network failure (outside of more ISP-like situations).

My main point was that some applications/protocols, like Zoom, will actually check whether the computer itself has multiple WANs. If it does, the app/protocol may try to use them. So the out-of-config solution is, essentially, to run a "2nd cable" that always goes to the 2nd WAN.

Other protocols, like RTMP used for livestreaming, have similar needs, where it's actually better if the client/end-user/server is "pre-connected" to the 2nd WAN. RTMP is used to send live video to YouTube, which offers a "backup ingest". So if the backup ingest already used a different path before the livestream started, the protocol would handle the failover. RTMP could also be handled by mangle rules that specifically direct the "backup URL" to a different WAN using an address-list, etc.
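A sketch of that mangle variant, assuming a routing table named via-wan2 (hypothetical) that already holds a default route via the 2nd WAN, built like the "DSL" table in the first post. The host name below is a placeholder; the real backup ingest URL comes from your streaming settings:

```routeros
# resolve the backup ingest host into an address list (placeholder name)
/ip firewall address-list add list=rtmp-backup address=backup-ingest.example.com
/ip firewall mangle
# send RTMP (TCP 1935) for the backup ingest through the 2nd WAN only;
# FastTracked packets skip mangle, so these connections must not be FastTracked
add chain=prerouting protocol=tcp dst-port=1935 dst-address-list=rtmp-backup action=mark-routing new-routing-mark=via-wan2 passthrough=no
```

The primary ingest keeps using the main table, so each stream has its own path before anything fails.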

Just trying to highlight that a one-size-fits-all approach is tough. Keeping failover simple initially is the best plan... add as needed.
 
Amm0
Forum Guru
Posts: 4089
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: Simpler Failover for two Gateways I found working

Thu Oct 10, 2024 2:49 am

Note I'm mentioning more "realtime" things.... What to do for these does get tricky... And some of these protocols have a built-in mechanism for failover... so using the protocol-specific failover mechanism would be better than a complex RouterOS config.

Most traffic is web traffic, generally speaking. So most issues after a failover can be solved by hitting "refresh" in the browser and/or restarting an app. So for these cases, it's not worth doing too much just to spare some users from hitting ^R in the (hopefully occasional) failover case.

And it's easy to make a more complex config that then creates new outages in the process, even though the internet may actually be working.
