Dual wan fail over, fail back not working

Posted: Mon Jan 14, 2019 8:35 am
by driv3l
I am new to Mikrotik having just purchased an RB4011.

I have it up and running with my primary Internet line without any issues.

I am trying to get dual wan fail over working (not load balancing, as my backup line isn't as fast as my primary).

After much reading, I have managed to get basic fail over working. I did this by adding a new route with distance 2 and enabling gateway checking. I also added the firewall entries for wan2 (copied from the default wan1 entries that were created when the router automatically set up the configuration for the primary wan).
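In CLI terms, a setup like the one described above would look roughly like the following sketch. It assumes RouterOS v6 syntax and statically configured gateways; both gateway addresses are placeholders, not values from this thread.

```
/ip route
# primary default route (wan1); 198.51.100.1 is a placeholder gateway
add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=1 check-gateway=ping comment="wan1 primary"
# backup default route (wan2); only used while the distance=1 route is inactive
add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=2 check-gateway=ping comment="wan2 backup"
```

The route with the lower distance wins whenever its gateway answers the ping check, so traffic should return to wan1 as soon as that route becomes active again.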

The primary wan works fine. When I disable the port for testing, it does fail over to the secondary wan.

The problem is, when I re-enable wan1, the Internet stops working, and it does not seem to come back until I disable the port for wan2, which then forces wan1 to work again.

Any suggestions on what might be the issue?

Please note that I am extremely new to Mikrotik (and advanced routing in general), and everything I have been doing is via the GUI / WinBox.

Thanks.

Re: Dual wan fail over, fail back not working

Posted: Tue Jan 15, 2019 6:24 pm
by tricksol
If you have check-gateway=ping enabled, it should fail over when the gateway goes down. Also, under Firewall > NAT, make sure you have a masquerade rule for the second wan port.
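A minimal sketch of that NAT rule, assuming RouterOS v6 and that the second wan port is ether2 (the interface name is an assumption, not from the original post):

```
/ip firewall nat
# masquerade everything that leaves through the second WAN port
add chain=srcnat out-interface=ether2 action=masquerade comment="masquerade wan2"

# Alternative, if the default configuration's WAN interface list is in use:
# add the port to that list so the existing default rules cover it as well.
/interface list member
add list=WAN interface=ether2
```

Only one of the two approaches is needed. The second one relies on the default firewall matching on the WAN interface list, which is worth checking in the existing rules first.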

Re: Dual wan fail over, fail back not working

Posted: Tue Jan 15, 2019 7:55 pm
by sebastia
That's a normal consequence of masquerade plus fail-over. When your primary comes back, existing connections get routed over the primary, but their connection state is still linked to the secondary. As a result, masquerade is not applied and private IPs leak to the ISP.

By manually disabling wan2, these connection states get cleared and new ones are created with proper masquerading.
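As a hedged illustration of the same effect without disabling the port, the connection tracking table can be flushed by hand. This is a sketch assuming RouterOS v6; it drops every tracked connection, so all active sessions through the router are interrupted.

```
/ip firewall connection
# remove all tracked connections; new ones are then created
# (and NATed) according to the route that is active now
remove [find]
```

This treats the symptom only. The safeguards discussed further down the thread aim to avoid the stale state in the first place.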

See slide 28 onwards: https://mum.mikrotik.com/presentations/ ... 639302.pdf

Re: Dual wan fail over, fail back not working

Posted: Tue Jan 15, 2019 8:39 pm
by anav
So Sebastia, what do you recommend if the author's two WAN IPs are dynamic?
Typical recursive setup:

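/ip route
# primary: default route via 8.8.4.4, resolved recursively and checked by ping
add check-gateway=ping distance=2 gateway=8.8.4.4
# primary: host route that reaches 8.8.4.4 only through the fiber gateway
add distance=2 dst-address=8.8.4.4/32 gateway=DynamicFiberGateway scope=10
# secondary: plain default route via the cable gateway
add distance=3 gateway=DynamicCableGateway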

Would this be prone to leakage, and if so, what is the solution?

Re: Dual wan fail over, fail back not working

Posted: Tue Jan 15, 2019 8:55 pm
by draid
I'm using recursive failover with PPPoE and a static address + masquerade. It works perfectly, but after seeing this presentation it seems that this is far from good practice. Is this leakage certain to happen, and how can I avoid it?

I'm currently using multiple host checking per uplink with additional scripting because of one of my ISPs. It could help the author too.
https://wiki.mikrotik.com/wiki/Advanced ... _Scripting

Re: Dual wan fail over, fail back not working

Posted: Tue Jan 15, 2019 10:23 pm
by sebastia
The associated video: https://www.youtube.com/watch?v=3LmQYIQ5RoA

"the Internet stops working, and it does not seem to come back until I disable the port for wan2"
It would start to work on its own after the TCP connections have timed out.

Possible safe-guards were already given in the presentation:
* Use action=src-nat instead of action=masquerade where it is possible
self explanatory

* Drop connection-state=invalid packets
I have that rule in every chain of filter table (especially forward, after routing)

* Drop connection-state=new connection-nat-state=!dstnat packets from public interface
to be honest, not sure how this is related to fail-over, as in my opinion that is just to filter spoofed traffic

* Creating backup “blackhole” route for each routing-mark
basically pinning a connection to a route
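The two filter safeguards from the list above can be sketched as follows. This assumes RouterOS v6 and that ether1 is the public interface (the interface name is an assumption); rule placement relative to existing rules matters and is not shown here.

```
/ip firewall filter
# drop packets that connection tracking cannot match to a valid state
add chain=forward connection-state=invalid action=drop comment="drop invalid"
# drop new connections arriving on the public interface unless they were dst-NATed
add chain=forward connection-state=new connection-nat-state=!dstnat in-interface=ether1 action=drop comment="drop new from WAN not dst-NATed"
```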

Re: Dual wan fail over, fail back not working

Posted: Wed Jan 16, 2019 1:44 am
by driv3l
Thanks for the details on this. So should I be using action=src-nat for both WAN entries?

Can you explain the "Creating backup “blackhole” route for each routing-mark: basically pinning a connection to a route" part? I am new to this, so I am not sure how to go about doing it.

Thanks!

Re: Dual wan fail over, fail back not working

Posted: Wed Jan 16, 2019 2:06 pm
by sebastia
"So should I be using action=src-nat for both WAN entries?"
That depends on how stable the assigned IP is. One ISP I use assigns an IP for 24h and allows extensions, so from my point of view that is practically static, and for that config I use src-nat. (That's my primary, by the way.) The backup line I have (over 4G) assigns IPs "at random", and there I use masquerade.
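A minimal sketch of that split, assuming RouterOS v6, ether1 as the primary with a practically static address, and ether2 as the 4G backup. The interface names and the address 198.51.100.2 are placeholders, not values from this thread.

```
/ip firewall nat
# primary: address is effectively static, so translate to it explicitly
add chain=srcnat out-interface=ether1 action=src-nat to-addresses=198.51.100.2 comment="wan1 src-nat"
# backup: address changes unpredictably, so let the router pick the current one
add chain=srcnat out-interface=ether2 action=masquerade comment="wan2 masquerade"
```

The catch with src-nat is that to-addresses has to be updated by hand (or by script) if the ISP ever hands out a different address.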

"Can you explain the 'Creating backup “blackhole” route for each routing-mark' part?"
(this is only for IPv4, IPv6 doesn't support policy based routing yet)
With a routing mark, the normal routing logic is to do a lookup in the designated table first. If no valid route is found there, the main table is consulted next. So in practice this means: try the specific table first, and if it's unavailable, use normal routing.

This can be prevented / limited by either:
* using routing rule with action=lookup-only-in-table limiting lookups to within the specified table only (so no lookups in default table)
* adding a black-hole routing entry, e.g. "add distance=100 routing-mark=wan type=blackhole" (the distance needs to be high enough for the entry to be processed last). If the normal route in that table is unavailable (link down), the route lookup will end with the black-hole and the packet will be dropped. Another option here is type=unreachable.
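Both options can be sketched as below, assuming RouterOS v6 syntax (v7 handles routing tables differently) and a routing mark named "wan" as in the example above. The gateway address is a placeholder.

```
# Option 1: never fall back from table "wan" to the main table
/ip route rule
add routing-mark=wan table=wan action=lookup-only-in-table

# Option 2: give table "wan" a last-resort route that drops the packet
/ip route
# normal route for marked traffic; 198.51.100.1 is a placeholder gateway
add dst-address=0.0.0.0/0 gateway=198.51.100.1 routing-mark=wan distance=1 check-gateway=ping
# used only when the route above is inactive, because of the high distance
add dst-address=0.0.0.0/0 type=blackhole routing-mark=wan distance=100
```

Either way, packets carrying the mark are dropped rather than sent out through the other uplink while their own route is down.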

Re: Dual wan fail over, fail back not working

Posted: Wed Jan 16, 2019 6:23 pm
by anav
My IPs are dynamic but don't really change unless I release/renew, change routers, etc. So they are basically static for the most part. I imagine a system change at the ISP's end might also cause some IP changes. I will change mine to src-nat soon.

Re: Dual wan fail over, fail back not working

Posted: Sat May 16, 2020 1:23 pm
by Guscht
sebastia wrote:
The normal routing logic for route-mark-ing is to do a lookup in the designated table first. If no valid route has been found, main table will be consulted next. So practically this means: try specific table first, if it's unavailable use normal routing.

This can be prevented / limited by either:
* using routing rule with action=lookup-only-in-table limiting lookups to within the specified table only (so no lookups in default table)
* adding black-hole routing entry, ex: "add distance=100 routing-mark=wan type=blackhole" (distance needs to be high enough for the entry to be processed last). So if the normal route in that table is unavailable (link down) route lookup will end with the black-hole, and packet will be dropped. Another option here is type=unreachable.

But won't setting things up as you describe prevent failover?
I think it's a good thing that, if the specific table is down, the lookup falls through to the main table, since that keeps connectivity constant. Or am I wrong?