
Attempting to evolve from caveman's failover

Posted: Tue Oct 03, 2023 6:00 pm
by jaclaz
Hello to all.

Completely new to Mikrotik RouterOS, and with a far from complete understanding of networking (so please be gentle if I ask too simple/basic questions and/or I use some incorrect terminology).
I spent some time over the last few days reading many forum posts, trying to get a basic understanding of what RouterOS can do. While looking for a better solution to my current failover setup (none, or manual/caveman), I learned from posts and links by Sob on this thread:
viewtopic.php?t=187178
how it is seemingly possible through some RouterOS "magic" with nat/firewall/mangle to have more than one device with the same IP connected to the network.

My current situation (small office/shop operation) is an internet connection through three ISPs. Each of them provides its own (proprietary/locked) modem/router, and each of these gateways has the IP address 192.168.1.1/255.255.255.0. Connected to the network is the usual stuff: a few PCs, printers and a few other "proprietary" devices (POS and POS-like).

So I have all the devices on a 192.168.1.x/255.255.255.0 LAN with gateway pointing to 192.168.1.1, no DHCP (all devices have static IP's), no VLANS or similar "advanced" routing/switching.

The Internet connections are as follows:
1) Primary: FTTC (VDSL)
2) Secondary: "old" ADSL
3) Tertiary: FWA (LTE 3G/4G)

Only one of the three routers/modems is physically connected to the network at any given time.

The primary connection is usually very stable and gives no problems, but three times this year we had faults on the ISP side that left us without the primary connection for one or two days each time. In two cases the secondary worked; in one case both primary and secondary were down (copper cables cut during some road works) and we had to use the tertiary.

So right now my "caveman" method of failover is to simply unplug the RJ45 cable (coming out of the "main switch") from the back of the modem/router that has no connection, plug it into one of the other modem/routers, power that one on and check that the internet works.

I have no access to the settings of the three modem/routers. I can change them only by calling the ISP's support, who change settings remotely, but it is not "fast": it takes from a couple of hours up to a day or more. Likewise, changing the settings on the POS-like devices needs a call to other support services (three different ones). Even if I had access to these settings (like I have for the PCs and printers), it would take time and need some (basic) knowledge that other people/colleagues simply lack, while the unplugging and re-plugging can be done by anyone.

For the reasons above, we must assume that this 192.168.1.1 is carved in stone and cannot be changed.

What would be needed is a (hypothetical) device that could act (still manually) as an RJ45 physical switcher box A/B/C/D similar to:
http://www.cablesonline.com/abrjswitbox3.html
but that could be instead automated via some ping (recursive) or netwatch or similar, or some other script running on a PC.
I actually found something (loosely) similar, a switcher box that can be controlled via RS232, but besides not being exactly cheap (US$290) it would add a whole new level of complication:
https://www.vpi.us/network-devices/giga ... witch-1044

I think (but may well be wrong) that if I introduce between the "main" LAN switch and the three modem/routers two Mikrotik routers (possibly RB750GR3?) I can do the following:

1) have the first router take the 192.168.1.1 address on the LAN and (say) 172.16.0.1 on the WAN, with some scripts/netwatch to reach main/failover1/failover2 through three addresses like 172.16.0.10, 172.16.0.20, 172.16.0.30

2) have the second router be 172.16.0.2 on the WAN (which exists only between the two routers), mapping/NATting the three fixed-address 192.168.1.1 modem/routers to the three addresses 172.16.0.10, 172.16.0.20, 172.16.0.30
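
A rough way to picture point 2), as a sketch only: the second router would keep a table from each translated address to the port where the matching modem/router is plugged in. The port names (`ether2` to `ether4`) and the assignment of main/failover1/failover2 to .10/.20/.30 are assumptions made for illustration. This Python snippet only models the lookup; it is not RouterOS configuration.

```python
from typing import Dict, Tuple

# Translated address on the inter-router network -> (port on the second
# router, real address of the modem/router behind that port).
# Port names and the ordering are hypothetical; the addresses are the
# ones proposed in the plan above.
NAT_MAP: Dict[str, Tuple[str, str]] = {
    "172.16.0.10": ("ether2", "192.168.1.1"),  # main
    "172.16.0.20": ("ether3", "192.168.1.1"),  # failover1
    "172.16.0.30": ("ether4", "192.168.1.1"),  # failover2
}


def resolve(translated: str) -> Tuple[str, str]:
    """Return (port, real_address) for a translated gateway address."""
    return NAT_MAP[translated]


# All three entries share one real address, so only the port differs.
real_addresses = {real for _, real in NAT_MAP.values()}
ports = {port for port, _ in NAT_MAP.values()}
print(len(real_addresses), len(ports))  # prints "1 3"
```

The point of the sketch is that the three modems cannot be told apart by address, only by the port they hang off, which is why the translation has to be tied to an interface.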

If any of the two Mikrotik routers fail (or both), I can still use the old method of unplugging and plugging directly the modem/router to the switch, bypassing the Mikrotik routers completely.

Or (another idea, maybe folly) I could have 4 devices: 1 Mikrotik like the first router above and three other (any) small routers, one on each of the cables connecting each modem/router, simply routing from 172.16.0.10 to 192.168.1.1, from 172.16.0.20 to 192.168.1.1 and from 172.16.0.30 to 192.168.1.1

Do I make sense? (or am I completely off and better/easier solutions exist)

If the approach can work (or if any other suggested one works) then I will probably need some help in choosing the right hardware/routers and configuring the whole thing.

Thanks in advance for any reply/suggestion to solve the problem.

jaclaz

Re: Attempting to evolve from caveman's failover

Posted: Wed Oct 04, 2023 11:26 am
by Filo
Hi,
from what I understood:

- Clients have FIXED IPs (and FIXED Gateway-IPs)
- Three Internet-Gateways are in place, all of them with DIFFERENT IPs
- Caveman is plugging cables when failover should happen

I would totally redesign this network in order to:

- If DHCP is not possible, use MIKROTIK-Router-IP as Gateway
- Attach ALL Internet-Gateways and clients to MikroTik, they need valid IPs in your subnet
- Create the failover-logic on your MikroTik for those three gateways

The last point is more generic, since it depends on how deep you are in the topic and how deep you want to go.
The easiest way to create a failover would be to create three default routes (0.0.0.0/0) with different routing costs / distances.
The route with the lowest distance will be used unless its gateway is unreachable.

This means: The GATEWAY needs to be unreachable / turned off in order to get the next distance as the active route.
But you would also be able to ALTER the routing distance to make an alternative route active, instead of (un-)plugging cables.
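
The selection rule described above can be sketched in a few lines of Python. This is only a model of the idea (lowest distance among reachable gateways wins), not RouterOS code; the gateway addresses and the `reachable` flags are made-up values for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DefaultRoute:
    gateway: str     # next hop for 0.0.0.0/0 (hypothetical addresses below)
    distance: int    # lower distance = more preferred
    reachable: bool  # outcome of whatever gateway check is in use


def active_route(routes: List[DefaultRoute]) -> Optional[DefaultRoute]:
    """Return the reachable route with the lowest distance, or None."""
    candidates = [r for r in routes if r.reachable]
    return min(candidates, key=lambda r: r.distance, default=None)


routes = [
    DefaultRoute("192.168.2.1", distance=1, reachable=False),  # primary down
    DefaultRoute("192.168.3.1", distance=2, reachable=True),
    DefaultRoute("192.168.4.1", distance=3, reachable=True),
]
print(active_route(routes).gateway)  # prints "192.168.3.1"
```

Raising the distance of the first route above the others would have the same effect as marking it unreachable, which is the "ALTER the routing distance" option.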

If you want to make sure, alternative (say: "backup"-) routes become active automatically, you would need to go into the "recursive routing for failover"-tutorial (which you can find here: https://help.mikrotik.com/docs/pages/vi ... d=26476608 ) or use another approach involving NETWATCH and MANGLE-Rules for firewall I once mentioned here: viewtopic.php?t=198999

That being said, it's only meant as an example. We do not know your configuration for sure and there may also be some other things to consider.

Hope this will help you evolve from caveman to network-hero :)


*edit*
As I read later on - ALL Internet-Gateways have 192.168.1.1 and you're not able to change them? The above solutions require each gateway to have a unique IP in your network and the MikroTik to have 192.168.1.1.

If it is the case that ALL GATEWAYS have 192.168.1.1 and you are not able to change the internal addresses, you can solve this manually:
- Connect all Gateways to a MikroTik
- DISABLE LAN-Ports of GW2 and GW3 (better do that before Step1 ;) )
- In case of failing Internet: DISABLE LAN-Port of GW1 and ENABLE GW2 or GW3
- No need to create different default routes (since 192.168.1.1 will be the way to go whichever Gateway is used)

Of course, this could also be scripted with MANGLE-Rules and NETWATCH-Tool on a MikroTik, as well. Automatic Failover is quite easy to do (Netwatch, Probe 8.8.8.8 for Ping, if not responding, deactivate Port for GW1, activate Port for GW2, probe again, if not responding in a given time, deactivate Port for GW2, activate Port for GW3...), automatic failback is another story since we're not able to probe the preferred route for connectivity. For this you would surely need to change the internal addresses of the Gateways.
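
As a sketch of the port-toggling sequence just described (in Python, only to show the logic; on the MikroTik this would be a Netwatch script): the port names and link states below are hypothetical, and `probe` stands in for "enable this port, then ping 8.8.8.8".

```python
from typing import Callable, List, Optional


def fail_over(ports: List[str], active: int,
              probe: Callable[[str], bool]) -> Optional[int]:
    """Try the ports after `active` in order; return the index of the first
    one whose probe succeeds, or None if no remaining gateway answers."""
    for candidate in range(active + 1, len(ports)):
        # Only one port is enabled at a time, because every gateway
        # answers on the same 192.168.1.1 address.
        if probe(ports[candidate]):
            return candidate
    return None


# Hypothetical port names and link states, for illustration only.
ports = ["ether1-GW1", "ether2-GW2", "ether3-GW3"]
internet_up = {"ether1-GW1": False, "ether2-GW2": False, "ether3-GW3": True}

new_active = fail_over(ports, active=0, probe=lambda p: internet_up[p])
print(ports[new_active])  # prints "ether3-GW3"
```

The function only ever moves forward through the list. While GW2 or GW3 is in use, the port of GW1 is disabled, so nothing can test whether GW1 has recovered; that is the failback limitation mentioned above.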

That brings me to a rarely stupid workaround...
IF you have access to the Gateways and IF you are able to create a DynDNS-Host for each of them to... let's say probe them "from outside" with DynDNS or similar, the MikroTik would be able to check every Internet-Connection with the corresponding DynDNS-Host from inside. You'll need to make sure to refresh the DNS-Cache if your Internet-IPs are subject to change. If you have FIXED-IPs, great, probe them. With this idea, it would be possible to create a failback once GW1 is responding to DynDNS again.

Best regards,
Martin!

Re: Attempting to evolve from caveman's failover

Posted: Wed Oct 04, 2023 2:34 pm
by jaclaz
Yes, the gateways have "fixed" IP's (192.168.1.1).

The thread and links I found with the examples by Sob essentially use a Mikrotik router to have three "same IP" devices connected but "translated" to three different interfaces/IPs. The context in those posts is different: it is about people having industrial machines (or radios/whatever) that have fixed IPs from the factory. In that context I believe there are no issues whatsoever, as the goal is just to connect to more than one machine with the same IP address, and I don't think there is any concern about capacity/bandwidth over those links.

My scenario is different because the idea is to NAT/masquerade the gateways' IPs, and maybe, even if it is possible in theory, in practice there will be issues of some kind with the connection (double NAT? reduced bandwidth? something else?). Not that any of the current gateways are any good when it comes to speed: the main one is about 30 Mbit/s, and the two spare ones are around 20 Mbit/s.

Besides, from what I understand of networks/routers (but I am not at all an expert), in my case I cannot do everything on a single router. I need either 2 Mikrotik routers, one "replacing" the gateway and having 192.168.1.1 with a route for 0.0.0.0 pointing towards the second router (with the three current gateways at 192.168.1.1 *somehow* NATted/masqueraded to three different IP addresses), or one small/simple router translating/relaying/routing/whatever for each one of the current gateways.

Thank you for the Netwatch reference and your modified way, if the overall approach makes sense, then I will test/study these examples.

As said, I could (as a one-time task) make the ISPs change the three 192.168.1.1 to (say) 192.168.2.1, 192.168.3.1 and 192.168.4.1 and bring the setup to a "normal" one where a single Mikrotik router, set as gateway with 192.168.1.1, can manage them. But what would happen if the Mikrotik router fails? I would have to bypass it and manually change the gateway on all devices connected to the network (which I can do for PCs but have problems with on the POS-like devices). Even if I could directly change the gateway on the POS-like devices, the issue would remain when I am not there: there is no one I can trust to make these changes, and anyway it would take some time.

I could do this one-time change and have two identical Mikrotik routers set up in an identical way, only one connected, and in case of failure of the connected one, replace it with the second. I have some experience with this approach with a self-made router running zeroshell (a now discontinued Linux router/firewall distro) on a repurposed thin client, but when I need to "switch" the main thin client with the second one I still have to disconnect and reconnect several cables.

I will have to think if it is physically possible to setup the cables "in parallel" so that all that would be needed in this case would be to power off the "main" (now failed) Mikrotik and power on the "spare" one.

Another approach I thought of (probably foolish) could be to find a reliable ethernet relay that the netwatch (or similar) script on the Mikrotik can *somehow* pilot, powering off and on the three gateways (this has an issue, as these stupid ISP provided modem/routers are very slow at booting, particularly one so I would need to keep them powered on and use the ethernet relay to switch on/off three small switches (hubs) between the Mikrotik and each gateway) and there is a physical problem as the LTE router/modem/gateway is not (at the moment) near the other two.

A further complication (I am throwing on the table anything I can think of) would be a set of "smart plugs" (still assuming the Mikrotik script can somehow control them), but it seems that they are all wireless and most of them have a proprietary app, though there are a few using Tasmota, which is open source and "local" and can possibly be triggered by commands sent over the wireless network.
Given the (I believe poor) reliability of these devices and the hypothetical Wi-Fi network I already discarded this approach as not suitable in practice.

Any other ideas?

jaclaz

Re: Attempting to evolve from caveman's failover

Posted: Thu Oct 05, 2023 8:21 am
by Filo
Well... to be honest, the most simple solution for you is to use the manual switcher you already mentioned.
There are situations where you either try to solve a problem with technical overkill or keep it simple and doable for everyone.

If this is a remote location you don't want to visit very often, I would go for an unmanaged switch and the manual switcher.
EVERYONE is able to switch from A to B or B to C. Plus: Only two devices could fail. Leave them on spare at the location, label the cables, another caveman can jump in and replace it.

There ARE indeed solutions which you may be able to implement. Automatic failover, VRRP on MikroTik, and many more options to create an administrative nightmare for everyone who is not you or "brainlinked" with you.

Is it worth it, or are you just trying to make a simple and easy setup for everyone to understand that lets you get away in peace?
Take a step back from your considerations and try to look at it from outside. Such projects can kill you over time and often it is better to revert complexity to a working setup for this special location.

If you can't use standard-failover procedures mentioned in MikroTik-Tutorials due to the given limitations - don't bend it over. Use the Hardware-Switch for the gateways and everyone will understand your 1-Page-Documentation on how to fix it, while you're in the sun having a beer :)

*edit*
I guess there are also such switchers on the market which may be remote-controllable via VPN (for FAILBACK, if they forget to do it)

Best regards,
Martin!

Re: Attempting to evolve from caveman's failover

Posted: Thu Oct 05, 2023 3:17 pm
by jaclaz
Well, I found the given "switcher box" that can be driven both manually (push button) and programmatically (via RS232), "Manual Ethernet Switch":
https://www.vpi.us/network-devices/giga ... witch-1044
but it is (as I see it) stupidly expensive (at nearly 300 US$) and - being RS232 - would need an added ethernet<->RS232 module (likely another 50-100 US$). Besides the cost, I am not even sure how well it would work.

With the same kind of money you can buy two more than decent Mikrotik routers or switches.

As an outsider to the networking world, I would have thought that there were heaps of similar devices, or some other well-known, widely used alternative solution that I was not able to find, like some sort of managed switch that could easily be commanded over HTTP to bring ports up or down.

From what I understand Mikrotik routers seem like not having a command line (SwOS), and maybe (but I have to study more) using a router (RouterOS) as a switch is possible, still I have no idea if it is possible through some "magic" script/setup to obtain what I need/want.

The problem I have does not seem to me so much niche, I wonder if other people in similar condition have found a better solution.

The three gateways with same IP could be - as said - marginal, in the sense that even if I change them to different IP's (once) the result seems to me not as robust/failproof as I would like it to be.

Another network device that seemingly does not exist is a very simple router (actually more like a network address translator) a (hypothetical) device with two ports that simply translates/routes a given address to another one as transparently as possible.

Another semi-random question is could a PoE enabled switch/router actually command the PoE supply to one port?
It seems like it is possible:
https://wiki.mikrotik.com/wiki/Manual:PoE-Out
so, could the POE be switched on and off and connect to a given port a relay of some kind?

jaclaz

Re: Attempting to evolve from caveman's failover

Posted: Fri Oct 06, 2023 10:03 am
by Filo
Hi,
jaclaz wrote: "As an outsider to the networking world, I would have thought that there were heaps of similar devices, or some other well-known, widely used alternative solution that I was not able to find, like some sort of managed switch that could easily be commanded over HTTP to bring ports up or down."
Well, indeed you CAN with Mikrotik. There are also tutorials integrating MT-Devices into SLACK or TELEGRAM and you may be able to command them from there (regardless of the security aspect).
jaclaz wrote: "From what I understand Mikrotik routers seem like not having a command line (SwOS), and maybe (but I have to study more) using a router (RouterOS) as a switch is possible, still I have no idea if it is possible through some "magic" script/setup to obtain what I need/want."
MikroTik-Devices have a great CLI - indeed it is very easy to adapt. Scripting is no problem, too.
jaclaz wrote: "The problem I have does not seem to me so much niche, I wonder if other people in similar condition have found a better solution.

The three gateways with same IP could be - as said - marginal, in the sense that even if I change them to different IP's (once) the result seems to me not as robust/failproof as I would like it to be."
Your main problems seem to be:

a) Fixed IPs and Gateway on clients
-> Can be fixed by using the MikroTik as 192.168.1.1 (Gateway)

b) Fixed and SAME internal IPs on all Gateways
-> Needs to be fixed to three different internal IPs
-> If you change the internal IPs of the three gateways, any WAN-Failover-Tutorial will fit for you.

Only thing will be: Single Point of Failure on the MikroTik (which of course can be mitigated by creating a "Virtual Router" on more than one device)

I'm not aware of any use-case MikroTik-Devices could not help with - it's a matter of energy invested in learning and adapting.

Regards,
Martin

Re: Attempting to evolve from caveman's failover

Posted: Fri Oct 06, 2023 2:01 pm
by jaclaz
Yes, I know what the problems are, and of course I know that by forcibly removing them those problems will go away (but new ones may arise).

Still this is not "problem solving" it is "working around", and - while there is nothing wrong in working around as opposed to solving - if the end result of the workaround is not satisfying, besides removing all the fun, there is also no real advantage.

I see it more like a game: there are rules on how to play and you must play by those rules; you cannot invent your own rules on the spot and call it a day (unless it is Calvinball, which is actually fun):

https://calvinandhobbes.fandom.com/wiki/Calvinball

I will search/study/think a bit more, as I think there can be practical and working solutions without spending a fortune in professional/industrial devices.

jaclaz

Re: Attempting to evolve from caveman's failover

Posted: Fri Oct 06, 2023 4:10 pm
by Filo
Well, maybe another user likes to join this topic, but I think, you can extract some ideas from this already. We discussed several ways:

- Change the config of everything surrounding a central MikroTik device (change routers' IPs)

- Adapt the given facts and build workarounds on MikroTik (like scripts in Netwatch and Ports enabling and disabling automatically)

Both ways are open for you with MikroTik, even to buy two or more of them to mitigate SPOF and even scripts and automation are open for everything.

Whatever you choose - good luck.

Regards,
Martin

Re: Attempting to evolve from caveman's failover

Posted: Sat Oct 07, 2023 12:04 pm
by jaclaz

Filo wrote: "Whatever you choose - good luck."
Thank you very much.

In the meantime I looked around (even if in the end I probably won't use them) for ethernet relays and similar stuff. It seems there is an endless number of "hobby" devices, "no name" and of dubious reliability, and - on the other end of the spectrum - professional PDUs intended for racks with the usual (IMHO) crazy prices.
I found a few (maybe) good items in the lower price range (only as a memo and for future reference):
https://www.kmtronic.com/LAN-Relay-Cont ... duct_id=95
https://www.waveshare.com/modbus-poe-eth-relay.htm
https://tinycontrol.pl/en/lan-controller-35/
and seemingly a good, documented one (it also has an online simulator) which stands out, in the higher price range:
https://www.netio-products.com/en/products/all-products
the rack PDU is also programmable/scriptable with LUA,
and a one-of-a-kind device, that could be useful in simpler projects/setups:
https://www.tyconsystems.com/tpdin-poe-relay

Finally a more complete range of DIN Rail devices:
https://relaydroid.com/

jaclaz

Re: Attempting to evolve from caveman's failover

Posted: Mon Oct 09, 2023 10:55 am
by jaclaz
I expected a rabbit hole, but frankly not one as deep as this seems to be (about failover methods). I checked several (many, likely a couple of dozen) posts/blogs/tutorials about failover; many are incomplete, or are followed by a later comment from someone correcting this or that "wrong" part. I would have expected there to be a few "canonical" ways that by this time were "established".
Besides the official help page:
https://help.mikrotik.com/docs/pages/vi ... d=26476608
and your (Filo's) very nice and simple one :) :
viewtopic.php?t=198999
I found also the one by Chupaka:
viewtopic.php?t=157048
on that thread there is a reference to a nice presentation by Tomas Kirnak:
https://mum.mikrotik.com/presentations/US12/tomas.pdf
that, even if definitely "advanced" clears a lot of doubts about the terminology/methods that I had.
Right when I was convinced that - even if complex - the recursive check was the way to go, I found that there is another way through "Detect Internet":
https://wiki.mikrotik.com/wiki/Manual:Detect_internet
https://help.mikrotik.com/docs/display/ ... t+Internet
that seemingly is not much used, I found a seemingly valid example/explanation here:
viewtopic.php?t=159396
but not much more.

It will be a looong (besides steep) learning path.

Only as a side note, I found (on an Italian board dedicated to RouterOS) a (small) confirmation of my original idea/approach:
https://www.routerositalia.net/forum/vi ... f=1&t=3957
in the post (unfortunately not answered to/abandoned) the user asks about using a Mikrotik to replace a Draytek working with 192.168.0.1 on a LAN port and 192.168.0.2 on TWO WAN ports, so that the load balancing/failover router can effectively be bypassed in case of troubles.
It is good to know that I am not the only crazy guy with the idea of multiple same address gateways.

jaclaz

Re: Attempting to evolve from caveman's failover

Posted: Mon Oct 09, 2023 2:12 pm
by llamajaja
Start at para I..... feel your pain. viewtopic.php?t=182373

Re: Attempting to evolve from caveman's failover

Posted: Wed Oct 11, 2023 4:31 pm
by jaclaz
llamajaja wrote: "Start at para I..... feel your pain. viewtopic.php?t=182373"
Yep, I have read (and re-read, and re-re-read) that paragraph, but it still sounds to me (with all due respect to the author, anav, who surely posted it in good faith and as an attempt to help fellow board members) largely similar to Vogon poetry.

Some of the examples are convoluted/overcomplicated and a lot of info is missing. The "DTRIPLE WAN - RECURSIVE" (which could have been a solution/answer to my question) has been (IMHO) overcomplicated by introducing, besides the three WANs, also three (actually six) subnets, and after posting a set of 9 (nine) additional routes with target-scope=14 it ends with:
"Then the rest of the routes are required, six with target scope of 13, and the last six with target scope of 12."
which makes little sense (to me): if there are nine routes with target-scope=14, there should also be nine with target-scope=13 and nine with target-scope=12, shouldn't there?

In the DUAL WAN - RECURSIVE, it is not explained why some (explicit) IP addresses are chosen, and what elsewhere is called "virtual hop" becomes (AFAICU) "Bogus address".

It is very confusing when different terminology is used by everyone that writes these tutorials (and often different from what is called in Mikrotik official wiki/documentation).

Very likely this is part of the difficulties that a non-native speaker has when learning a new language, when I started speaking (almost) English I called my home a "house", and was promptly corrected with a "You mean an apartment, don't you?" so, next time I used "apartment", and was promptly corrected with a "You mean a flat?" :wink:

jaclaz

Re: Attempting to evolve from caveman's failover

Posted: Fri Oct 13, 2023 10:55 am
by jaclaz
After much looking around I found two videos (from Indonesia by Citraweb, luckily with English captions) that nicely explain two ways to manage two gateways with the same IP on a single router: one addressing the ether interface with % (like "gateway=192.168.1.1%ether1") and one making use of VRF (actually, I believe, what is called "VRF lite" in some other examples/tutorials).

The context is slightly different (it is focused on load balancing) but the concepts are (IMHO) well explained.

Video #1:
https://www.youtube.com/watch?v=ZiybVYms6kw

Video #2:
https://www.youtube.com/watch?v=dWwKIP2Kqbo

Besides the actual usefulness of the content, it is good to know that I am not (yet) completely crazy and other people have to deal with multiple ISP's with "fixed" same gateway IP.

Usually I hate videos (when compared to articles or blog or forum posts) but these ones are clear/slow enough.

jaclaz

Re: Attempting to evolve from caveman's failover

Posted: Mon Oct 16, 2023 12:50 pm
by jaclaz
Here is a graphical representation of my current situation:

Re: Attempting to evolve from caveman's failover

Posted: Mon Oct 16, 2023 12:53 pm
by jaclaz
I got my hands on some (old, slow) cheap routers, only capable of routing a LAN address to a WAN one, TP-Link TL-R460, here is a graphical representation of a possible setup.

Re: Attempting to evolve from caveman's failover

Posted: Mon Oct 16, 2023 12:57 pm
by jaclaz
The next evolution, replacing the #4 router with a RB750GR3 (thus being capable of automatic failover):

Re: Attempting to evolve from caveman's failover

Posted: Mon Oct 16, 2023 1:01 pm
by jaclaz
Or, I could replace three of the Tp-Link's with a MT RB750GR3:

Re: Attempting to evolve from caveman's failover

Posted: Mon Oct 16, 2023 1:02 pm
by jaclaz
Then I could use two RB750GR3's as follows.

The question is:
is there a way (some magic or protocol or whatever) that would allow me to do all this in a "same" single router? (even if that would mean using a "better" router, with more ports or some other advanced characteristics)?

Re: Attempting to evolve from caveman's failover

Posted: Mon Oct 16, 2023 6:18 pm
by mtest001
Hello,
I have the same need as you, i.e. 2 ISPs whose routers have fixed IPs (192.168.1.1). I have created two VRFs and I am able to route correctly on one or the other, but I did not manage to have the NAT working.

Do you mind sharing your Mikrotik router configuration ?

Thank you.

Re: Attempting to evolve from caveman's failover

Posted: Mon Oct 16, 2023 7:06 pm
by jaclaz
mtest001 wrote: "Do you mind sharing your Mikrotik router configuration ?"
I haven't any, I am still studying if the whole thing is doable and - if it is - if it is worth the hassle.

I have read several VRF related threads, read (and re-read) the official Mikrotik wiki/docs, watched the two or three videos that are usually linked to on the forum and came out with very little understanding.

The only source I could (maybe) understand was this video:
https://www.youtube.com/watch?v=dWwKIP2Kqbo
(Indonesian but with English subtitles)

I believe the missing "magic" or the "trick" in your situation:
viewtopic.php?p=1030324
lies in the "routing mark", the procedure is explained in the linked video.

From the little I understand of RouterOS, the same thing can probably be done in 2 or 3 different ways, so maybe you are attempting to use a different method from the one in the video. Consider that I am a complete beginner, so take my advice with lots of salt, but maybe you can replicate what they show.

jaclaz

Re: Attempting to evolve from caveman's failover

Posted: Mon Oct 16, 2023 9:34 pm
by mtest001
Ok sorry I misunderstood, I thought you had done it already.

On my side I am confident that I can make it work, I'm almost there, all that's missing for now is the NAT between the main network and the VRF.

Re: Attempting to evolve from caveman's failover

Posted: Sat Oct 28, 2023 2:20 pm
by jaclaz
Bump!

I cannot say if what I asked is too simple (or too stupid, or both) or if I am failing somehow to expose the questions/doubts properly.
jaclaz wrote: "Then I could use two RB750GR3's as follows ...

The question is:
is there a way (some magic or protocol or whatever) that would allow me to do all this in a "same" single router? (even if that would mean using a "better" router, with more ports or some other advanced characteristics)?"
Thinking aloud, if I take a Mikrotik router with more than 5 interfaces, let's say an RB2011, is there anything preventing me from creating an external loopback between two interfaces (with a short RJ45 Cat5e cable)?

Something loosely *like* the attached scheme:
Schemi_rete.pdf
jaclaz

Re: Attempting to evolve from caveman's failover

Posted: Thu Nov 23, 2023 11:58 am
by jaclaz
Small update.
I had some time to test a basic layout similar to the one in post #15 (leaving aside the ISP3/router TP-Link #3) and it (sort of) works.

The issue is that when I "switch" between router TP-Link #1 and #2 (powering off #1 and turning on #2) Router TP-Link #4 doesn't "sense" the change (very likely something remains "sticky") and I have to switch on/off (reboot) the Router #4 to let it detect the newly powered on router.

Probably because these routers are very simple, they are very fast to boot, something like 4-5 seconds.

The settings on Routers #1 and #2 are:
LAN:
IP: 10.0.0.1
Subnet: 255.255.255.0
WAN:
IP: 192.168.1.254
Subnet: 255.255.255.0
Gateway: 192.168.1.1

The settings on Router #4 are:
LAN:
IP: 192.168.1.1
Subnet: 255.255.255.0
WAN:
IP: 10.0.0.254
Subnet: 255.255.255.0
Gateway: 10.0.0.1

Nothing else.
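
As a sanity check of the addressing above (a sketch only, using Python's standard `ipaddress` module; the helper name `same_subnet` is made up for this example), each WAN address should sit in the same /24 as its gateway, and the LAN and WAN sides of Router #4 should be in different subnets:

```python
import ipaddress

MASK = "255.255.255.0"


def same_subnet(ip: str, gateway: str, mask: str = MASK) -> bool:
    """True if `gateway` lies inside the network that `ip`/`mask` belongs to."""
    network = ipaddress.ip_interface(f"{ip}/{mask}").network
    return ipaddress.ip_address(gateway) in network


# Routers #1/#2: the WAN side faces the ISP modem/router at 192.168.1.1.
print(same_subnet("192.168.1.254", "192.168.1.1"))  # prints "True"
# Router #4: the WAN side faces the LAN side (10.0.0.1) of Router #1 or #2.
print(same_subnet("10.0.0.254", "10.0.0.1"))        # prints "True"
# Router #4: LAN (192.168.1.1) and WAN (10.0.0.254) are separate subnets.
print(same_subnet("192.168.1.1", "10.0.0.254"))     # prints "False"
```

Since Routers #1 and #2 both use 10.0.0.1 on their LAN side, Router #4 sees the same gateway address before and after the switch; only the device behind that address changes.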

Very likely (though all other settings are disabled) it performs a sort of automatic NAT (or sort of netmap) between LAN and WAN or maybe it uses some kind of Proxy-ARP?

The latter is a possibility, since the destination router (#1 or #2) remains "sticky", but I really don't know.

Re: Attempting to evolve from caveman's failover

Posted: Thu Nov 23, 2023 2:20 pm
by templlama
manual is probably the least complex path.
Another option is buying three hex routers.
hex1 - LAN output 192.168.11.1, gateway 192.168.1.1
hex2 - LAN output 192.168.12.1, gateway 192.168.1.1
hex3 - LAN output 192.168.13.1, gateway 192.168.1.1

RB5009 or another HEX as the glue.
Ether1 WAN is 192.168.11.2
Ether2 WAN is 192.168.12.2
Ether3 WAN is 192.168.13.2

Final Router LAN is 192.168.1.1 gateway=192.168.1.1
Source NAT ensures that all LAN traffic going out of each WAN port towards the first line of hEXes has source IP 192.168.11.2 / 12.2 / 13.2.

Maybe I am missing something but that's what I thought of first.
In terms of failover, simple is:
/ip route
add check-gateway=ping distance=2 dst-address=0.0.0.0/0 gateway=192.168.11.1 routing-table=main
add check-gateway=ping distance=3 dst-address=0.0.0.0/0 gateway=192.168.12.1 routing-table=main
add check-gateway=ping distance=4 dst-address=0.0.0.0/0 gateway=192.168.13.1 routing-table=main


+++++++++++++++++++++++++++++++++++
Trying to do it all on one router may be a stretch.
The routes can all be described by
/ip route
add check-gateway=ping distance=2 dst-address=0.0.0.0/0 gateway=192.168.1.1%ether1 routing-table=main
add check-gateway=ping distance=3 dst-address=0.0.0.0/0 gateway=192.168.1.1%ether2 routing-table=main
add check-gateway=ping distance=4 dst-address=0.0.0.0/0 gateway=192.168.1.1%ether3 routing-table=main

However I have no clue how to deal with the fact the LAN nomenclature is the same as the WAN nomenclature.
Someone with better RoS knowledge and networking skills should be able to figure this out.

Re: Attempting to evolve from caveman's failover

Posted: Thu Nov 23, 2023 10:56 pm
by jaclaz
Thank you, but I am already (at least in theory) on only two routers (let's say two RB750gr3) the first to manage the three same gateways using either Vrf's or Proxy Arp, the second to simply "translate" the 192.168.1.0/24 internal LAN to an "intermediate" network.
The configuration and layout of the first router have just been posted on the related thread mtest001 made:
viewtopic.php?t=200602

Re: Attempting to evolve from caveman's failover

Posted: Fri Nov 24, 2023 6:54 pm
by jaclaz
Ok, so here is a possible solution making use of two routers, working, at least in GNS3.
There isn't yet any "advanced" failover function implemented.

And the base question still remains: is it possible to do the same with only one router (using *somehow* internally the netmap function that is now in the second router, or using some of the other Mikrotik magic tricks)?

Configuration of Router0_VRF:
[admin@Router0_VRF] > /export
# 2023-11-24 16:43:46 by RouterOS 7.11.2
# software id =
#
/interface ethernet
set [ find default-name=ether1 ] disable-running-check=no
set [ find default-name=ether2 ] disable-running-check=no
set [ find default-name=ether3 ] disable-running-check=no
set [ find default-name=ether4 ] disable-running-check=no
set [ find default-name=ether5 ] disable-running-check=no
set [ find default-name=ether6 ] disable-running-check=no
set [ find default-name=ether7 ] disable-running-check=no
set [ find default-name=ether8 ] disable-running-check=no
/disk
set slot1 slot=slot1 type=hardware
set slot2 slot=slot2 type=hardware
set slot3 slot=slot3 type=hardware
/interface list
add comment=defconf name=WAN
add comment=defconf name=LAN
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/ip vrf
add interfaces=ether1 name=vrf1
add interfaces=ether2 name=vrf2
/port
set 0 name=serial0
/interface list member
add interface=ether1 list=WAN
add interface=ether2 list=WAN
add interface=ether3 list=WAN
add interface=ether4 list=LAN
add interface=ether5 list=LAN
/ip address
add address=192.168.1.254/24 interface=ether1 network=192.168.1.0
add address=192.168.1.254/24 interface=ether2 network=192.168.1.0
add address=10.1.1.1/30 interface=ether4 network=10.1.1.0
/ip firewall nat
add action=src-nat chain=srcnat src-address=192.168.2.0/24 to-addresses=\
    192.168.1.254
/ip route
add distance=1 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf1 routing-table=\
    main
add distance=2 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf2 routing-table=\
    main
add dst-address=192.168.2.0/24 gateway=10.1.1.2 routing-table=vrf1
add dst-address=192.168.2.0/24 gateway=10.1.1.2 routing-table=vrf2
/system identity
set name=Router0_VRF
/system note
set show-at-login=no
Configuration of Inter_Router_1:
[admin@Inter_Router_1] > /export
# 2023-11-24 16:30:28 by RouterOS 7.11.2
# software id =
#
/interface bridge
add name=bridge1
/interface ethernet
set [ find default-name=ether1 ] disable-running-check=no
set [ find default-name=ether2 ] disable-running-check=no
set [ find default-name=ether3 ] disable-running-check=no
set [ find default-name=ether4 ] disable-running-check=no
set [ find default-name=ether5 ] disable-running-check=no
set [ find default-name=ether6 ] disable-running-check=no
set [ find default-name=ether7 ] disable-running-check=no
set [ find default-name=ether8 ] disable-running-check=no
/disk
set slot1 slot=slot1 type=hardware
set slot2 slot=slot2 type=hardware
set slot3 slot=slot3 type=hardware
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/port
set 0 name=serial0
/ip address
add address=10.1.1.2/30 interface=ether1 network=10.1.1.0
add address=192.168.1.1/24 interface=ether2 network=192.168.1.0
/ip firewall nat
add action=netmap chain=srcnat src-address=192.168.1.0/24 to-addresses=\
    192.168.2.0/24
add action=netmap chain=dstnat dst-address=192.168.2.0/24 to-addresses=\
    192.168.1.0/24
/ip route
add gateway=10.1.1.1
/system identity
set name=Inter_Router_1
/system note
set show-at-login=no

Re: Attempting to evolve from caveman's failover

Posted: Fri Dec 08, 2023 5:30 pm
by jaclaz
So, a simpler setup with two routers seems to be (no netmap in the intermediate router):
Router0_VRF:
#reproducible from blank CHR

/ip vrf
add interfaces=ether2 name=vrf2
add interfaces=ether1 name=vrf1

/ip address
add address=192.168.1.254 interface=ether1 network=192.168.1.1
add address=192.168.1.254 interface=ether2 network=192.168.1.1
add address=10.0.0.1/30 interface=ether4 network=10.0.0.0


/ip firewall nat
add action=src-nat chain=srcnat out-interface=ether1 to-addresses=\
    192.168.1.254
add action=src-nat chain=srcnat out-interface=ether2 to-addresses=\
    192.168.1.254

/ip route
add distance=1 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf1 routing-table=\
    main
add distance=2 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf2 routing-table=\
    main
add dst-address=192.168.1.0/24 gateway=10.0.0.2 routing-table=\
    vrf2
add dst-address=10.0.0.0/30 gateway=10.0.0.2 routing-table=vrf1
add dst-address=10.0.0.0/30 gateway=10.0.0.2 routing-table=vrf2
add dst-address=192.168.1.0/24 gateway=10.0.0.2 routing-table=\
    vrf1
add dst-address=192.168.1.0/24 gateway=10.0.0.2 routing-table=main

/system identity
set name=Router0_VRF
Router 1:
#reproducible from blank CHR


/ip address
add address=10.0.0.2/30 interface=ether1 network=10.0.0.0
add address=192.168.1.1/24 interface=ether2 network=192.168.1.0


/ip route
add gateway=10.0.0.1

/system identity
set name=Router1

Re: Attempting to evolve from caveman's failover

Posted: Fri Dec 08, 2023 5:40 pm
by jaclaz
And - finally - this should be a possible solution with only one router. :!:

VRF_LAN:
#reproducible from blank CHR

/ip vrf
add interfaces=ether2 name=vrf2
add interfaces=ether1 name=vrf1

/ip address
add address=192.168.1.254 interface=ether1 network=192.168.1.1
add address=192.168.1.254 interface=ether2 network=192.168.1.1
add address=192.168.1.1/24 interface=ether8 network=192.168.1.0

/ip firewall nat
add action=src-nat chain=srcnat out-interface=ether1 to-addresses=\
    192.168.1.254
add action=src-nat chain=srcnat out-interface=ether2 to-addresses=\
    192.168.1.254

/ip route
add distance=1 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf1 routing-table=\
    main
add distance=2 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf2 routing-table=\
    main
add dst-address=192.168.1.0/24 gateway=ether8 routing-table=vrf1
add dst-address=192.168.1.0/24 gateway=ether8 routing-table=vrf2

/system identity
set name=VRF_LAN
It is fairly simple (I made a few other tests with different setups but they were all more complex, using a third VRF to emulate the previous two-routers solution or using some scripts to enable/disable interfaces and routes).

Still I don't know exactly how/why it works (and maybe it doesn't actually work on real hardware, only in GNS3), very likely it is some *magic* related to the /32 connection to the ISP router(s) AND to the "isolation" (or whatever) of the VRF's that combined together allow the "back" route to ether8 and to the LAN.
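
A few read-only commands that may help to see where each route lives in the single-router setup above. This is only a sketch: the table names vrf1/vrf2 are the ones from the configuration, and the vrf= parameter of /ping is what I understand RouterOS 7 provides, so treat it as an assumption if your version differs:

```
# "return" routes towards the LAN, stored inside each VRF table
/ip route print where routing-table=vrf1
/ip route print where routing-table=vrf2
# the two default routes, stored in main but pointing into the VRFs
/ip route print where dst-address=0.0.0.0/0
# reach each ISP modem through its own VRF
/ping 192.168.1.1 vrf=vrf1 count=3
/ping 192.168.1.1 vrf=vrf2 count=3
```

If both pings answer, each 192.168.1.1 is being reached through its own interface even though the address is identical.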

Re: Attempting to evolve from caveman's failover

Posted: Fri Dec 08, 2023 6:36 pm
by mtest001
:D no magic here, it works exactly as intended ...

Maybe if you can be a bit more specific about which part is still not clear in your mind, I can try to explain.

Re: Attempting to evolve from caveman's failover

Posted: Fri Dec 08, 2023 7:58 pm
by jaclaz
mtest001 wrote:
:D no magic here, it works exactly as intended ...

Maybe if you can be a bit more specific on what part is still not clear in your mind I can try to explain.

Well, a good half (not really, I am joking) of posts on the board revolve around routers being designed to route and thus *needing* different networks or subnets on the LAN and WAN sides of the router.

Though (for whatever reason) I have been spared the harshest "you should change the addresses on your LAN (or WAN) side" suggestions, I have seen so many of them directed at other beginner users that I doubted it was possible at all. As I posted earlier, the only (partial) reference to a router that is "transparent" (in the sense that it can be manually bypassed in case of failure) was about a particular setup involving a Draytek router on an Italian forum.

In my (perverted) mind the usefulness of this kind of setup (introducing a failover solution that is removable in case of need by simply moving a few cables) is so obvious that I believed it was a common and established approach.

The "final" solution (provided it works in the real world, outside GNS3) is actually pretty simple, while I expected it to be incredibly complex, given the number of posts suggesting to redesign the network as the first step.

Surely the whole VRF stuff is mis-documented (if documented at all) and besides vague references to it here and there there are no (or I was unable to find them) clear explanations on how/why they work.

This particular kind of VRF (that you proposed) essentially has an "in-interface" outside the VRF and the "out-interface" inside it[1], but during my semi-random experiments (replacing the intermediate router in the two-router solution with a third VRF on the main router) I also managed to create a VRF with both in and out interfaces inside it (though it produced some quirks in tracing to 8.8.8.8). It is another thing I still need to experiment with.

I also tested an even crazier setup, using the CHR as a switch with two connections to each ISP modem/router (two cables/interfaces used for each modem), one to monitor the connection and one to be enabled/disabled in case of need.
This latter worked but, not unlike the actual physical test I made with the old, cheap TP-Link routers, suffered from some stickiness (probably due to ARP cache) when interfaces were enabled/disabled. Very likely it represents an alternative solution if I can *somehow* manage duplicate MAC addresses (or whatever), but this is an exercise in futility that I will keep for later.


[1] which is, I believe, what makes it necessary for the default gateway (0.0.0.0/0) to be put in the "main" table BUT pointing to an IP on @vrf1 or @vrf2, AND for the "return" route to be inside the vrf1 or vrf2 routing tables BUT pointing to ether8 in "main". What I suspect (though I'll have to make a couple of experiments) is that one of the keys to having the setup work is the use of a /32 address to connect to the ISP modem(s)/router(s), which allows the "return" route.

Re: Attempting to evolve from caveman's failover

Posted: Mon Dec 11, 2023 6:23 pm
by mtest001
I think the key lesson here is what you said: not that many people know about VRFs and how to use them, hence the many responses asking you to change the IP addresses.

VRFs allow precisely what you are looking for: using the same IP ranges on different interfaces and be able to route the traffic. This page here explains it well: https://avinetworks.com/glossary/virtua ... rding-vrf/

Note that strictly speaking nothing forbids using the same IPs on different interfaces even without VRFs, as long as they are not in the same broadcast domain (otherwise you would have a problem of conflicting IPs). But the problem is: how to route packets between the interfaces if they share the same subnets? And that's where the VRFs help. There are other possibilities, as I demonstrated with my solution based on proxy-arp.

The /32 address creates a point-to-point link, I don't think you need it in this scenario, but why not.
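
To make the difference concrete, here are the two address notations that appear in this thread, side by side. This is a sketch for comparison only; the second line is left commented out because only one of the two would be configured on the interface:

```
/ip address
# point-to-point form: no mask (so /32), and "network" set to the peer (the ISP modem)
add address=192.168.1.254 interface=ether1 network=192.168.1.1
# conventional subnet form, as in the earlier two-router export
# add address=192.168.1.254/24 interface=ether1 network=192.168.1.0
```

In the first form the connected route covers only 192.168.1.1, while in the second it covers the whole 192.168.1.0/24 inside the VRF.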

Re: Attempting to evolve from caveman's failover

Posted: Tue Dec 12, 2023 1:00 pm
by jaclaz
mtest001 wrote:
I think the key lesson here is what you said: not that many people know about VRFs and how to use it, hence the many responses asking you to change the IP addresses.

I think that the real issue is that (this is not only on this forum, it is common enough) there is a lack of understandable (for the newbies[1]) explanations about the way things work (or fail to work), and - even when they exist - they are in crumbles here and there on several unconnected threads, so that putting them together is extremely difficult.

The official help/wiki is among the worst documentation I have ever seen[2], the forum is essentially on the shoulders of a handful of (knowledgeable/expert) volunteers who - while providing lots of good information - tend to be either very cryptic in their explanations (if any) or manage to spread the info all over the board, with topics that slowly (but surely) go astray.

Only as an example, I was looking for a failover method (it seems like as said before there are 2, 3 or more different possible methods to do anything in Mikrotik) and found the (supposedly complete/reproducible/tutoring) one by Chupaka here:
viewtopic.php?t=157048
that starts with a configuration needing either mangle or routing rules to assign routing marks (while completely failing to provide the actual rules) then - slowly - (but seemingly without an actual explanation) it turns into a completely different method with no routing marks (thus easier) but that is not explained, and this is brought forward through (seemingly partial) snippets of configuration, then at a certain point there is this:
viewtopic.php?t=157048#p941360
which brings us to this example by rextended:
viewtopic.php?p=963933#p963933
(which BTW is on a thread titled "WAN Load Balancing between 2 ISPs - one with CGNAT and another in bridge mode (real IPV4 address) ", in practice no way to find it with common searching)
that looks nice and simple but that I cannot understand at all how it works (if it works).
On the thread there are a couple of similar ones by anav (likewise without a real explanation, and some are also in a dubitative form).

So I will have to reproduce those setups in GNS3, test if the whole stuff is actually working, see if I can understand how it works, then try to adapt it to the VRF solution.

The method Filo posted a link to earlier on this thread:
viewtopic.php?t=198999
is seemingly simpler AND it is explained (but on the other hand needs to be translated back to actual commands) though I am struggling to understand how it can be extended from dual to triple failover.

Doing "random" tests in order to be able to use the latter (Filo's) method in a triple failover, I tried using (again) two routers in cascade, each doing dual failover (router0 with ISP1 connected to ether1/vrf1 and ISP2 connected to ether2/vrf2 and BEFORE it a router1 with router0 connected to ether1/vrf1 and "last resort" ISP3/LTE connected to ether2/vrf2).
I was surprised (not really, Murphy's Laws are a thing) that it didn't work as expected, in the sense that what I thought was a "transparent" router seems to be not-so-transparent and I had to use a different subnet between the routers to have the setup (almost) working.
As soon as I have time to do more tests I will post this (yet another) configuration in a reproducible form.

The whole stuff is a real PITA, but on the other hand it is actually fun :) , so it is not wasted time, it is (still in my perverted mind) a form of entertainment.






[1] but then if everyone was an expert in the matter and knew everything the forum would probably have no reason to exist
[2] typically a page on the wiki lists 25 possible commands/options, half of which self-referencing, then provides 1 or 2 (incomplete) examples making use of just one or two of the 25 listed commands/options

Re: Attempting to evolve from caveman's failover

Posted: Tue Dec 12, 2023 3:25 pm
by mtest001
Now that you have the VRFs configured the simple failover is quite easy to implement, I'll explain what I did and you can customize to your own needs.

I have two routes to 0.0.0.0/0 in the main routing table leading to each one of my ISPs, one with higher preference (i.e. lower distance) than the other:
/ip route
add distance=2 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf-starlink routing-table=main comment="RouteStarlink"
add distance=3 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf-orange routing-table=main comment="RouteOrange"
So with this in a normal situation the route with the lowest distance, through the vrf-starlink, will be preferred. The trick consists in changing the distance of the other route to 1 when Starlink is down so that it becomes the preferred route due to the lower distance (because 1 < 2).

To detect if Starlink is down I have a /32 route to a gateway inside the Starlink infra, which I force through the Starlink VRF, which I will be checking with NetWatch. If the ping fails (timeout), the priority of my secondary route will be brought from 3 down to 1 so that it becomes the preferred route.
/ip route
add dst-address=100.64.0.1/32 routing-table=main gateway=192.168.1.1@vrf-starlink
/tool/netwatch
add host=100.64.0.1 interval=10 timeout=5 up-script=":log error \"starlink is up\";/ip route set distance=3 [find comment=\"RouteOrange\"];:log error \"route orange deprioritized\";" down-script=":log error \"starlink down\";/ip route set distance=1 [find comment=\"RouteOrange\"];:log error \"route orange prioritized\""
So the distance of my least preferred route will change from 3 to 1 depending on the availability of my most preferred ISP, which will keep its distance of 2.

You can follow the same logic to switch between 3 ISPs if you are able to rank them "most preferred", "second preferred" and "least preferred". The most preferred could for example have a distance of 1 in normal condition, and 4 if down, the second most preferred could have 2 if up and 5 if down, and the third preferred could have a distance of 3.
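
The three-ISP ranking described above could be sketched roughly as follows. This is untested: the VRF names vrf1..vrf3, the comments ISP_1..ISP_3 and the probe addresses 1.1.1.1 and 4.2.2.1 are assumptions chosen for illustration. Each netwatch entry only changes the distance of its own route, so no entry needs to know the state of the others:

```
/ip route
# normal distances 1, 2, 3 (the lowest distance wins)
add distance=1 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf1 routing-table=main comment="ISP_1"
add distance=2 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf2 routing-table=main comment="ISP_2"
add distance=3 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf3 routing-table=main comment="ISP_3"
# probe hosts, each forced through one ISP (addresses are assumptions)
add dst-address=1.1.1.1/32 gateway=192.168.1.1@vrf1 routing-table=main
add dst-address=4.2.2.1/32 gateway=192.168.1.1@vrf2 routing-table=main

/tool netwatch
# ISP_1: distance 1 when up, 4 when down
add host=1.1.1.1 interval=10 timeout=5 up-script="/ip route set distance=1 [find comment=\"ISP_1\"]" down-script="/ip route set distance=4 [find comment=\"ISP_1\"]"
# ISP_2: distance 2 when up, 5 when down
add host=4.2.2.1 interval=10 timeout=5 up-script="/ip route set distance=2 [find comment=\"ISP_2\"]" down-script="/ip route set distance=5 [find comment=\"ISP_2\"]"
```

With all ISPs up the distances are 1/2/3 and ISP_1 wins; with ISP_1 down they are 4/2/3 and ISP_2 wins; with both down they are 4/5/3 and ISP_3 wins.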

Of course we are not talking about load balancing here, just fail-over.

Also the fail-over process is slow, typically from a client perspective it takes between 15 seconds and 1 minute depending on the application.

Re: Attempting to evolve from caveman's failover

Posted: Tue Dec 12, 2023 7:09 pm
by jaclaz
Thanks.
Your approach seems simple enough (similar to the one by Filo) making use of distance, but there is no need of an added routing table nor of mangle mark routing, I like it.

With three connections it should be (very loosely) something *like* this:
/ip route
#following route is the main one and stays fixed to distance 20
add distance=20 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf1 routing-table=main comment="ISP_1"
#following route is second best and flips between 10 (when used) and 30 (in normal operation)
add distance=30 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf2 routing-table=main comment="ISP_2"
#following route is the least desirable (LTE in my case) and flips between 15 (when used) and 40 (in normal operation)
add distance=40 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf3 routing-table=main comment="ISP_3"
#following route is to test connectivity of ISP_1 using 1.1.1.1 as gateway
add dst-address=1.1.1.1/32 routing-table=main gateway=192.168.1.1@vrf1
#following route is to test connectivity of ISP_2 using 4.4.2.1 as gateway
add dst-address=4.4.2.1/32 routing-table=main gateway=192.168.1.1@vrf2
#following route is to test connectivity of ISP_3 using 4.4.2.2 as gateway
add dst-address=4.4.2.2/32 routing-table=main gateway=192.168.1.1@vrf3
Now, when it comes to the netwatch scripts things become a little murkier.

I am posting what I think might work but I won't be able to test and report/correct for a couple of days.
(only trying to separate the up and down scripts from the netwatch entry, as it seems "cleaner" and easier to read)
/tool/netwatch
add host=1.1.1.1 interval=10 timeout=5 up-script=Use_ISP_1 down-script=Use_ISP_2
add host=4.4.2.1 interval=10 timeout=5 up-script=Use_ISP_2 down-script=Use_ISP_3
add host=4.4.2.2 interval=10 timeout=5 up-script=Use_ISP_3 down-script=Total_Fail
and scripts should be something *like* (without comments/log info etc. to make them more readable):
/system script
# this demotes ISP_2 and ISP_3 to their normal 30 and 40 distance
add name=Use_ISP_1 source="/ip route set distance=30 [ find comment=\"ISP_2\" ]; /ip route set distance=40 [ find comment=\"ISP_3\" ]"
# this promotes ISP_2 to distance 10 and demotes ISP_3 to its normal 40 (if needed)
add name=Use_ISP_2 source="/ip route set distance=10 [ find comment=\"ISP_2\" ]; /ip route set distance=40 [ find comment=\"ISP_3\" ]"
# this promotes ISP_3 to distance 15 and demotes ISP_2 to its normal 30 distance
add name=Use_ISP_3 source="/ip route set distance=15 [ find comment=\"ISP_3\" ]; /ip route set distance=30 [ find comment=\"ISP_2\" ]"
# placeholder only, nothing is executed yet
add name=Total_Fail source="# I will put here such things ... what they are, yet I know not, but they shall be the terrors of the network"
 
The above cannot really work "as is": a check of some kind needs to be added. Otherwise, in "normal" operation (with ISP_1 up), a failure of ISP_2 will trigger Use_ISP_3, and then either Use_ISP_1 will conflict (attempting to flip ISP_3 back to distance 40, which will then be re-upped to 15 by Use_ISP_2) or it will do nothing and the connection will remain on ISP_3 until something else happens.

If I read the documentation correctly, the up-script is only executed when the connection status changes from down to up and the down-script only when the connection status changes from up to down.
So if the connection on ISP_1 remains up, the up-script Use_ISP_1 is never executed.

It shouldn't be a problem, just a matter of learning a bit more of Mikrotik script language in order to make the promotions/demotions conditional on the current distance situation of other interfaces.
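
As a first idea of what such a conditional could look like, here is an untested sketch of a possible body for the Use_ISP_3 script. It assumes that the netwatch entry for 1.1.1.1 exposes a read-only status property with the values "up"/"down", and it only covers this one script (Use_ISP_2 would need a similar check):

```
# possible body of Use_ISP_3 (down-script of the ISP_2 probe)
# read the current state of the ISP_1 probe
:local primaryState [/tool netwatch get [find host=1.1.1.1] status]
# ISP_2 just failed, so it goes back to its normal distance in any case
/ip route set distance=30 [find comment="ISP_2"]
:if ($primaryState = "down") do={
    # ISP_1 is down too: promote LTE above the fixed distance 20 of ISP_1
    /ip route set distance=15 [find comment="ISP_3"]
} else={
    # ISP_1 is still up: leave LTE at its normal distance
    /ip route set distance=40 [find comment="ISP_3"]
}
```

The point is that ISP_3 is promoted only when the primary is known to be down, so a failure of ISP_2 alone no longer moves traffic to LTE.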

Re: Attempting to evolve from caveman's failover

Posted: Thu Dec 14, 2023 7:19 pm
by jaclaz
Ok, so I didn't manage (yet) to find the time to study the scripting language and learn enough to be confident in writing the triple WAN failover script, but I managed to find what was the issue with my concept of "transparent" router (with two vrf's).

A (stupid) error: using more than one router with the same "outbound" addresses. :shock:

With non-duplicate addresses the concept works. :)

As (possibly) an exercise in futility, I made a POC in GNS3/CHR. It seems completely crazy until you realize that all the VRF_xxx routers have the same configuration (bar the "outbound" interface addresses) and are, as a matter of fact, interchangeable: you can keep a spare one or two with exactly the same configuration (but a different IP on the "outbound" interfaces), and anyone can swap one in by simply disconnecting and re-connecting the cables.

The configuration uses a netwatch 2 WAN failover approach derived from mtest001's nice one.

Of course the second VRF and the netwatch scripts in VRF_252, VRF_251 and VRF_250 are unneeded.

Here is the reproducible configuration for VRF_253:
#reproducible from blank CHR
# this is for VRF_253 with "outbound" interfaces set to 192.168.1.253
# in Notepad you can use Find and replace with 253 and the IP you want 254, 252, etc.
#
# the 253 is in 5 places:
# 2 times in /ip address, for ether1 and ether2
# 2 times in /ip firewall nat, for ether1 and ether2
# in /system identity name

/ip vrf
add interfaces=ether1 name=vrf1
add interfaces=ether2 name=vrf2

/ip address
add address=192.168.1.1/24 interface=ether8 network=192.168.1.0
add address=192.168.1.253 interface=ether1 network=192.168.1.1
add address=192.168.1.253 interface=ether2 network=192.168.1.1

/ip firewall nat
add action=src-nat chain=srcnat out-interface=ether1 to-addresses=\
    192.168.1.253
add action=src-nat chain=srcnat out-interface=ether2 to-addresses=\
    192.168.1.253

/ip route
add comment=Primary_ISP distance=2 dst-address=0.0.0.0/0 gateway=\
    192.168.1.1@vrf1 routing-table=main
add comment=Next_ISP distance=3 dst-address=0.0.0.0/0 gateway=\
    192.168.1.1@vrf2 routing-table=main
add dst-address=192.168.1.0/24 gateway=ether8 routing-table=vrf1
add dst-address=192.168.1.0/24 gateway=ether8 routing-table=vrf2
add dst-address=1.1.1.1/32 gateway=192.168.1.1@vrf1 routing-table=main

/system identity
set name=VRF_253

/system script
add dont-require-permissions=yes name=Use_Primary owner=admin policy=\
    read,write,test source=":log error \"Use_Primary says Primary_ISP is up\";\
    /ip route set distance=3 [ find comment=\"Next_ISP\" ]"
add dont-require-permissions=yes name=Use_Next owner=admin policy=\
    read,write,test source=":log error \"Use_next says Primary_ISP is down\";/\
    ip route set distance=1 [ find comment=\"Next_ISP\" ]"
/tool netwatch
add down-script=Use_Next host=1.1.1.1 interval=10s timeout=5s type=simple \
    up-script=Use_Primary

Tracing from the "As_device" router has three or four hops of 192.168.1.1:
[admin@MikroTik] /ip/address> /tool trace 8.8.8.8
Columns: ADDRESS, LOSS, SENT, LAST, AVG, BEST, WORST, STD-DEV
 #  ADDRESS          LOSS  SENT  LAST    AVG   BEST  WORST  STD-DEV
 1  192.168.1.1      0%       4  1.8ms   2.5   1.8   4.3    1.1
 2  192.168.1.1      0%       4  2.3ms   2.3   2     2.5    0.2
 3  192.168.1.1      0%       4  4.7ms   5     4.5   5.8    0.5
 4  88.48.86.232     0%       4  11.5ms  11    10.1  11.8   0.7
 5  172.17.107.136   0%       4  11.9ms  10.7  10    11.9   0.7
 6  172.19.184.70    0%       4  14.7ms  14    13.2  14.7   0.6
 7  172.19.177.62    0%       4  15.3ms  14.7  14.5  15.3   0.3
 8  195.22.205.116   0%       4  15ms    14.5  13.6  15     0.6
 9  142.250.168.148  0%       4  14.9ms  14.5  14    14.9   0.3
10  72.14.239.144    0%       4  15.3ms  22.5  15.3  42.4   11.5
11  142.251.235.179  0%       4  14.9ms  23.1  14.9  46.9   13.7
12  8.8.8.8          0%       4  14.6ms  21.8  14.6  43.2   12.4
and, with first ISP disabled:
[admin@MikroTik] /ip/address> /tool trace 8.8.8.8
Columns: ADDRESS, LOSS, SENT, LAST, AVG, BEST, WORST, STD-DEV
 #  ADDRESS          LOSS  SENT  LAST    AVG   BEST  WORST  STD-DEV
 1  192.168.1.1      0%       8  1.6ms   2     1.4   3.1    0.5
 2  192.168.1.1      0%       8  2.2ms   2.4   1.8   3.6    0.5
 3  192.168.1.1      0%       8  3.1ms   7.5   2.7   39     11.9
 4  192.168.1.1      0%       8  5.4ms   10.8  4.5   49.9   14.8
 5  88.48.86.232     0%       8  11.9ms  11.6  10.4  14.9   1.4
 6  172.17.107.136   0%       8  11.8ms  11.5  10.8  12.4   0.5
 7  172.19.184.70    0%       8  14.4ms  14.9  14.3  15.4   0.4
 8  172.19.177.62    0%       8  15.7ms  15.5  15    16.1   0.4
 9  195.22.205.116   0%       8  15.3ms  15.4  14.5  16.6   0.7
10  142.250.168.148  0%       8  15.9ms  15.3  14.4  16     0.5
11  72.14.239.144    0%       8  15.8ms  25.4  15.6  55.7   16.1
12  142.251.235.179  0%       8  15.8ms  22.1  15.3  40.5   10.5
13  8.8.8.8          0%       8  15.6ms  22.2  15.1  43.9   11.6

Re: Attempting to evolve from caveman's failover

Posted: Wed Dec 20, 2023 6:48 pm
by jaclaz
Ok, so this is the reproducible version with 3 WAN failover via VRF.
/ip vrf
add interfaces=ether3 name=vrf3
add interfaces=ether1 name=vrf1
add interfaces=ether2 name=vrf2

/ip address
add address=192.168.1.254 interface=ether1 network=192.168.1.1
add address=192.168.1.254 interface=ether2 network=192.168.1.1
add address=192.168.1.254 interface=ether3 network=192.168.1.1
add address=192.168.1.1/24 interface=ether8 network=192.168.1.0


/ip firewall nat
add action=src-nat chain=srcnat out-interface=ether1 to-addresses=\
    192.168.1.254
add action=src-nat chain=srcnat out-interface=ether2 to-addresses=\
    192.168.1.254
add action=src-nat chain=srcnat out-interface=ether3 to-addresses=\
    192.168.1.254

/ip route
add comment=Primary_ISP distance=3 dst-address=0.0.0.0/0 gateway=\
    192.168.1.1@vrf1 routing-table=main
add comment=Next_ISP disabled=no distance=4 dst-address=0.0.0.0/0 gateway=\
    192.168.1.1@vrf2 routing-table=main
add dst-address=192.168.1.0/24 gateway=ether8 routing-table=vrf1
add dst-address=192.168.1.0/24 gateway=ether8 routing-table=vrf2
add dst-address=1.1.1.1/32 gateway=192.168.1.1@vrf1 routing-table=vrf1
add comment=LTE_Last disabled=no distance=5 dst-address=0.0.0.0/0 gateway=\
    192.168.1.1@vrf3 routing-table=main
add dst-address=192.168.1.0/24 gateway=ether8 routing-table=vrf3
add dst-address=4.2.2.1/32 gateway=192.168.1.1@vrf2 routing-table=vrf2
add dst-address=4.2.2.2/32 gateway=192.168.1.1@vrf3 routing-table=vrf3

/system identity
set name=VRF_254



/system script
add dont-require-permissions=yes name=Primary_running owner=admin policy=\
    read,write,test source=":log error \"Primary_running says Primary_ISP is u\
    p\";/ip route set distance=4 [ find comment=\"Next_ISP\" ];/ip route set d\
    istance=5 [ find comment=\"LTE_Last\" ];/tool netwatch disable [ find comm\
    ent=\"Next\" ];/tool netwatch disable [ find comment=\"LTE\" ]"
add dont-require-permissions=yes name=Use_Next owner=admin policy=\
    read,write,test source=":log error \"Use_next says Primary_ISP is down\";/\
    ip route set distance=1 [ find comment=\"Next_ISP\" ];/ip route set distan\
    ce=5 [ find comment=\"LTE_Last\" ];/tool netwatch enable [ find comment=\"\
    Next\" ];/tool netwatch disable [ find comment=\"LTE\" ]"
add dont-require-permissions=yes name=Next_running owner=admin policy=\
    read,write,test source=":log error \"Next_running says Next_ISP is up\";/i\
    p route set distance=1 [ find comment=\"Next_ISP\" ];/ip route set distanc\
    e=5 [ find comment=\"LTE_Last\" ];/tool netwatch disable [ find comment=\"\
    LTE\" ]"
add dont-require-permissions=yes name=Use_LTE_Last owner=admin policy=\
    read,write,test source=":log error \"Use_LTE_Last says Next_ISP is down\";\
    /ip route set distance=4 [ find comment=\"Next_ISP\" ];/ip route set dista\
    nce=2 [ find comment=\"LTE_Last\" ];/tool netwatch enable [ find comment=\
    \"LTE\" ]"
add dont-require-permissions=yes name=LTE_running owner=admin policy=\
    read,write,test source=":log error \"LTE_running says LTE_Last is up\""
add dont-require-permissions=yes name=LTE_fail owner=admin policy=\
    read,write,test source=":log error \"LTE_fail says LTE_Last is down\""

/tool netwatch
add comment=Primary disabled=no down-script=Use_Next host=1.1.1.1@vrf1 \
    interval=10s timeout=5s type=simple up-script=Primary_running
add comment=Next disabled=yes down-script=Use_LTE_Last host=4.2.2.1@vrf2 \
    interval=10s timeout=5s type=simple up-script=Next_running
add comment=LTE disabled=yes down-script=LTE_fail host=4.2.2.2@vrf3 interval=\
    10s timeout=5s type=simple up-script=LTE_running

The netwatch scripts I used are extremely simple and (IMHO) far from smart or elegant in any way, yet they seem to work just fine.

As soon as I started congratulating myself for having managed to half-@§§edly put together something working, the new Ax Lite's I ordered came in. I succeeded (even if it took more time than expected) in replicating the settings, adapting them to the 4 ports, and the next problem came out:
it seems that there is no way to make the Mikrotik device perform DNS resolution through a VRF (if there is a way, I couldn't find how to get it working in this setup), which means that you don't have NTP either (it seems you cannot even use an IP address directly for an NTP server; you need to use its name and let the router resolve it through DNS).
For the same reason also online update to 7.12.1 (from the factory installed 7.8 version ) wasn't possible.
I had to make a couple routes in main (disabling/removing one of the VRF's) to be able to have DNS (and consequently NTP and software update) working.

BTW and only as a side note, I found out that (for whatever reason) the "resolve" command - while throwing an error if no DNS is found - outputs nothing if used normally on command line, and needs to be invoked *like*:
put [:resolve google.com]
or:
put [:resolve domain-name=google.com server=8.8.8.8]

Nothing to make a fuss about, but using "resolve google.com" or "resolve domain-name=google.com", etc. should throw an error or a warning and not a "newline".

In this specific case, one could be happy with the router set to any random date/time (even if going back to the '70's seems a bit of a stretch) and as well the resolving through DNS may be unneeded, still it looks "wrong".

I am not finished with this.

I am experimenting with removing the VRF1 (moving the Primary ISP connection to "main") and creating a new VRF on the LAN side.

So far it seems to be working, but I have to do more tests to be sure.

Re: Attempting to evolve from caveman's failover

Posted: Thu Dec 21, 2023 6:24 pm
by jaclaz
And this is the 3 WAN failover with "reversed" VRF and no netwatch/no scripting (using recursive).

The configuration, reproducible in CHR:



/ip vrf
add interfaces=ether8 name=vrf8

/ip address
add address=192.168.1.241 interface=ether1 network=192.168.1.1
add address=192.168.1.242 interface=ether2 network=192.168.1.1
add address=192.168.1.243 interface=ether3 network=192.168.1.1
add address=192.168.1.1/24 interface=ether8 network=192.168.1.0


/ip firewall nat
add action=src-nat chain=srcnat out-interface=ether1 to-addresses=\
    192.168.1.241
add action=src-nat chain=srcnat out-interface=ether2 to-addresses=\
    192.168.1.242
add action=src-nat chain=srcnat out-interface=ether3 to-addresses=\
    192.168.1.243

/ip route
add dst-address=192.168.1.0/24 gateway=ether8@vrf8
add check-gateway=ping comment=main_ISP1_Route disabled=no distance=1 \
    dst-address=0.0.0.0/0 gateway=1.1.1.1@vrf8 routing-table=main scope=30 \
    target-scope=11
add check-gateway=ping comment=main_ISP2_Route disabled=no distance=2 \
    dst-address=0.0.0.0/0 gateway=4.2.2.1@vrf8 routing-table=main scope=30 \
    target-scope=11
add check-gateway=ping comment=main_ISP3_Route disabled=no distance=3 \
    dst-address=0.0.0.0/0 gateway=4.2.2.2@vrf8 routing-table=main scope=30 \
    target-scope=11
add comment=vrf8_ISP1_Ping disabled=no distance=1 dst-address=1.1.1.1 \
    gateway=192.168.1.1%ether1 routing-table=vrf8 scope=10 target-scope=10
add comment=vrf8_ISP2_Ping distance=2 dst-address=4.2.2.1 gateway=\
    192.168.1.1%ether2 routing-table=vrf8 scope=10 target-scope=10
add comment=vrf8_ISP3_Ping distance=3 dst-address=4.2.2.2 gateway=\
    192.168.1.1%ether3 routing-table=vrf8 scope=10 target-scope=10
add check-gateway=ping comment=vrf8_ISP1_Route disabled=no distance=1 \
    dst-address=0.0.0.0/0 gateway=1.1.1.1@vrf8 routing-table=vrf8 scope=30 \
    target-scope=11
add check-gateway=ping comment=vrf8_ISP2_Route disabled=no distance=2 \
    dst-address=0.0.0.0/0 gateway=4.2.2.1@vrf8 routing-table=vrf8 scope=30 \
    target-scope=11
add check-gateway=ping comment=vrf8_ISP3_Route disabled=no distance=3 \
    dst-address=0.0.0.0/0 gateway=4.2.2.2@vrf8 routing-table=vrf8 scope=30 \
    target-scope=11
/system identity
set name=VRF_253_2nd


The routing looks a little bit crazy (IMHO) but it seems to be working:
[admin@VRF_253_2nd] /ip/route> print
Flags: D - DYNAMIC; A - ACTIVE; c - CONNECT, s - STATIC; + - ECMP
Columns: DST-ADDRESS, GATEWAY, DISTANCE
#      DST-ADDRESS     GATEWAY             DISTANCE
;;; main_ISP2_Route
0   s  0.0.0.0/0       4.2.2.1@vrf8               2
;;; main_ISP1_Route
1  As  0.0.0.0/0       1.1.1.1@vrf8               1
;;; main_ISP3_Route
2   s  0.0.0.0/0       4.2.2.2@vrf8               3
3  As  192.168.1.0/24  ether8@vrf8                1
  DAc+ 192.168.1.1/32  ether3                     0
  DAc+ 192.168.1.1/32  ether2                     0
  DAc+ 192.168.1.1/32  ether1                     0
;;; vrf8_ISP3_Route
4   s  0.0.0.0/0       4.2.2.2@vrf8               3
;;; vrf8_ISP2_Route
5   s  0.0.0.0/0       4.2.2.1@vrf8               2
;;; vrf8_ISP1_Route
6  As  0.0.0.0/0       1.1.1.1@vrf8               1
;;; vrf8_ISP1_Ping
7  As  1.1.1.1/32      192.168.1.1%ether1         1
;;; vrf8_ISP2_Ping
8  As  4.2.2.1/32      192.168.1.1%ether2         2
;;; vrf8_ISP3_Ping
9  As  4.2.2.2/32      192.168.1.1%ether3         3
  DAc  192.168.1.0/24  ether8@vrf8                0


Adding the Google DNS servers:
/ip dns
set allow-remote-requests=yes servers=8.8.8.8,8.8.4.4

DNS queries resolve fine:
[admin@VRF_253_2nd] > /put [:resolve google.com]
142.250.180.174

Likewise, adding an NTP server:
/system ntp client
set enabled=yes
/system ntp client servers
add address=ntp1.inrim.it

NTP is fine:
[admin@VRF_253_2nd] /system/ntp/client> print
         enabled: yes
            mode: unicast
         servers: ntp1.inrim.it
             vrf: main
      freq-drift: 0 PPM
          status: synchronized
   synced-server: ntp1.inrim.it
  synced-stratum: 1
   system-offset: -2.956 ms


Re: Attempting to evolve from caveman's failover

Posted: Mon Mar 18, 2024 5:49 pm
by jaclaz
Ok, so the setup described in post #37 has been up and running for more than two months.
It did work in all simulations/tests I made and it has worked just fine in one instance where the ISP1 was down for a few minutes, performing the failover to ISP2 nicely and then going back to ISP1 when service returned.

Today some work had to be done on the electrical line: there was a planned power interruption of 15-20 minutes to change some cables/wires outside the building.

As a precaution we shut off all the PC's, just in case, but I left the "router stuff" connected (no UPS, I know), so when the electric company switched off the line routers/modems/switches went off.

When the energy was restored, I checked the internet connection and although all the devices were powered on and running, there was no internet.

Since the ISP1 modem is very slow at booting (while the ISP2 modem is much faster) I thought that the issue was with failover: the ISP2 modem already had all the right lights on while the ISP1 one was still detecting the DSL/handshaking/whatever. Pinging 8.8.8.8 from a PC gave "host unreachable" (whilst pinging the Mikrotik hap ax lite LAN address was fine).

So I connected to the hap ax lite with Winbox, and it had rebooted just fine (with the expected "router rebooted without proper shutdown, probably power outage" in log).
I then tried from terminal to ping 8.8.8.8 and for 6 or 7, maybe 8 pings it gave me "no route ...", then it suddenly started responding normally and (obviously) internet was restored to the whole network.

I made a few attempts at manually powering down/rebooting the device and, checking the routes, all of them (which should normally have been "AS" or "S") were "USHI" (as if there was no cable connected to the WAN interfaces).

So, in terminal, I tried again, one single ping at a time, while keeping an eye on the open Routes window:
ping 8.8.8.8 count=1
and, lo and behold, at the 8th ping (after 7 "no route ..." replies) the ping succeeded normally and the routes became AS or S.

No idea why exactly this happens, maybe it is "normal", but it doesn't sound so (at least to me).

For the moment I added a Netwatch script on 8.8.8.8 that only logs if ping is successful ("up" script) and logs if ping is unsuccessful AND pings 8.8.8.8 10 times ("down" script).
The 10 pings change the status of the routes, so at next run of netwatch it succeeds.
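
A minimal sketch of what such a Netwatch entry could look like, assuming the 30 second cycle and inline scripts; the comment name, the log texts and the 5 second timeout are only illustrative and are not copied from the running device:

```
# Hypothetical workaround entry: log on "up", log and send 10 pings on "down"
/tool netwatch
add comment=Wake_routes disabled=no host=8.8.8.8 interval=30s timeout=5s \
    type=simple up-script=":log info \"Wake_routes: 8.8.8.8 is reachable\"" \
    down-script=":log error \"Wake_routes: 8.8.8.8 unreachable, sending 10 pings\"; /ping 8.8.8.8 count=10"
```

The pings are sent from the router itself without forcing any interface or VRF, the same way as the manual test from terminal.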

I could only make a couple tests (as people without internet were starting to get mad at me), but it seems to work, even if it seems - besides primitive - also somewhat "ugly".

Re: Attempting to evolve from caveman's failover

Posted: Mon Mar 18, 2024 6:34 pm
by anav
Simple question what if one was to use this in routes.....
/ip route
add distance=2 dst-address=0.0.0.0/0 gateway=192.168.1.1%ether1 routing-table=main comment="RouteStarlink"
add distance=3 dst-address=0.0.0.0/0 gateway=192.168.1.1%ether2 routing-table=main comment="RouteOrange"
add distance=4 dst-address=0.0.0.0/0 gateway=192.168.1.1%ether3 routing-table=main comment="RouteBlue"


From what my reliable yoda told me (since I cannot find sheite in MT docs about what the @ symbol or % symbol do in IP routes): if there is a single LAN, the above method should work.
Using VRF, is adding Virtual Routers, not vlans, to the mix and this works if you have three separate LANs as the LAN and WAN have to be in the same VRF.
Looking at the examples, at MT, VRF is used for vpns and single uses, not for LAN distribution.

Re: Attempting to evolve from caveman's failover

Posted: Mon Mar 18, 2024 7:37 pm
by jaclaz
My gateways are different from what you suggest:
;;; vrf8_ISP1_Ping
7 As 1.1.1.1/32 192.168.1.1%ether1 1
;;; vrf8_ISP2_Ping
8 As 4.2.2.1/32 192.168.1.1%ether2 2
;;; vrf8_ISP3_Ping
9 As 4.2.2.2/32 192.168.1.1%ether3 3
because of recursive failover, but they point in the same direction.

And BTW the ether1-3 are, in this "reversed VRF" approach, outside the VRF.

I have a LAN and 3 WANs.

Anyway the issue must lie somewhere else: a route flagged "USHI" is what I normally see for ISP_3 (which I haven't connected yet in the "real" installation). For that one the LED is off (normal, since there is no cable in the ether3 socket), while for ether1 and ether2 (and of course ether4) the LEDs are lit.

It is as if ether1 and ether2 "go to sleep" and need to be awakened by *something*; very likely, besides the ping, disabling/enabling the interfaces (or maybe disabling/enabling the routes) would do as well.
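
An untested sketch of that experiment, to be run on the spare device while the routes are stuck. The interface name and the route comment are the ones from the CHR simulation in post #37 and are assumptions for any other setup:

```
# Hypothetical test 1: bounce one WAN port, then look at the route flags
/interface disable ether1
:delay 2s
/interface enable ether1
/ip route print where comment~"ISP"

# Hypothetical test 2: bounce only the host route instead of the port
/ip route disable [find comment="vrf8_ISP1_Ping"]
:delay 2s
/ip route enable [find comment="vrf8_ISP1_Ping"]
/ip route print where comment~"ISP"
```

If either test turns the routes back to AS/S without any ping being sent, the ping is not the only thing that can "wake" them.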

Re: Attempting to evolve from caveman's failover

Posted: Mon Mar 18, 2024 7:47 pm
by anav
Well, when you post the complete config I can comment. Not sure why you are using VRFs at all, yet.
Assuming it's 3 ISP modems into one router.

Re: Attempting to evolve from caveman's failover

Posted: Mon Mar 18, 2024 7:53 pm
by jaclaz
anav wrote:
"Well, when you post the complete config I can comment. Not sure why you are using VRFs at all, yet.
Assuming it's 3 ISP modems into one router."
The configuration has already been posted in post #37 which is the setup we are talking about, which includes the configuration AND a diagram, no need to assume anything.

The ONLY difference between what is posted, reproducible in CHR, and the real install is that in the latter ether4 is used instead of ether8 and the vrf is called vrf4 instead of vrf8.

The three "upper" routers are there in the CHR diagram just to simulate the three ISP routers. For all that matters they are three (in the real installation, at the moment, only two) "black boxes" with 192.168.1.1 on the "inner" side and internet on the "outer" side. I had to add them to be able to do tests with disconnection of cables/power down/etc., as the "cloud" in CHR is "always" on.

Re: Attempting to evolve from caveman's failover

Posted: Mon Mar 18, 2024 8:49 pm
by anav
As I stated, thus far no reason to use VRF has been provided and as a matter of fact it would seem NOT appropriate in this case.
Further, your recursive is incorrect.

Simple solution works:

/ip address
add address=192.168.1.241 interface=ether1 network=192.168.1.1
add address=192.168.1.242 interface=ether2 network=192.168.1.1
add address=192.168.1.243 interface=ether3 network=192.168.1.1


/ip route
add check-gateway=ping distance=2 dst-address=0.0.0.0/0 gateway=1.1.1.1 scope=10 target-scope=12
add distance=2 dst-address=1.1.1.1/32 gateway=192.168.1.1%ether1 scope=10 target-scope=11
++++++++++++
add check-gateway=ping distance=4 dst-address=0.0.0.0/0 gateway=9.9.9.9 scope=10 target-scope=12
add distance=4 dst-address=9.9.9.9/32 gateway=192.168.1.1%ether2 scope=10 target-scope=11
+++++++++++
add check-gateway=ping distance=6 dst-address=0.0.0.0/0 gateway=8.8.8.8 scope=10 target-scope=12
add distance=6 dst-address=8.8.8.8/32 gateway=192.168.1.1%ether3 scope=10 target-scope=11

/ip firewall nat
add action=src-nat chain=srcnat out-interface=ether1 to-addresses=\
192.168.1.241
add action=src-nat chain=srcnat out-interface=ether2 to-addresses=\
192.168.1.242
add action=src-nat chain=srcnat out-interface=ether3 to-addresses=\
192.168.1.243

Then I imagine, one could assign any specific Routing rules or mangles as required using the below setup.

/routing table
add fib name=use-ISP1
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.1.1%ether1 routing-table=use-ISP1

Re: Attempting to evolve from caveman's failover

Posted: Mon Mar 18, 2024 9:07 pm
by jaclaz
Well, it didn't work when I tried it, as ether8 in the simulation (ether4 in the real setup) has the same address 192.168.1.1 (same network) as the ISP routers; that is the reason why a VRF and the other approach by mtest001 were tested. Compare also with:
viewtopic.php?p=1039688

If there is a way to route on a same device between ether4 192.168.1.1 (LAN) and ether1 192.168.1.241, ether2 192.168.1.242, ether3 192.168.1.243 (WAN) with gateway for each of these three ports 192.168.1.1, without using VRF or other complex trickery, I am all ears.

Re: Attempting to evolve from caveman's failover

Posted: Mon Mar 18, 2024 9:19 pm
by anav
Yes, there is, provided above. Probably it didn't work because your recursive was not correct.

Re: Attempting to evolve from caveman's failover

Posted: Mon Mar 18, 2024 11:00 pm
by jaclaz
Recursive has nothing to do with the issue I had: at least one route should not have been USHI.
And at the time, before trying with VRF and with the other method using proxy ARP, I did try the "plain" setup you propose with just one route/one ISP connection and it didn't work.
However, when I have some spare time I will try again.
If you have the time/will, can you explain what exactly is "wrong" in the recursive I used? (I have tested it both in the simulation and on the real setup and it worked just fine).

Re: Attempting to evolve from caveman's failover

Posted: Mon Mar 18, 2024 11:03 pm
by anav
Your discriminatory skills are weak. Hint: look at SCOPE!

Re: Attempting to evolve from caveman's failover

Posted: Mon Mar 18, 2024 11:29 pm
by jaclaz
Undoubtedly I miss the discriminatory skills you are gifted with or that you gained through years of experience.
Still no idea what the mistake (if any) is but, again, the setup works as is, and even if the failover did not work (but it does work) at least one route should have been AS after the reboot and the other ones S; none should have been USHI.
The three routes for the DNS servers pointing to %ether1, %ether2 and %ether3 are normally all AS and should return to that state after reboot, or at least one of them should, as long as there is a cable in the interface and the modem on the other side of the cable is working.
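
For anyone following the SCOPE hint: as far as I understand the RouterOS documentation, a recursive gateway is only resolved through routes whose scope is less than or equal to the target-scope of the route that uses it. A minimal sketch to read those values back from the router (the comment~"ISP" filter assumes the route comments used in post #37; adapt it if yours differ):

```
# Show the full details, including scope and target-scope, of every route
# whose comment contains "ISP"
/ip route print detail where comment~"ISP"
```

In the posted configuration the default routes carry target-scope=11 and the three host routes carry scope=10, so the values to compare are 10 against 11.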

Re: Attempting to evolve from caveman's failover

Posted: Tue Mar 19, 2024 12:03 am
by anav
Man, do I have to state it in writing, your SCOPES are wrong!! LOL
The config I gave works, it's your config that is broken if it doesn't.

I cannot read a winbox jpeg unless it's very clearly delineated
An RSC script I can read in seconds................... its just a story about requirements
I cannot make up scripts for automation, aka in system scripts, or netwatch, worth a can of beans...
I can vlan wlans with eyes closed until capsman enters the picture and then I give up.

Re: Attempting to evolve from caveman's failover

Posted: Tue Mar 19, 2024 12:38 am
by jaclaz
Ok, thank you anyway for the help and suggestions.
This conversation seems to have no SCOPE anymore (pun intended).

Re: Attempting to evolve from caveman's failover

Posted: Tue Mar 19, 2024 2:38 am
by anav
No one can force you to actually fix your config, that motivation has to come from within............

Re: Attempting to evolve from caveman's failover

Posted: Tue Mar 19, 2024 11:23 am
by jaclaz
Please, do not misunderstand me.
I posted about an issue that I experienced.
I want to understand the reason why that issue happened.
Then, I am of course very open to any constructive suggestions that may help in making a better configuration.
But you started posting snippets that - without an explanation - make no sense to me, very likely your approach works with people much more experienced than myself, but not with beginners like myself.
I do understand that you may have no time/will (or patience) to teach things that, for you, are well understood and easy, but it is not at all useful to me to get partial snippets and be shouted at because I don't understand them and don't want to blindly make changes to a working configuration before having grasped the concepts.
As such there is no point in continuing asking for clarifications/explanations that you are not going to give.

Re: Attempting to evolve from caveman's failover

Posted: Tue Mar 19, 2024 1:05 pm
by mtest001
jaclaz wrote:
"I then tried from terminal to ping 8.8.8.8 and for 6 or 7, maybe 8 pings it gave me "no route ...", then it suddenly started responding normally and (obviously) internet was restored to the whole network."
Not sure I understand the magnitude of the problem here; losing 7 pings after a complete loss of power is no big deal in my opinion.

Maybe you could try to tune your netwatch parameters: interval=10s timeout=5s

In the worst case scenario it will take 10 seconds + 5 seconds to execute the fail-over script. And in my experience Netwatch takes much longer to react. I have the same settings as you and I noticed that it takes 45 seconds to 1 minute to switch over. My assumption is that Netwatch does not handle well the situation when there is no route to the host being monitored, as opposed to "there is a route but the ping times out".

Re: Attempting to evolve from caveman's failover

Posted: Tue Mar 19, 2024 2:04 pm
by anav
Well, I am surprised because you went down the rabbit hole of a very complex VRF config on one hand, but then seemingly want to avoid like the plague a KISS solution ???
In any case, if there is no issue requiring solving I will move on.

Re: Attempting to evolve from caveman's failover

Posted: Tue Mar 19, 2024 3:21 pm
by mtest001
The use of the VRFs is there for a different reason: both ISP1 and ISP2 routers have the same IP i.e. 192.168.1.1

Re: Attempting to evolve from caveman's failover

Posted: Tue Mar 19, 2024 3:30 pm
by jaclaz
@mtest001
In the configuration I had, there was no Netwatch script running; I added the described one later as a (temporary, for the moment) fix for the next occasion.
Pinging from a device on the network seemed to have no effect ("host unreachable").
No idea if, given some time, the check-gateway would have done something, but that should ping every 10 seconds and more than 70-80 seconds had passed. I suspect that, since the routes that carry the check-gateway were USHI, they made no attempt at all.
BTW the 7/8/10 pings to 8.8.8.8 are "generic", i.e. no interface is specified in the command.
The issue is that I cannot seem to reproduce it in gns3/CHR, and testing on the "real" device is complicated because the various internet users I have around all seem unable to stay disconnected for longer than a few minutes.
I will have to take the spare Ax lite I have and experiment with it on the desktop.
The timing of the recovery is not in itself a problem: the "main" ISP1 router is slow anyway, and even the secondary one (ISP2), while faster, is not a rocket, so the Ax lite reboots faster than both. Hence the use, for now, of Netwatch. The cycle for that script is 30 seconds, so in case of failure there is the pinging and, provided that it restores the routes to AS, the re-connection will happen within some 40 seconds at most (the fact that the next run of Netwatch will happen some 20 seconds later is irrelevant, as that will only acknowledge that the connection has been restored).
What I don't know (didn't have time to test) is whether the pinging needs to actually succeed or if it is enough to "wake" the routes even if the ISP modem(s) are not (yet) ready.
Typically these modems respond to pings to them (192.168.1.1) very soon, then it takes them from several seconds to a few minutes to actually have a connection established and allow pinging to 8.8.8.8, but the "no route" and the USHI should mean that not even the modem itself is detected, so maybe the pinging works even before the connection is established.

@anav
I went down this rabbit hole only because I couldn't find any reference to a complete, working, simpler configuration, while I found endless examples/posts by people stating that routing within the same set of network addresses is not possible.
If you happen to have a reference to a complete example (or wish to produce one) please post it.
I will re-state, just in case, that the original intended use case is the following:
1) I have three ISP devices (one "main", ISP1, and two secondary and tertiary, ISP2 and ISP3) and all of them have a fixed address of 192.168.1.1(/24)
2) my usual way to deal with the "main" router/modem having no internet was to unplug the RJ45 from ISP1 and plug it into ISP2 (the procedure when ISP2 was also down and ISP3 was needed was slightly different, as the LTE modem is in another room, so let's reduce the problem to only ISP1 and ISP2)
3) the devices on the network (LAN) all have a static address in the 192.168.1.0/24 range that cannot be changed, with gateway set to 192.168.1.1
4) the role of the device (currently an AX lite) is to have a LAN port with address 192.168.1.1 (/24) on ether4 (LAN) and three *whatever* addresses on ether1/ether2/ether3, each capable of routing to a destination of 192.168.1.1 (the three ISP modems) and from there to the internet.

As soon as I have time, I will try to pinpoint what happens in this newly discovered hiccup when a loss of power is involved. Once I have the problem more exactly described (and hopefully a better solution than the current "ugly" workaround), I will be able to try other (possibly simpler) approaches again.

Re: Attempting to evolve from caveman's failover

Posted: Tue Mar 19, 2024 4:31 pm
by mtest001
jaclaz wrote:
"BTW the 7/8/10 pings to 8.8.8.8 are "generic", i.e. no interface is specified in the command."
The pings can be forced through a specific VRF. Try it next time to confirm if the ping fails for 1 VRF and works for the other.
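
A minimal sketch of such a test, assuming the real device uses a VRF named vrf4 (vrf8 in the CHR simulation) and that the installed RouterOS 7 version accepts the vrf parameter of the ping command:

```
# Ping from the main routing table (router-originated traffic)
/ping 8.8.8.8 vrf=main count=3
# Same ping, forced through the LAN-side VRF (hypothetical name, adapt it)
/ping 8.8.8.8 vrf=vrf4 count=3
```

Comparing the two results right after a power cycle should show whether only one of the tables is affected.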
jaclaz wrote:
"The issue is that I cannot seem to reproduce in gns3/CHR and testing on the "real" device is complicated because of the various internet users I have around that seem all unable to be disconnected for longer than a few minutes."
Find a maintenance window and tell your users. Try something like very early in the morning, like 5 or 6 AM, or during weekends.

Re: Attempting to evolve from caveman's failover

Posted: Tue Mar 19, 2024 5:14 pm
by jaclaz
There is only one VRF and it has only the ether4 (LAN) port (ether8 in the simulation).
What I was trying to say is that in "normal" operation I have:
the three routes main_ISPn_Route: #1 AS and #2 and #3 S
the three routes vrf8_ISPn_Route: #1 AS and #2 and #3 S
the three routes vrf8_ISPn_Ping: all three AS <- and when one or two of these go down the above routes will switch or change status.
In any case, in any situation of the modems/connections they are all either S or AS, while when the hiccup happened they were all USHI.
The generic ping restored them all (not just the set the ping must have gone through); this is what makes me think that it is something more like a "freeze" or "sleep" of the whole "routing stuff", and that the pings acted as a wake signal.
But, as said, I only had a quick look before implementing the workaround and giving the internet back to the hungry, and I had/have some other things to do; I will have some time to delve into the matter in ten days' time.

And thanks, I know the "trick" of working while people sleep, but the tests/experiments I need to do may take their time, it is much better if I do them in a (this time physical) simulation, so I have all the time needed, only once I have something working I will actually put the new configuration in operation during one such maintenance slot.

Re: Attempting to evolve from caveman's failover

Posted: Tue Mar 19, 2024 7:17 pm
by anav
mtest001 wrote:
"The use of the VRFs is there for a different reason: both ISP1 and ISP2 routers have the same IP i.e. 192.168.1.1"
Use of VRFs is warranted if each router is serving different subnets.
VRF is NOT warranted for distribution to the same LAN (regardless of the number of subnets)

The simple method presented works just fine, so I am not clear as to why you would want to attempt a complex and incorrect approach.