On what in particular? I assume, as you talk about bonding, that you want your radio links to be L2 transparent, so no routing and associated failover mechanisms can be used. On bonding, that post says the same as I do, doesn't it?

> What is your opinion?
"Factory defaults -> home CPE" is a good start. "No defaults at all" should be manageable using Winbox connecting to MAC address or mac-telnet from another Mikrotik, and also recoverable by resetting to factory defaults using the reset button during power-on, but that's off-topic here.So, some questions:
> What baseline should I be starting from? Factory defaults Router? Factory defaults Bridge? Or no defaults at all (which I have not tried due to many nightmare stories of folks who had to use a serial cable to recover from that)?
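For reference only (not a recommendation of either), these are the two CLI forms of the reset being talked about; the reset button during power-on gets you to the first one as well:

/system reset-configuration                  # back to the factory default configuration
/system reset-configuration no-defaults=yes  # wipe to a completely blank configuration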
The part about NAT is important. The 750 gets its own WAN IP address and default route from the "backhaul IP", whatever it may be. But unless that "backhaul IP" device knows that packets for 192.168.1.0/24 and/or 192.168.88.0/24 have to be sent to the 750's WAN IP, the action=masquerade rule on the 750 is necessary; otherwise the packets from its LANs are routed via the WAN but with their original source addresses. In the better case, the "backhaul IP" drops them already on their way to the server somewhere in the internet; in the worse case, they reach the server but the server sends its response to 192.168.1.x (or 192.168.88.x), which is a completely unrelated device in the server's network.

> Leave the default IP's as is (88.1). Turn off DHCP and NAT from QS.
...
> All is well after reset. Repeating my steps above, I lose Internet when I turn off the DHCP server. DHCP client still sees the ISP and can release/renew OK. Pings from connected machine don't work (obviously).
> The connected machine still has an IP from when DHCP was turned on, and it is configured correctly per Connection Information. Assigning an IP manually with the same parameters but using a different 88.x IP and using that does not make a difference.
If the action=masquerade rule is set to match on the WAN port and on a list of LAN bridge ports at the same time, it cannot work at all. But matching on a bridge port is only possible if use-ip-firewall is set to yes under /interface bridge settings, so the rule may be auto-disabled.

> I believe it was the same with the addition of an entry in the "Out Bridge Port List", which I believe was set to all LAN. I will need to check that next time I am down there.
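To illustrate the point about the masquerade rule above: a working rule matches only on the outgoing WAN interface, with no bridge port conditions. A minimal sketch, assuming the WAN port is ether1 (adjust to the actual interface):

/ip firewall nat add chain=srcnat out-interface=ether1 action=masquerade comment="hide LAN source addresses behind the WAN IP"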
Exactly. The DHCP is used to set up the IP address and network mask at least, but there are many additional parameters usually provided as well - default gateway, DNS server list, NTP server list, configuration server FQDN, configuration server name, and so on.

> "give the 750's IP address in the respective subnet as the default gateway to the client device(s) in 192.168.1.0/24 and/or 192.168.88.0/24, set those devices to use 8.8.8.8 as DNS, and try again."
> So, this needs to be done on end-user devices such as laptop, tablet, etc.? Or am I misunderstanding? In other words, do MANUALLY what a DHCP server would do AUTOMATICALLY?
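For illustration, this is the DHCP-server side of the same thing - handing out the 750's address as the gateway and 8.8.8.8 as DNS automatically instead of typing them into each device. A sketch assuming the default 192.168.88.0/24 LAN and the 750 at 192.168.88.1 (if the default configuration already created this network entry, set it rather than add a new one):

/ip dhcp-server network add address=192.168.88.0/24 gateway=192.168.88.1 dns-server=8.8.8.8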
I'm not sure why we are doing this - it was your complaint that the devices behind the bond cannot reach internet after the changes you've done. My initial understanding was that the bonding setup was the only thing which was not clear to you, but now it seems to me that there is much more. That's why I've initially concentrated on the migration from a single cable connection to the bonded one.

> Are we doing this only to test the ability of a device to get Internet access via the 750 when everything is properly configured?
Even that way, the approach with two interconnection subnets and redundant routing seems better to me than any L2 solution (bonding or mesh), as you can try several strategies for distributing the load across the two links and stick with the optimal one, and none of them will be worse than bonding.

> a properly configured 750UP will take the place of the unmanaged switch at the actual site.
It seems my mind reading skills are severely impaired these days. As I haven't seen the 750 anywhere on the drawing, and as you stated before that the 750 will have the public IP on itself, I thought the first box named ISP was the 750 ☹

> So, you are saying that "two interconnection subnets and redundant routing" can be implemented with the network topology as-is, without needing to change any hardware at the "source" site? Or am I misunderstanding the meaning of "it is still possible to implement the above in steps from one end"? Perhaps that means that AFTER a 750 is installed at the source end it can all be configured from a single point.
Basically:

> It would be cool if I didn't need to go to the source end. Setting up a site access visit there requires getting in contact with a fairly non-responsive and grumpy County IT employee to get into the building and driving through some really nasty deep sand. I wasn't going to get around to that until next year.
I've assumed that yesterday's diagram was describing the actual current deployment setup. The suggestion on the drawing reflects that assumption - it can be implemented on the equipment in production, not affecting its operation until you are ready to start using the new approach, and keeping the management via 192.168.1.x like it is now, so you always keep the possibility to revert the steps without needing to rely on safe mode.

> When I said "where I am with things currently" in regards to the network diagram, that refers to the situation at the actual site NOT the "bench test" collection.
Now I'm a little bit uncertain - is it 350 miles north of your current location or of Phoenix? I mean, it is easy to migrate the configuration safely if you are physically present at the 960 end, but it requires a bit more planning and a scheduled rollback script for the final switchover step (which I hadn't anticipated initially - see below) if done from the internet end.

> The network deployment is about 350 miles north
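One possible shape for that "scheduled rollback script", just as a sketch of the idea and not a worked-out procedure. It assumes the old and new default routes carry the comments old-default and new-default; until the scheduler is removed, it keeps flipping back to the old route every 15 minutes, so a lockout caused by the switchover heals itself:

/system script add name=rollback-switchover source="/ip route enable [find comment=\"old-default\"]; /ip route disable [find comment=\"new-default\"]"
/system scheduler add name=rollback-timer interval=15m on-event=rollback-switchover
# once access over the new path is confirmed working:
# /system scheduler remove rollback-timer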
That's not a big deal, as you can set any MAC address on Mikrotik devices' Ethernet interfaces. The only complication is when you do that from the internet side, as you need to change it at both the 960 and the NetBox almost simultaneously. I didn't expect this, as your second drawing was showing the public IP on the ISP box, but it doesn't change much about the concept. The red subnet becomes a public one (216.169.x.x), so the NAT handling moves from the 960 to the NetBox, whilst the firewall may stay at the 960.

> My IP is locked to the MAC address of the RB960 at the remote network end. I have had some discussions with my provider on this - it can be changed, but I can have only one instance of MAC address at any given time.
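For completeness, setting the MAC is a one-liner; the interface name and address below are placeholders, not your real ones:

/interface ethernet set ether1 mac-address=00:0C:42:11:22:33
# and to return to the factory-burned address later, if ever needed:
# /interface ethernet reset-mac-address ether1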
As said above, bonding can be used, but due to its design, a lot of extra stuff would have to be configured to maintain management access to the long-distance radios, and the overall performance would be inferior to that of my proposed solution with routing. That is leaving aside the fact that you don't need to physically add the 750 to the network if you take the routing-based way. The price to pay is that you have to learn a bit.

> I do like the idea of leveraging what I already have in place.
...
> What I was hoping for was a way to provide redundancy on the long-distance piece as well as keep WDS transparency on the entire link so the 960 could keep plugging along pretty much as-is. That seemed to be the simplest method to this admitted novice, and one that I could administer with my current level of knowledge.
Yes, for the routing-based failover to work, the public IP has to be at the NetBox. And as said above, you don't need to talk to the ISP; it is enough to configure the 960's MAC address on the interface of the NetBox.

> I will need to call my contact at the ISP to inform them of the new MAC that my IP will bind to, as it appears that could now be the NetBox and not the RB960. (Maybe. Still examining your diagram and figuring things out. If that is incorrect, let me know.)
It doesn't matter what equipment in particular is used for the radio links. You can use the testbench 960 instead of the production 960, and the testbench 750 instead of the production NetBox. The radio links on the testbench will be just as L2 transparent as the real ones. The NetBox will have one Ethernet port (with the dumb switch on it) and one wireless port bridged together, whereas the 750 will have three wired ports bridged together, with an external radio connected to two of them - that's the whole difference. At L3 (IP addresses and routing), the settings will be the same.

> UBNT gear does have a Router mode, although I have never used it, preferring to leave the routing to Mikrotik.
> So, given the above, can I still create a bench setup with the hardware I have? I suppose I could buy a couple of cheap hAP's so we have Tik devices to work with. I always have a use for those anyway.
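As a rough sketch of the bench layout just described (bridge and port names are assumptions, not a prescription): on the device standing in for the NetBox, one Ethernet port and the wireless port go into one bridge; on the device standing in for the 750, the three wired ports do the same, two of them going to the external radios.

# stand-in for the NetBox
/interface bridge add name=bridge-lan
/interface bridge port add bridge=bridge-lan interface=ether2
/interface bridge port add bridge=bridge-lan interface=wlan1

# stand-in for the 750 (ether4 and ether5 go to the external radios)
/interface bridge add name=bridge-lan
/interface bridge port add bridge=bridge-lan interface=ether3
/interface bridge port add bridge=bridge-lan interface=ether4
/interface bridge port add bridge=bridge-lan interface=ether5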
Again, partitioning the network using VLANs is not mandatory - you currently run 192.168.1.0/24 and the public subnet in the same LAN, and it causes no trouble because the ISP doesn't have their own 192.168.1.0/24 in the same LAN. So the replacement of the Bullet2s can easily be postponed to never, or the dumb switch may just be replaced later on by the 750 you already have, to provide the tagging/untagging capability for the Bullet2 path externally.

> Sadly, the old non-M Bullets don't support running VLANs. So I will need to buy stuff in order to complete the bench test setup.
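If the 750 ever does take over the tagging/untagging for the Bullet2 path, a sketch could look like this (RouterOS 6.41+ bridge VLAN filtering; the VLAN id, bridge name and port roles are all assumptions):

/interface bridge set bridge-lan vlan-filtering=yes
/interface bridge port set [find interface=ether3] pvid=20
/interface bridge vlan add bridge=bridge-lan tagged=ether2 untagged=ether3 vlan-ids=20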
Assuming that the weather conditions deteriorate and improve gradually, yes, it is a very good idea to control the failover based on the receive level of the link. However, I'd prefer hysteresis to a time-based switchover. Say, a fall of the Rx level below -70 dBm means "stop using this link", but a rise above -60 dBm is required to start using it again. You'll need to collect some data to find out how much the Rx level fluctuates during normal weather conditions.

> Is it possible to have a script run periodically on the destination end 960 that would monitor the signal level or the data rate of the destination end NetBox (via Wireless>>Registration) and switch to the secondary link if it falls below a certain number?
...
> Just wondering if it's even possible.
I'm not sure whether 15 min intervals are sufficient to notice short-time fluctuations caused by the packet nature of the link. With a traditional full duplex link where the carrier is always on I would have no doubt, but with this packet thing, I can see the Rx level drift in a range from -80 to -77 dBm within seconds on a link of about 50 feet with omnidirectional antennas. So some averaging may be required, which would make the script a little bit more complex (and its response a little bit slower, but that's not critical).

> a scheduled script to record signal level every 15 min
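Thinking out loud, that slightly more complex script could look roughly like the sketch below - names and thresholds are made up. It averages a handful of signal-strength samples from wlan1 and then applies the -70/-60 dBm hysteresis by enabling or disabling a firewall rule (tagged "fail-primary" here) which, while enabled, makes the check pings over the primary link fail. Which device it runs on and what exactly it toggles are still open; the string handling is there because some RouterOS versions return signal-strength as text like "-58dBm@..." rather than a plain number, and it assumes wlan1 currently has a peer registered.

# collect a few samples from wlan1 and average them
:local sum 0
:local count 5
:for i from=1 to=$count do={
    :local mon [/interface wireless monitor wlan1 once as-value]
    :local raw ("" . ($mon->"signal-strength"))
    :local cut [:find $raw "dBm"]
    :if ([:typeof $cut] != "nil") do={ :set raw [:pick $raw 0 $cut] }
    :set sum ($sum + [:tonum $raw])
    :delay 2s
}
:local avg ($sum / $count)

# hysteresis: below -70 dBm stop using the link, above -60 dBm allow it again
:if ($avg < -70) do={ /ip firewall filter enable [find comment="fail-primary"] }
:if ($avg > -60) do={ /ip firewall filter disable [find comment="fail-primary"] }

It would be run from /system scheduler at some short interval; the "fail-primary" rule itself would just be a normally-disabled drop rule matching those check pings.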
Initially I did talk about the same thing, monitoring the transparency of the path all the way to the internet. But then I realized that this only makes sense when the two WAN interfaces are connected to paths which reunite many hops away from the device (i.e. each goes through a different ISP, and you want to know that the path through that ISP to the "big internet" is transparent as a whole).

> A quick search found this:
> https://itimagination.com/mikrotik-wan- ... -reliable/
> This sentence there has me thinking you are describing the same thing:
> "With these route-based rules, failover times are about 15 seconds. From the time internet connectivity stops, to failing over, to workstations regaining internet access, is about 5-15 seconds. From testing, failing back to primary is a little quicker, maybe 5 seconds."
As said above, there will be no code "running" on the 960. There will just be two routes towards 0.0.0.0/0 configured, the preferred one with distance=1 and check-gateway=ping, and the backup one with distance=2, each with a different IP address as its gateway. The same configuration, except that the destination of both these routes will be just 192.168.0.0/24, goes on the NB5 at the source side (actually, to provide management access to all the equipment at the 960 end via both links, there will be some more routes, but that's not important for the principle).

> From your description near the end of your post, it *sounds* like there will be code running on the destination end NB as well as the RB960. Code on the NB5 monitoring signal conditions and blocking pings (which originate at the 960) when the signal falls below the preset value, with more on the 960 that sends the pings, detects the loss of them, and when they are lost, invokes a different route to the secondary link. And of course the reverse operation when sufficient signal is again available. If we get the Blizzard of the Century and both links stop passing data, it sounds like traffic will switch to the secondary link and just stay there until sufficient signal returns to the primary. Very good. If I am understanding this correctly, this approach will also be "device-agnostic" regarding the secondary link in that I can swap different devices in for the old B2HP's at any time with no re-config necessary.
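Put into commands, the two-route arrangement described above looks roughly like this; the gateway addresses are placeholders for the two interconnection subnets, not real values:

# on the 960: default route preferred via the primary link, backup via the secondary one
/ip route add dst-address=0.0.0.0/0 gateway=192.168.10.1 distance=1 check-gateway=ping comment="primary link"
/ip route add dst-address=0.0.0.0/0 gateway=192.168.20.1 distance=2 comment="backup link"

# on the NB5 at the source side: the same pair, but towards 192.168.0.0/24 only
/ip route add dst-address=192.168.0.0/24 gateway=192.168.10.2 distance=1 check-gateway=ping comment="primary link"
/ip route add dst-address=192.168.0.0/24 gateway=192.168.20.2 distance=2 comment="backup link"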
Re-stating what you've heard/read in your own words is a great way to confirm that both parties understand things the same way.

> I am probably only re-stating what you've already said in a different way - with included mistakes due to my lack of (but increasing) knowledge. But I really want to understand this.
Yes, scroll a few posts back :) That's what I stated as the advantage already back then: you can do all the changes while connected to the 960 alone (provided that you can reach the NB5 at the remote end from there, of course).

> It also sounds like this can all happen with no need for me to make a visit to the source end. True or false? Another benefit, if true.
There may still be a point in having the LTE there as a backup for management access.

> I have pretty much had to abandon LTE as backup as the network usage has increased such that I can't afford $10/GB (!!) any longer.
I remember this setup; that's why I keep repeating that you currently use two subnets (the management one, 192.168.1.0/24, and the public one, 216.x.x.x) in the same L2 segment (or bridged domain, if you prefer to call it that). And I rely on the existence of this management access via addresses in 192.168.1.x/24 to all the devices whose configuration needs to be augmented to implement the routed failover.

> Speaking of management access, remember that you coached me through getting that to work last year, and it required this /IP Address entry:
...
> as previously mentioned in post #10. Will this affect or be affected by what we are discussing? Just didn't want this to swoop in at the last minute and hose things.
# reads the current wlan1 metrics once and prints signal-to-noise and signal-strength:
:local monitor [/interface wireless monitor wlan1 once as-value] ; :put (($monitor->"signal-to-noise") . " " . ($monitor->"signal-strength"))