I replied to you in that thread: I have not noticed any problems with the operation of WiFi access points in the AC/AX bands. I use CAPsMAN.
In reply to: Any fixes for WiFi disconnects on AX devices?
viewtopic.php?t=208199 This is an annoying bug that prevents using any of the newly released firmware versions.
It looks like the problem only affects Intel AX devices (such as AX200/AX210 WiFi cards).
In reply to: I have not noticed any problems with the operation of WiFi access points in AC/AX bands. I use CAPsMAN.
What issue is this? Is there an upstream bug report you can link?
In reply to: We have found that there is a Linux kernel driver issue with Intel AX201/AX210 cards that exists in all Linux-based operating systems. We are trying to find ways to handle these clients better, so if you can reliably reproduce these issues and you have this card, send a supout.rif file to support.
wpa3 + intel ax
What issue is this? Is there an upstream bug report you can link?
I've tried various configurations recommended by support and none of them worked. My configuration is identical to yours. So it depends on many factors: some people are lucky, some are not.
In reply to: I replied to you in that thread - I have not noticed any problems with the operation of WiFi access points in AC/AX bands. I use CAPsMAN.
I've sent a bunch of supout files from 7.14.1, 7.15 and 7.15.1 to support by email. I'm on an Intel AX201.
In reply to: We have found that there is a Linux kernel driver issue with Intel AX201/AX210 cards that exists in all Linux-based operating systems. We are trying to find ways to handle these clients better, so if you can reliably reproduce these issues and you have this card, send a supout.rif file to support.
For me this also seems to reproduce without WPA3 (WPA2 only).
In reply to: wpa3 + intel ax
What issue is this? Is there an upstream bug report you can link?
https://bugzilla.kernel.org/show_bug.cgi?id=203709
I see that as well (CHR with 128 MB of storage on the server, 71 MB free inside RouterOS).
In reply to: Bug: Total HDD Size = 0 KiB
I can confirm this on an x86 machine: Total HDD Size = 0 KiB.
In reply to: Upgrade from 7.16beta7 to 7.16rc1. Bug: Total HDD Size = 0 KiB
And I was waiting for a rewrite of the syslog system to follow RFC 5424, which was released in 2009!!!
In reply to: Was waiting to see the igmp-proxy issue found on SUP-152693 that affects IPTV Movistar Spain fixed in 7.16
Yes, I hope they provide a solution quickly. Movistar TV in Spain is pixelated.
In reply to: Was waiting to see the igmp-proxy issue found on SUP-152693 that affects IPTV Movistar Spain fixed in 7.16
[admmikrotik@router70] /ip/dns> adlist/print
Flags: X - disabled
0 url="https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts" ssl-verify=no match-count=317 name-count=161549
[admmikrotik@router70] /ip/dns> print
....
cache-size: 40960KiB
cache-used: 18286KiB
....
[admmikrotik@router70] /ip/dns> adlist/disable numbers=0
[admmikrotik@router70] /ip/dns> adlist/print
Flags: X - disabled
0 X url="https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts" ssl-verify=no match-count=317 name-count=161549
[admmikrotik@router70] /ip/dns> print
....
cache-size: 40960KiB
cache-used: 18284KiB
....
Did your BGP config survive the update? Do you have multiple routing tables?
In reply to: Routing, BGP and DHCP client on 4011 works well.
/interface wireless cap
set bridge=bridge-boss discovery-interfaces=bridge-boss enabled=yes \
interfaces=wlan2,wlan1
/interface wireless cap
set bridge=bridge-boss discovery-interfaces=bridge-boss enabled=yes interfaces=wlan2,wlan1
#line 34..35
add add-default-route=special-classless default-route-distance=88 interface=\
vlan88-backhaul script=":if ( \$bound != 0 ) do={\r\
#line 36
\n # Attempt DNS update, in case Internet is available now\r\
Script Error: expected command name (line 36 column 5)
Yes and yes, but just 3 tables.
In reply to: Did your BGP config survive during update? Do you have multiple routing tables?
On mine, the BGP config was lost together with multiple routing table config.
The Intel AX adapters are hot garbage. Last week I literally had a laptop with an AX200 card sitting next to one with a Realtek 8852CE. The Realtek was connecting to my cAP ax in the next room at 1201 Mbps (max rate) and the AX200 was connecting at 300 Mbps. It also randomly decides to connect to 2.4 GHz on a far AP even when sitting underneath a cAP ax, which requires turning WiFi off and back on to fix. This is on Windows 11 23H2 connecting to cAP ax units running ROS 7.15.1.
In reply to: It looks like the problem is only for Intel AX devices like (ax200/ax210 wifi cards)
Only active routes are in the FIB.
In reply to: But how can I print the differences between RIB and FIB?
I was talking about the Linux kernel in the Mikrotik AP, not the client side.
In reply to: Well, it is probably not just a Linux issue with these Intel AX cards. Kaldek is running Windows. It is more likely their Intel device firmware (firmware is proprietary binary regardless of OS/platform AFAIK)
Hum, 🪃 of using a dated and EOL 5.5.x kernel. 😶
In reply to: I was talking about linux kernel in the mikrotik AP, not the client side.
Here in Europe (Germany) I've had an issue with all AX2XX and BE200 cards I've owned on any channel above 124.
In reply to: The Intel AX adapters are hot garbage. Last week I literally had a laptop with an AX200 card sitting next to one with a Realtek 8852CE. The Realtek was connecting to my cAP ax in the next room at 1201 Mbps (max rate) and the AX200 was connecting at 300 Mbps. It also randomly decides to connect to 2.4 GHz on a far AP even when sitting underneath a cAP ax, which requires turning WiFi off and back on to fix. This is on Windows 11 23H2 connecting to cAP ax units running ROS 7.15.1.
I do not blame Mikrotik for this, as I have a ton of Wireless devices (Apple, Android, Google Home, Nvidia Shield, Chromecasts, etc) and it's ONLY the devices with AX cards that act weird.
It may be related to incorrect country/region-specific information.
In reply to: Here in Europe (Germany) I've had an issue with all AX2XX and BE200 cards I've owned on any channel above 124.
Above channel 124 they only want to connect to 5 GHz WiFi if the channel width is at most 20 MHz.
Otherwise they connect with limited wifi speed.
Not really true, since we apply lots and lots of patches.
In reply to: hum, of using dated and EOL 5.5.x kernel.
Risky strategy... https://youtu.be/_yWhsynnxEg?t=1211
In reply to: Not really true, since we apply lots and lots of patches
Not using an LTS kernel can't be "patched" away by applying "lots of" patches. Your kernel branch is not supported anymore. It is apparently kernel version 5.6.3. You may backport and selectively apply all the patches you think you need - nice try. This ties up at least one developer solely for patch management of your diverging kernel. It gets harder every month and every year, and the process randomly introduces bugs. In my opinion this is insane.
In reply to: Not really true, since we apply lots and lots of patches
lte;error lte mbim: >>> E #15 - connect: connect, error: FAILURE
Though with no real issues recognized. Are you interested in a support output file?
On the other hand, when issues related to this release are brought up, there is no reply from MikroTik employees, and when generic issues are discussed, they jump right in.
In reply to: As stated in the post and many other posts in other version topics - please keep this topic related to its purpose. This is v7.16 topic. For other discussions, open up a new topic. The bigger half of posts in this article is not in any way related to this release.
On Topic: 7.16 RC1 - Found an annoying bug with the 6to4 tunnel interface. I have VRRP on-backup and on-master scripts that disable or enable various interfaces to provide HA between two Mikrotiks.
One of the Mikrotiks ended up in a boot-loop. The root cause was a process failure when the 6to4 tunnel was re-enabled. (When booting the on-backup script is always run because VRRP starts in backup state, this disabled the 6to4 tunnel. A couple of seconds later the device transitioned to master, and the on-master script ran and enabled the 6to4 tunnel - this caused a router reboot, and then a boot-loop).
Reproduced on a blank netinstalled Mikrotik, and also a x64 CHR. Could also force the issue by disabling and re-enabling the 6to4 tunnel via Winbox a couple of times.
Created SUP-161728 to report the issue, included autosup.rif and also a script to re-create the problem. Hopefully this can be fixed before release.
LTS kernel support was recently shortened to just 2 years, and one of the reasons stated was that almost nobody is using them, according to Greg Kroah-Hartman. He is right, of course, that the most recent mainline stable kernel is the most secure one, but companies like Red Hat that must also care about the stability of their product use custom kernels with backported patches. RHEL 8, for example, uses 4.18 and RHEL 9 uses 5.14, neither of which has been supported by the mainline kernel developers for many years... Since Mikrotik must support their products for many years while keeping compatibility, they cannot afford to change the kernel to the current stable version every couple of months, so the Red Hat approach seems a better fit for them...
In reply to: Not using a LTS kernel can't be "patched" by applying "lots of" patches. Your kernel branch is not supported anymore. It is kernel version 5.6.3 apparently. You may backport and apply all the patches you think you need selectively - nice try. This is wasting at least one developer manpower only for patch management for your diverging kernel. It gets harder every month/year. Randomly introducing bugs by this process included. In my opinion this is insane.
Maybe Mikrotik is already patching kernel heavily with their own code. Or maybe it is the size they could squeeze out of 5.6.x and after realizing that LOC are increasing heavily with every kernel release thus increasing size is a big issue when one deploys on 16MB platforms.
And of course you reach a point where you can't "cherry pick" commits anymore. It won't get better by finding workarounds for the AX compatibility thingy. My experience is: when someone talks about "finding a workaround" this is a clear sign of tech debt.
I mean: forget what I said. I am some random dude. Listen to Greg Kroah-Hartman.
As a software engineer, I have to agree that the latest and greatest is not always the best, especially if you have to maintain a codebase as complex as RouterOS, with its custom network stack, device drivers and so on. It's not uncommon for networking vendors to build from whatever Linux/OpenWrt-based SDK the chipset maker gives them, but Mikrotik is building their own thing from scratch and I admire that.
In reply to: LTS kernel support was recently shortened to just 2 years and one of the reasons stated was that almost nobody is using them and that is according to Greg Kroah-Hartman. He is right of course when saying that the most recent mainline stable kernel is the most secure one but companies like Red Hat that also must care about stability of their product are using custom kernels with backported patches. RHEL 8 for example is using 4.18 and RHEL 9 is using 5.14, neither of which is supported by Linux mainline kernel developers for many years... Since Mikrotik must support their products for many years while keeping compatibility they cannot afford to change kernel to currently stable version every couple of months so Red Hat approach seems better fit for them...
Me too. Also, with the default !reselect-interval, are channels not scanned?
In reply to: I have some questions on how the reselect-interval works?
I would answer with yes. It would be a breaking change otherwise.
In reply to: Me too, also with the default !reselect-interval channels are not scanned?
This should be a good thing, right?
In reply to: So it is no background operation. May be different on AX hardware though.
Reasons to move to new kernel versions include:
- major new features that have been added and are not so easily "backported" to the kernel you use
- drivers from manufacturers, maybe in binary form, that are not compatible with an older kernel
- not wanting to track each and every patch to see if not having it could impact security of your device
But of course, we have seen that the decision to move to a new kernel version is so hard for companies like MikroTik that it can take many many years before they finally take the plunge.
(and all that time it becomes harder and harder, because there will be more changes that impact functionality of your device)
See how long it took before the promised land of v7 was finally delivered.
(and still today people are complaining because routing performance is less than it was with v6)
I tried it as well on my AX3 and the clients get disconnected. That's not good.
In reply to: I would answer with yes. It would be a breaking change otherwise.
I tried with a low interval and the interface status changes to scanning, dfs and whatever. So it is no background operation. May be different on AX hardware though.
Please re-read my post. I said the interface status changes from running to scanning (I don't remember the exact term). So of course the clients are disconnected.
In reply to: This should be a good thing, right?
Please confirm that clients are not disconnected during scan.
Should be nice to have some indications about recommended values too.
But I don't know what it actually does. Mikrotik did not explain it thoroughly. It is a classic Mikrotik changelog item that leaves you with a lot of questions. Mikrotik - please explain this functionality. Thank you.
In reply to: *) wifi - send channel switch announcements to clients when switching channels at requested re-select intervals;
Compare the price of these products and you know!
In reply to: Why is it so hard for MT to make a really stable system compared with Cisco, Juniper, or even VyOS? It is supposed to have a very powerful CPU, right? With so much RAM and so many CPUs
Kernel panic
Unstable bgp
Suddenly Reboot by watchdog.
Thx
/ip dns static
add name=adservice.google.com type=NXDOMAIN
add forward-to=dns.google regexp="(\\.|^)google\\.com\$" type=FWD
I have the same problem. I hope the developers will seriously consider it.
In reply to: With the recent update:
*) DNS - match NXDOMAIN static entry only if other type entries for the same name are not found;
I’ve noticed an issue that affects the way DNS rules are applied. Previously, it was possible to block adservice.google.com by returning NXDOMAIN and then forward the rest of google.com domains to an external DNS server. However, with this update, it’s no longer possible to block ads this way because the second forwarding rule matches *.google.com, which overrides the first NXDOMAIN entry for adservice.google.com.
I believe this change introduces a problem. In most DNS-related configurations, like in dnsmasq, the rules are executed sequentially from top to bottom. I’m not sure why this change was made, but I strongly suggest that the rules should be executed based on their order, rather than checking for matches with other rules.
Here is the example code I'm referring to:
/ip dns static
add name=adservice.google.com type=NXDOMAIN
add forward-to=dns.google regexp="(\\.|^)google\\.com\$" type=FWD
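The two behaviours being contrasted in this post can be sketched in a few lines of Python. This is only a model of the discussion, not RouterOS's actual implementation: the `Rule` class and the functions `first_match` and `nxdomain_last` are hypothetical names, and `nxdomain_last` encodes the poster's reading of the changelog entry (an NXDOMAIN entry applies only if no entry of another type matches).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    type: str                     # "NXDOMAIN" or "FWD"
    name: Optional[str] = None    # exact-name match
    regexp: Optional[str] = None  # regular-expression match

    def matches(self, query: str) -> bool:
        if self.name is not None:
            return query == self.name
        return re.search(self.regexp, query) is not None


# The two static entries from the post, with RouterOS string escaping removed.
RULES = [
    Rule(type="NXDOMAIN", name="adservice.google.com"),
    Rule(type="FWD", regexp=r"(\.|^)google\.com$"),
]


def first_match(rules, query):
    """Ordered evaluation: the first matching rule wins."""
    for rule in rules:
        if rule.matches(query):
            return rule.type
    return None


def nxdomain_last(rules, query):
    """Assumed 7.16 behaviour: NXDOMAIN applies only if no other type matches."""
    hits = [rule for rule in rules if rule.matches(query)]
    for rule in hits:
        if rule.type != "NXDOMAIN":
            return rule.type
    return hits[0].type if hits else None


for q in ("adservice.google.com", "mail.google.com", "example.org"):
    print(q, first_match(RULES, q), nxdomain_last(RULES, q))
```

Under ordered evaluation the ad host is blocked; under the assumed type-based priority the FWD rule wins for the same query, which is the regression described here.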
The priority should normally be determined by the order of the rules, not by their type. In other words, rules that appear earlier should have higher priority than those that come later.
In reply to: I always had the issue that a regex FWD record had priority over a static A record. Was that different with NXDOMAIN?
The order of RRs in a set is not significant, and need not be preserved by name servers, resolvers, or other parts of the DNS.
I wonder why you use a regexp match instead of an explicit match for google.com + the setting "match subdomain"?
In reply to: Here is the example code I'm referring to:
/ip dns static
add name=adservice.google.com type=NXDOMAIN
add forward-to=dns.google regexp="(\\.|^)google\\.com\$" type=FWD
That is something different! It specifies the order of the data in the DNS reply, not the order of processing of rules.
In reply to: Well, your opinion. RFC 1034 is what is relevant.
https://www.ietf.org/rfc/rfc1034.txt
The order of RRs in a set is not significant, and need not be preserved by name servers, resolvers, or other parts of the DNS.
It's difficult for me to evaluate the "match subdomain" feature, but I can say that it's not very useful. Besides the fact that it randomly fails to match, which many people around me have experienced (including myself, leading me to abandon it), there is another reason why I'm using regex. Below is the complete code, which "match subdomain" simply can't achieve:
In reply to: I wonder why you use a regexp match instead of an explicit match for google.com + the setting "match subdomain"?
I would think that is much more efficient...
Maybe because that capability was introduced later?
/ip dns static
add forward-to=dns.google regexp="(\\.|^)google\\.[a-z][a-z]([a-z]|)(\\.[a-z][a-z]|)\$" type=FWD
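To see which names the broader expression covers, it can be tried against a few sample hostnames. The sketch below uses Python's `re` module as a stand-in; RouterOS uses its own regular-expression engine, so treat the results as indicative only. The sample names and the helper `is_forwarded` are made up for illustration.

```python
import re

# Regular expression from the rule above, with RouterOS string escaping removed.
PATTERN = re.compile(r"(\.|^)google\.[a-z][a-z]([a-z]|)(\.[a-z][a-z]|)$")


def is_forwarded(name: str) -> bool:
    """Return True if the name would be caught by the FWD expression."""
    return PATTERN.search(name) is not None


SAMPLES = [
    "google.com",
    "www.google.de",
    "maps.google.co.uk",
    "notgoogle.com",
    "google.com.example.net",
]

for name in SAMPLES:
    print(f"{name}: {is_forwarded(name)}")
```

The first three names match across different TLD shapes, while the last two do not, which is the kind of coverage a single name plus "match subdomain" cannot express.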
This "fix" was probably introduced for the case where you have two explicit static entries, one for "machine.example.com IN A 1.2.3.4" and another for "machine.example.com NXDOMAIN", and then the client asks for MX or AAAA or whatever.
In reply to: Indeed, pe1chl.
As I understand it, regardless of order, NXDOMAIN should overrule anything. It says basically the domain does not exist, so it makes no sense to look further. IMHO it is a bug when NXDOMAIN records exist but other records are considered instead. In fact a huge bug.
/ip dns static
add regexp=adservice.google.com type=NXDOMAIN
add forward-to=dns.google regexp="(\\.|^)google\\.com\$" type=FWD
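One thing worth checking with the regexp-based block entry is how loosely it matches, because the expression is unanchored and its dots are not escaped. The Python sketch below illustrates this with the `re` module (an approximation of RouterOS's regex engine); the `STRICT` variant and the sample names are hypothetical and only there for comparison.

```python
import re

LOOSE = re.compile(r"adservice.google.com")       # as written in the rule above
STRICT = re.compile(r"^adservice\.google\.com$")  # hypothetical anchored, escaped variant

NAMES = [
    "adservice.google.com",
    "adservice.google.com.example.net",
    "adservice-google.com",
    "xadservice.google.com",
]

for name in NAMES:
    loose = LOOSE.search(name) is not None
    strict = STRICT.search(name) is not None
    print(f"{name}: loose={loose} strict={strict}")
```

In this model the loose form matches all four names, while the strict form matches only the intended host.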
The AP has the ability to tell clients that the AP's channels are going to change at a specific time, then everybody jumps at the exact same time to provide seamless switching. If the hardware has the capability to listen in the background for cleaner channels, then it will do a background scan, then (ideally) jump to a cleaner/clearer section of the band.
In reply to: But with the change in 7.16 something changed and it may not be noticeable anymore:
But I don't know what it actually does. Mikrotik did not explain it thoroughly.
In reply to: *) wifi - send channel switch announcements to clients when switching channels at requested re-select intervals;
Is it supported for standalone WiFi, CAPsMAN, or both?
In reply to: *) wifi - send channel switch announcements to clients when switching channels at requested re-select intervals;
If I understand correctly, can I use these commands as a kind of whitelist?
In reply to: You do realize you can simply set the regexp to be the explicit name and that will ensure it gets blocked accordingly and the rest is allowed. Like below.
/ip dns static
add regexp=adservice.google.com type=NXDOMAIN
add forward-to=dns.google regexp="(\\.|^)google\\.com\$" type=FWD
You would expect that to work, but the bug reported here is that it does not work (in this version).
In reply to: If i understand correct, can i use this commands as a kind of whitelist?
-faxxe
Cool that you try to interpret their changelog entry, but I want that confirmed by Mikrotik staff. The changelog leaves too much room for interpretation. It could be a background scan after which the AP announces the new channel right away. But it could also be that the AP just promotes the 2 GHz BSSID when the 5 GHz BSSID goes down for scanning (or vice versa)...
In reply to: The AP has the ability to tell clients that the AP's channels are going to change at a specific time, then everybody jumps at the exact same time to provide seamless switching. If the hardware has the capability to listen in the background for cleaner channels, then it will do a background scan, then (ideally) jump to a cleaner/clearer section of the band.
OK thanks... what do you think, will they get it right at some point... 7.2x?
In reply to: You would expect that to work, but the bug reported here is that it does not work (in this version).
That's not how the standard works, which is years old, by the way. (Google for Channel Switch Announcement, 802.11h.)
In reply to: But it could also be that the AP just promotes the 2ghz BSSID when the 5ghz BSSID goes down for scanning (or vice versa)...
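For readers unfamiliar with the mechanism, the idea behind a Channel Switch Announcement can be modelled roughly as follows: the AP advertises the new channel together with a countdown in its beacons, and associated clients move when the countdown runs out. The Python sketch below is a deliberately simplified toy model, not the 802.11 frame format or RouterOS behaviour; all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChannelSwitchAnnouncement:
    new_channel: int  # channel the AP will move to
    count: int        # beacon intervals remaining before the switch


def beacons_before_switch(new_channel: int, count: int):
    """Yield the announcement carried by each beacon until the switch happens."""
    for remaining in range(count, 0, -1):
        yield ChannelSwitchAnnouncement(new_channel, remaining)


class Client:
    def __init__(self, channel: int):
        self.channel = channel
        self.pending = None

    def on_beacon(self, csa: ChannelSwitchAnnouncement) -> None:
        self.pending = csa  # remember the most recent announcement

    def on_switch_time(self) -> None:
        if self.pending is not None:
            self.channel = self.pending.new_channel
            self.pending = None


clients = [Client(36), Client(36)]
for csa in beacons_before_switch(new_channel=100, count=3):
    for client in clients:
        client.on_beacon(csa)
for client in clients:
    client.on_switch_time()
print([client.channel for client in clients])
```

The point of the model is that clients that heard the announcement follow the AP to the new channel instead of being dropped and having to rediscover it.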
/routing rule
add action=lookup disabled=no routing-mark=main table=main
Oh, I did not know that there is a separate standard. Thanks for the information. I googled it; it should be defined in IEEE 802.11-2012.
In reply to: That's not how the standard works, which is years-old, by the way. (Google for Channel Switch Announcement, 802.11h.)
I bet there is nothing here for the IGMP issue found in SUP-152693?
In reply to: What's new in 7.16rc2 (2024-Aug-13 10:05):
*) bridge - added L2 MDB support for IGMP snooping (additional fixes);
Support told me that this happened because the VM memory was still at 256 MB (which was the result of deploying the VM with the 7.15.1 .ova file and then upgrading it to 7.16beta1).
In reply to: Upgraded a CHR from 7.16beta1 to 7.16rc1
Result: lost extra routing table and all BGP configuration (maybe second is caused by first).
what is the intention or goal?
After 7.16.X show
In reply to: What's new in 7.16rc2 (2024-Aug-13 10:05):
*) 6to4 - improved system stability when initializing 6to4 interface (introduced in v7.16rc1);
*) bridge - added L2 MDB support for IGMP snooping (additional fixes);
*) console - improved large import file handling, error detection and stability (additional fixes);
*) dhcp - improved DHCP IPv4 and IPv6 client/relay/server underlying interface state change handling (additional fixes);
*) ovpn - improved system stability (additional fixes);
*) route - improved routing table update performance;
*) x86/chr - fixed invalid HDD size (introduced in v7.16beta7);
Thanks so much, I appreciate it. Cheers.
In reply to: You can download basically any RouterOS version from our upgrade server. Just change the version number in URL.
https://upgrade.mikrotik.com/routeros/7 ... -arm64.npk
I have experienced cases where the configuration changes were not pushed "automatically" and I needed to call "provision" again. I can't remember the concrete cases though.
In reply to: If you adjust any configuration profile that is linked to provisioned interface, all changes will be "pushed" as soon as you apply changes to the profile.
Me too. Also, with the default !reselect-interval, are channels not scanned?
In reply to: I have some questions on how the reselect-interval works?
Still no answer to SUP-155649 asking for extra info about this feature.
You can read some advice here: search.php?keywords=reselect-interval&t ... sf=msgonly
@Guntis Thank you for the clear explanation of my misunderstanding about reprovisioning. Is it okay if I quote your post in the forum thread I linked, to help others who stumble over the same topic? I would then update my configuration, stop reprovisioning after each config change, and check whether the issue happens again. Your answer helped a lot. Thank you!
In reply to: Regarding the forum post you linked, a lot of issues can be caused by re-provision, there seems to be a misconception that configuration needs to be "pushed" via help of provisioning the interfaces.
/interface/wifi/capsman/remote-cap/provision
/interface/wifi/radio/provision
I have this issue all the time. Config updates are not applied by the CAPs unless they are reprovisioned; it's highly unreliable.
In reply to: I have experienced cases where the configuration changes where not pushed "automatically" and it needed to call "provision" again. I can't remember the concrete cases though.
Exactly, I don't understand why people are doing it. My guess is that, for example, on UniFi you see the status change to "provisioning" (or something similar) after a config change, so maybe that's where the urge comes from. I also can't rule out some edge case where there is an issue. Otherwise, all 50+ APs I manage receive changes instantly after a config change in CAPsMAN.
In reply to: robtor "wifi - added "slave-name-format";" just adds more control on how virtual interfaces can be named.
Regarding the forum post you linked, a lot of issues can be caused by re-provision, there seems to be a misconception that configuration needs to be "pushed" via help of provisioning the interfaces. That is not the case, provision must be done only initially, and is done automatically upon CAP joining if there are matching provisioning rules that are enabled.
If you adjust any configuration profile that is linked to provisioned interface, all changes will be "pushed" as soon as you apply changes to the profile. With no need to re-create already existing interface.
Provisioning itself is not for sending configuration, it is for essentially creating a new interface. In most cases, there is no reason to perform manual provisioning once you already have CAP interfaces running.
Now, if create-enabled together with slave-static is used and interface names change after reboot or upgrade, please let us know via support@mikrotik.com along with details of what was upgraded/rebooted. In our tests, names should not change, and references to CAP interfaces should work, and not be lost, both on CAPsMAN and CAP side, even if virtual CAP interface is renamed on CAP.
It literally says "channel switch announcement" in the changelog entry.
In reply to: Oh, did not know that there is a separate standard. Thanks for the information. Googled it, should be defined in IEEE 802.11-2012.
But when it really is this, I would want Mikrotik to include this reference in their changelog as well. So we all can lookup/google for more info.
In other words, "We're using this standard (802.11h) to ensure smooth[er] channel changes at the reselect interval, instead of just dumping clients when making a change."
In reply to: *) wifi - send channel switch announcements to clients when switching channels at requested re-select intervals;
Is it true you don't have to explicitly configure reselect-interval? Having a reselect interval seems to be required in 7.15.
In reply to: If there are examples where configuration was changed, but CAPs didn't reflect the change, please let us know, and share with us supout.rif files from both CAP and CAPsMAN, they should be made before manual "provision" is performed to fix the issue.
If you don't explicitly configure reselect-interval no automatic rescan will take place, unless interface goes down - CAP-CAPsMAN communication was interrupted, restart, etc. Reselect-interval uses a background scan.
Thanks @Guntis, so to make it clearer:
In reply to: If you don't explicitly configure reselect-interval no automatic rescan will take place, unless interface goes down - CAP-CAPsMAN communication was interrupted, restart, etc. Reselect-interval uses a background scan.
In my case (RB5009 + 3 hAP ax2) it does disconnect clients. At least the ones that are on 5 GHz with DFS-10min skip.
In reply to: In other words, "We're using this standard (802.11h) to ensure smooth[er] channel changes at the reselect interval, instead of just dumping clients when making a change."
Thank you, this was my concern as well. But what does this mean in the changelog?
In reply to: It does not seem to be possible to do seamless channel change as there is no dedicated radio to do the scan while current channel is active
The AP announces that it is switching to a new channel before it begins transmitting on that channel.
In reply to: wifi - send channel switch announcements to clients when switching channels at requested re-select intervals;
To whom will this announcement be sent? To the ones who have just been disconnected? :-)
I really doubt it is a background scan. According to what I see in my setup, it just kicks everyone off and starts scanning, which is not exactly in the background...
- using reselect-interval=6h..8h CAPsMAN or standalone AP will scan randomly for alternative channels in the background between 6h - 8h from device restart...
So I believe this documentation is not really correct. I would love to see MT's comments on this.
In reply to: reselect-interval (time interval)
Specifies when the interface should rescan channel availability and select the most appropriate one to use. Specifying interval will allow the system to select this interval dynamically and randomly. This helps to avoid a situation when many APs at the same time scan network, select the same channel, and prefer to use it at the same time. reselect-interval uses a background scan.
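The staggering described in this documentation excerpt can be illustrated with a short Python sketch. It only models the stated idea (each AP picks its own random delay inside the configured range so that scans do not coincide); the uniform distribution, the duration parser, and the function names are assumptions, not RouterOS internals.

```python
import random


def parse_duration(text: str) -> int:
    """Convert a simple duration such as '6h', '30m' or '45s' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(text[:-1]) * units[text[-1]]


def next_reselect_delay(interval: str, rng: random.Random) -> int:
    """Pick a delay in seconds from 'min..max', or return the single fixed value."""
    low, _, high = interval.partition("..")
    low_s = parse_duration(low)
    high_s = parse_duration(high) if high else low_s
    return rng.randint(low_s, high_s)


rng = random.Random(0)  # seeded only to make the demonstration repeatable
delays = [next_reselect_delay("6h..8h", rng) for _ in range(3)]
for ap, delay in enumerate(delays, start=1):
    print(f"AP {ap}: next reselect in {delay // 3600}h {delay % 3600 // 60}m")
```

With a range such as `6h..8h`, three APs that boot together are unlikely to rescan at the same moment; with a single value they would all rescan in lockstep.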
a) Will clients be disconnected before the background scan in order to perform it? Or will it be done using the same radio, keeping clients connected while the scan is happening?
In reply to: It will still perform background scan for better channel, even if you set just a few frequencies, though we would recommend using "auto" frequency - not setting a frequency on the interface.
That is also what I saw in the lab. I just tried it again with an interval of "1m..2m" so I could watch it happen. The interface goes into reselect and scans for a channel, and even when the frequency stays the same, the clients are all kicked off. I can prove that with log entries: all clients that were connected to that interface either connect again once the interface is back in the running state (so there must have been a disconnect, otherwise why would they connect again?) or switch to another CAP in the same second the reselect runs. Coincidence? I guess not. So the "background" scan is clearly not a background one. It is clearly interrupting.
In reply to: I really doubt background. According to what I see in my setup it just kicks off everyone and starts scanning. Which is not exactly in background...
Perfect, the extended version of the changelog you have been asked for for years. Keep going ... please, please, please
In reply to: There was a possibility that OVPN router can get a "Kernel Failure". If you have a router running OVPN and it sometimes reboots due to a Kernel Failure, then upgrade and see how it goes....
This is the first time I've seen an “official” recommendation to use automatic frequency selection instead of manually setting values.
In reply to: though we would recommend using "auto" frequency - not setting a frequency on the interface.
As a one-liner, two short sentences long ... please do not kill my motivational speech
In reply to: I would not call it "perfect" as it is still very vague. "there was a possibility" and "sometimes" does not tell one anything more than the original "improved stability" entry.
IMHO, you should limit the minimum value for the “reselect interval” parameter anyway. For example, set it to 30 minutes, so that end users cannot “force” the devices into a “permanent” selection state.
In reply to: That being said, this discussion should be in a separate topic.
Please keep this forum topic strictly related to this particular RouterOS release.
The "Link Down" counter increases every time a reselect happens. Why does that happen? The link should not go down during a background scan. I know links also go down when no clients are connected, but the time between down and up is so short that I can't believe it is a coincidence.
In reply to: While in some cases clients might get disconnected, it is not the rule, we have tested it locally on both 2.4GHz and 5GHz, and generally clients stay connected, and it is implemented as background scan. If it causes issues in your environment, use a larger interval or plan out frequencies manually.
In another topic we/I don't have the attention of Mikrotik staff. Separate topics are more like community-only discussions.
That being said, this discussion should be in a separate topic.
There was a possibility that OVPN router can get a "Kernel Failure". If you have a router running OVPN and it sometimes reboots due to a Kernel Failure, then upgrade and see how it goes. If the problem remains, then of course contact support@mikrotik.com and send supout file.
Does it work fine on earlier versions? (I'm asking because I'm considering a similar setup in the future. I have three MLAG stacks, just not using them for ESXi yet)
FYI: MLAG issue: two CRS317 in MLAG, with ESX hosts dual connected to CRS317 (not LACP, but having ESX decide which switch to send traffic based on the port up status, and the MAC address of the VM). When switch 1 goes down for firmware upgrade, all is ok, ESX starts using switch 2 for all VMs. When switch 1 comes back on line, ESX switches back to using switch 1 for some VMs. But I can't access half of my VMs for around 15 minutes. Then all is well again. Frustrating. I also cannot ping switch 1 for that period of time even though it is up (so it is not just the ESX hosts having an issue with the MLAG mac cache). The CRS317s are connected LACP to a CRS328P floor switch, which I am connected to. Since I can't ping switch 1, I assume the floor switch has learnt that its path is via switch 2, and something is going wrong there.
Are you sure you want to configure MLAG for that? I think in this config you should just plug the two ESXi ethernet ports into two switchports without any special config on the switch...
FYI: MLAG issue: two CRS317 in MLAG, with ESX hosts dual connected to CRS317 (not LACP, but having ESX decide which switch to send traffic based on the port up status, and the MAC address of the VM).
Fri Aug 16 18:12:25 2024
NAS-Port-Type = Ethernet
NAS-Port = 2209353285
Service-Type = Framed-User
Calling-Station-Id = "e48d8c2f405b"
User-Name = "E4:8D:8C:2F:40:5B"
Called-Station-Id = "171000351.ipv6"
Delegated-IPv6-Prefix = <removed>
Event-Timestamp = "Aug 16 2024 18:12:25 EET"
Acct-Status-Type = Start
Acct-Session-Id = "450eb083"
Acct-Authentic = RADIUS
Class = 0x3238393838646232383939323239303264623134646436613364383636373965
NAS-Identifier = "IPOE-0"
Acct-Delay-Time = 0
NAS-IP-Address = <removed>
Acct-Unique-Session-Id = "40b7eea37b07aa4543e826c0a80b737b"
Timestamp = 1723824745
Fri Aug 16 18:17:05 2024
NAS-Port-Type = Ethernet
NAS-Port = 2209353318
Service-Type = Framed-User
Calling-Station-Id = "e48d8c2f405b"
User-Name = "E4:8D:8C:2F:40:5B"
Called-Station-Id = "171000351.ipv6"
Delegated-IPv6-Prefix = <removed>
Event-Timestamp = "Aug 16 2024 18:17:05 EET"
Acct-Status-Type = Interim-Update
Acct-Session-Id = "660eb083"
Acct-Authentic = RADIUS
Acct-Session-Time = 60
Class = 0x3238393838646232383939323239303264623134646436613364383636373965
NAS-Identifier = "IPOE-0"
Acct-Delay-Time = 0
NAS-IP-Address = <removed>
Acct-Unique-Session-Id = "33ab053815a4fd07179d97876cb508c3"
Timestamp = 1723825025
Fri Aug 16 18:18:05 2024
NAS-Port-Type = Ethernet
NAS-Port = 2209353318
Service-Type = Framed-User
Calling-Station-Id = "e48d8c2f405b"
User-Name = "E4:8D:8C:2F:40:5B"
Called-Station-Id = "171000351.ipv6"
Delegated-IPv6-Prefix = <removed>
Event-Timestamp = "Aug 16 2024 18:18:05 EET"
Acct-Status-Type = Interim-Update
Acct-Session-Id = "660eb083"
Acct-Authentic = RADIUS
Acct-Session-Time = 120
Class = 0x3238393838646232383939323239303264623134646436613364383636373965
NAS-Identifier = "IPOE-0"
Acct-Delay-Time = 0
NAS-IP-Address = <removed>
Acct-Unique-Session-Id = "33ab053815a4fd07179d97876cb508c3"
Timestamp = 1723825085
Fri Aug 16 18:19:05 2024
NAS-Port-Type = Ethernet
NAS-Port = 2209353318
Service-Type = Framed-User
Calling-Station-Id = "e48d8c2f405b"
User-Name = "E4:8D:8C:2F:40:5B"
Called-Station-Id = "171000351.ipv6"
Delegated-IPv6-Prefix = <removed>
Event-Timestamp = "Aug 16 2024 18:19:05 EET"
Acct-Status-Type = Interim-Update
Acct-Session-Id = "660eb083"
Acct-Authentic = RADIUS
Acct-Session-Time = 30
Class = 0x3238393838646232383939323239303264623134646436613364383636373965
NAS-Identifier = "IPOE-0"
Acct-Delay-Time = 0
NAS-IP-Address = <removed>
Acct-Unique-Session-Id = "33ab053815a4fd07179d97876cb508c3"
Timestamp = 1723825145
Fri Aug 16 18:20:06 2024
NAS-Port-Type = Ethernet
NAS-Port = 2209353318
Service-Type = Framed-User
Calling-Station-Id = "e48d8c2f405b"
User-Name = "E4:8D:8C:2F:40:5B"
Called-Station-Id = "171000351.ipv6"
Delegated-IPv6-Prefix = <removed>
Event-Timestamp = "Aug 16 2024 18:20:05 EET"
Acct-Status-Type = Interim-Update
Acct-Session-Id = "660eb083"
Acct-Authentic = RADIUS
Acct-Session-Time = 90
Class = 0x3238393838646232383939323239303264623134646436613364383636373965
NAS-Identifier = "IPOE-0"
Acct-Delay-Time = 0
NAS-IP-Address = <removed>
Acct-Unique-Session-Id = "33ab053815a4fd07179d97876cb508c3"
Timestamp = 1723825206
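In the records above, Acct-Session-Time for session 660eb083 is reported as 60, 120, 30 and 90 in consecutive Interim-Updates, i.e. it goes backwards once. A minimal Python sketch for spotting this in a larger dump, assuming the detail-style layout shown here (a timestamp header line followed by "Attribute = value" lines); the function names and the shortened sample are made up for illustration:

```python
import re

# One "Attribute = value" line; surrounding double quotes are optional.
ATTR_RE = re.compile(r'^\s*([\w-]+)\s*=\s*"?([^"]*)"?\s*$')

def parse_records(text):
    """Split a detail-style dump into one dict per accounting record."""
    records, current = [], None
    for line in text.splitlines():
        match = ATTR_RE.match(line)
        if match:
            if current is None:
                current = {}
            current[match.group(1)] = match.group(2).strip()
        elif line.strip():
            # Any other non-blank line is treated as a record header.
            if current:
                records.append(current)
            current = {}
    if current:
        records.append(current)
    return records

def find_session_time_regressions(records):
    """Return (session_id, previous, current) where the time decreased."""
    last_seen, problems = {}, []
    for rec in records:
        sid = rec.get("Acct-Session-Id")
        raw = rec.get("Acct-Session-Time")
        if sid is None or raw is None:
            continue  # e.g. Start records carry no session time
        value = int(raw)
        if sid in last_seen and value < last_seen[sid]:
            problems.append((sid, last_seen[sid], value))
        last_seen[sid] = value
    return problems

# Shortened sample: only the attributes needed for the check.
SAMPLE = """
Fri Aug 16 18:17:05 2024
    Acct-Session-Id = "660eb083"
    Acct-Session-Time = 60
Fri Aug 16 18:18:05 2024
    Acct-Session-Id = "660eb083"
    Acct-Session-Time = 120
Fri Aug 16 18:19:05 2024
    Acct-Session-Id = "660eb083"
    Acct-Session-Time = 30
"""

if __name__ == "__main__":
    print(find_session_time_regressions(parse_records(SAMPLE)))
    # prints [('660eb083', 120, 30)]
```

The check only compares consecutive values per Acct-Session-Id, so a later 90 after the 30 is not flagged again.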
On my hEX S, SMB with encryption is stable on 7.16rc2, but this device uses another CPU architecture.
IP / SMB / SHARES - required-encryption option leads to hard restart of RB5009 in 7.15, is it already solved in this version, or in what version is planned? Some info?
It is, it's a confirmed bug, which will be corrected in some new RouterOS version. I'm just interested whether the problem still exists in this version or is solved.
On my hEX s the smb with encryption is stable on 7.16rc2, but this device uses an other CPU architecture.
IP / SMB / SHARES - required-encryption option leads to hard restart of RB5009 in 7.15, is it already solved in this version, or in what version is planned? Some info?
I would suggest to create a supout after your device crashed and send it to the support.
vSphere supports LACP on distributed switches which are available with Enterprise license and not as a separate extra product...
Maybe things would be better when ESXi would support a port aggregation protocol like LACP, but it does not.
(at least not without buying extra products)
I was replying to the claim that ESX lacks LACP, which may be the issue. On the other hand, I do have an environment with ESX hosts without LACP support configured and switches that do use LACP (some not Mikrotik), with 2 CRS309-1G-8S+ in an MLAG setup, and I don't experience any communication issues when restarting either one of them...
The issue is not ESX, that is just a highly visible victim. The floor switch (crs328p) connected lacp across both crs317 (mlag) switches also loses the ability to ping one of the mlag switches for a period of time 5-15mins after the switch has actually finished rebooting. It seems like the mlag is not flushing its L2 mac cache when its peer comes back online, or the two are not synchronising their L2 mac caches. That is a guess. Let's leave ESX out of it. MT issue.
Standalone AP or using CAPsMAN?
I ran into a problem with VLANs: wireless clients got MGT VLAN addresses assigned as well as HOME VLAN addresses. Found out from looking at the DHCP leases and IP ARP entries. After downgrading to 7.15.3 the problem was solved.
Yes, all information was supplied to support.
CAPsMAN:
Standalone AP or using CAPsMan?
Peer link seems to be the same as mine, just to mention that it shouldn't have MLAG ID assigned...
@bratislav, interesting, as I have the same setup, including 2 DACs in 802.3ad bond for peer link with dedicated PVID. All VLANs are tagged on the LAG-PeerLink except for the PVID (3999) for the peerlink, which is untagged on LAG-PeerLink. Multiple LACP bonds to fabric switches (some MT some not), and yet I have this problem. Bridge shows both MLAG peers are connected one as primary, the other as secondary with the same system id.
LACP bonds from remote switches across the fabric are working, with traffic on both ports. Rapid spanning tree setup is identical on both MLAG switches with their priority set to ensure they are the root (2000 hex).
Only thing I can think of is how are you managing the switches (assigning the L3 IP address). In my case I create a VLAN interface under my br-trunk and add an IP address to that VLAN interface. Bridge is set with vlan filtering enabled, each interface ethernet switch port is set to l3-hw-offloading=no. Any differences (I am using CRS-317 vs you using CRS-309 but that should not be an issue).
interface/ethernet/switch/set switch1 l3-hw-offloading=no
These cards cost around 20eur. Make the investment and do investigation. It is easier and more efficient than analyzing supout files from customers. 😉
We have found that there is a linux kernel driver issue with intel ax201/ax210 cards, that exists in all linux based operating systems, we are trying to find ways to handle these clients better, so if you can reliably repeat these issues and you have this card, send us a supout.rif file to support
It's also one of the most prolific cards in use today, generally considered one of the best options available, at least in the US.
These cards cost around 20eur. Make the investment and do investigation. It is easier and more efficient than analyzing supout files from customers. 😉
We have found that there is a linux kernel driver issue with intel ax201/ax210 cards, that exists in all linux based operating systems, we are trying to find ways to handle these clients better, so if you can reliably repeat these issues and you have this card, send us a supout.rif file to support
Only taking that part to respond to: ESXi works perfectly fine without bonding (LACP, etc). It has an internal mechanism in its switching technology (mainly distributed switches) that handles this for you. At work we have a lot of these hosts connected to regular downlink ports (more or less) and they are not using LACP. If you face issues, it is (in a normal configuration) not ESXi's problem but an upstream one (switches etc.).
The issue is not ESX, that is just a highly visible victim.
Not really useful when there is no info at all about what e.g. the previous installed version was, and what kind of configuration it is running.
7.16r2 'stops routing' on an RB4011. can't ping out from the device, but winbox works.
unfortunately this was in production and I couldn't pull support files.
Well, I am well aware of the issue. I have searched the whole MikroTik forum and found several people who have the same issue.
ppptran, this is your first comment in this topic. Have you reported your issue somewhere else? Mikrotik are no mind-reader.
*) dns - revert "match NXDOMAIN static entry only if other type entries for the same name are not found" (introduced in v7.16beta7);
Have you given another wireless package a shot (i.e. wifi-qcom-ac or wave2)?
Well, i am well aware of the issue. I have search the whole mikrotik forum and it came up with several fellas have the same issue.
ppptran, this is your first comment in this topic. Have you reported your issue somewhere else? Mikrotik are no mind-reader.
Basically:
I'm using x86 and a Compex WLE900VX 7AA (QCA9880, mPCIe).
The only RouterOS versions that make this wifi card fully functional are 7.5 and 7.6. That's it.
In all other versions up to the current 7.16rc, the card's radio appears to be disabled.
The picture below is from the current 7.16rc.
Can someone from Mikrotik explain why?
*) dns - revert "match NXDOMAIN static entry only if other type entries for the same name are not found" (introduced in v7.16beta7);
I did. I have a different condition and this reversion is affecting me. Example:
you can look up the discussion on the negative impact of the original change here in the topic.
In fact negative DNS entries are cached. I got the whole TLD (.private) cached as NXDOMAIN and the FWD didn't work. I tested the whole thing and provided evidence to Mikrotik support.
I think I understand your use-case - but not your explanations why it would break. Mikrotik DNS client does - AFAIK - not cache NXDOMAIN. So no matter how often you try to resolve "<random>.private": once you make an NS resolve request to "subdomain1.private" the FWD entry will be used. Isn't this working for you?
Didn't install 7.16 yet, and I do not see this behavior in 7.15.3... But: you can add a static entry for the whole TLD:
In fact negative DNS entries are cached. I got the whole TLD (.private) cached as NXDOMAIN and the FWD didn't work. I tested the whole thing and provided evidence to Mikrotik support.
/ip/dns/static/add name=private. forward-to=10.10.10.1 match-subdomain=yes
I did and I also tried adding a static entry for "private" pointing to 127.0.0.1 (to avoid getting a NXDOMAIN from the upstream public DNS) and then using a FWD for subdomain1.private to 10.10.10.1. Those count as workarounds, not real fixes.
Didn't install 7.16 yet, and i do not see this behavior in 7.15.3... But: you can add static entry to the whole TLD:
In fact negative DNS entries are cached. I got the whole TLD (.private) cached as NXDOMAIN and the FWD didn't work. I tested the whole thing and provided evidence to Mikrotik support.
Have you tried this?
/ip/dns/static/add name=private. forward-to=10.10.10.1 match-subdomain=yes
Is this a 7.16 behaviour? I can't see any NXDOMAIN cache entry added on 7.15.3 when doing e.g.
In fact negative DNS entries are cached. I got the whole TLD (.private) cached as NXDOMAIN and the FWD didn't work. I tested the whole thing and provided evidence to Mikrotik support.
:put [:resolve foo.private]
Using 7.15.2 BTW
Is this a 7.16 behaviour? I can't see any NXDOMAIN cache entry added on 7.15.3 when doing e.g.
In fact negative DNS entries are cached. I got the whole TLD (.private) cached as NXDOMAIN and the FWD didn't work. I tested the whole thing and provided evidence to Mikrotik support.
:put [:resolve foo.private]
It is a negative answer, Mikrotik puts 0.0.0.0 in their cache. This breaks basically any FWD entry for this parent zone as I mentioned before.
Then your upstream DNS server responds with 0.0.0.0. Basically it.
/ip/dns/cache/all/print
/ip/dns/cache/print
Here is the real example, the whole TLD "private" gets negative cached. No FWD entry for a subdomain will work.
It is a negative answer, Mikrotik puts 0.0.0.0 in their cache. This breaks basically any FWD entry for this parent zone as I mentioned before.
Then your upstream DNS server responds with 0.0.0.0. Basically it.
I can confirm. Negative answers are cached and live until you flush the cache. This is really a PITA and I am wondering if this is the "mikrotik way" or if this conforms to an RFC.
It is a negative answer, Mikrotik puts 0.0.0.0 in their cache. This breaks basically any FWD entry for this parent zone as I mentioned before.
Then your upstream DNS server responds with 0.0.0.0. Basically it.
Thank you!
I can confirm. Negative answers are cached and live until you flush the cache. This is really a PITA and I am wondering if this is "mikrotik way" or if this is conforming a RFC.
It is a negative answer, Mikrotik puts 0.0.0.0 in their cache. This breaks basically any FWD entry for this parent zone as I mentioned before.
But still: not strictly related to 7.16.
Of course, caching negative answers is "normal", but in most normal resolvers you can separately set the cache time for negative answers (so you can set it very low or 0).
I can confirm. Negative answers are cached and live until you flush the cache.
It is, create a static NXDOMAIN record and look in the cache...
I don't know if NXDOMAIN (non-existing domain) is the same as "NEGATIVE" from Mikrotik DNS cache.
Having the option to set the NXDOMAIN TTL to 0 (or very low) is also not really a smart choice: it would help in the specific use case where you need it, but it would also affect the other 99.9% of the NXDOMAIN queries you would normally cache (to avoid hammering upstream DNS servers with NXDOMAIN queries).
Of course, caching negative answers is "normal", but in most normal resolvers you can separately set the cache time for negative answers (so you can set it very low or 0).
I can confirm. Negative answers are cached and live until you flush the cache.
Still, this complicated handling of priority of different kinds of records has never been under control in RouterOS. Every time the programmer touches the code, there is another "unexpected" problem. Probably it is hard to find a programmer with experience in DNS, and we will have to live with this until finally we get the option of using a well-established resolver (if only as an optional package).
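To illustrate the negative-TTL trade-off discussed above, here is a toy Python model of a cache with a separately configurable lifetime for negative answers. It is only a sketch and says nothing about how the RouterOS resolver is implemented; the class, method and parameter names are invented for illustration.

```python
import time

class ToyDnsCache:
    """Toy cache with a separate lifetime for negative (NXDOMAIN) answers."""

    def __init__(self, negative_ttl, clock=time.monotonic):
        self.negative_ttl = negative_ttl  # seconds; 0 disables negative caching
        self._clock = clock
        self._entries = {}  # name -> (address or None, expiry time)

    def put_positive(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def put_negative(self, name):
        # With negative_ttl == 0 nothing is stored, so every lookup
        # goes upstream again (the "hammering" case).
        if self.negative_ttl > 0:
            self._entries[name] = (None, self._clock() + self.negative_ttl)

    def lookup(self, name):
        """Return ('hit', address), ('negative', None) or ('miss', None)."""
        entry = self._entries.get(name)
        if entry is None:
            return ("miss", None)
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired entries are dropped lazily
            return ("miss", None)
        if address is None:
            return ("negative", None)
        return ("hit", address)
```

With a short negative_ttl a wrongly cached NXDOMAIN for a name such as foo.private ages out on its own instead of living until the cache is flushed, while a value of 0 trades that for one upstream query per lookup.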
Fingers crossed for 7.17
You can set your hopes in 7.17. Introducing "new" fixes in a RC is not the goal of a RC release.
15:45:37 system,error,critical router was rebooted without proper shutdown, probably kernel failure
15:45:37 system,error,critical router was rebooted without proper shutdown, probably kernel failure
15:45:38 system,error,critical kernel failure in previous boot
15:45:38 system,error,critical kernel failure in previous boot
15:45:38 interface,info lo link up
15:45:40 disk,info add usb1-part1 size:31.5G fs:fat32
15:45:47 bridge,info hardware offloading activated on bridge "bridge" ports: ether1,ether3,ether2,ether5,ether4
15:45:50 interface,info ether3 link up (speed 1G, full duplex)
15:46:19 system,info,account user admin logged in from 34:29:8F:99:9F:62 via winbox
15:46:20 script,warning DefConf gen: Unable to find wifi radio data
15:46:20 system,error,critical error while running customized default configuration script: interrupted
15:46:20 system,error,critical error while running customized default configuration script: interrupted
15:46:20 system,error,critical
15:46:20 system,error,critical
*) lte - improved modem AT/modem port open;
No wireless package installed, but kernel failure after 30 mins uptime (ros x86):
I have a single hap ac2 running with a wireless uplink on the 5ghz interface.
When upgrading from 7.15.3 to 7.16rc3 it ran into a kernel panic and then when it finally booted i was missing the wifi2 interface.
Then again upgrading from 7.16rc3 to rc4 also caused the same issue.
I had to netboot to restore the wifi2 interface.
EDIT: Did anyone else run into a similar issue?
Aug 31 13:03:31 xxx system,info installed system-7.16rc4
Aug 31 13:03:31 xxx system,info installed dude-7.16rc4
Aug 31 13:03:31 xxx system,info router rebooted by windows-nt-10.0-win64-x64/web:admin@192.168.0.57/upgrade
Aug 31 13:03:31 xxx interface,info lo link up
Aug 31 13:03:31 xxx interface,info ether8-link up (speed 100M, full duplex)
Aug 31 13:03:31 xxx interface,info ether7-link up (speed 1G, full duplex)
Aug 31 13:03:32 xxx interface,info ether6-link up (speed 1G, full duplex)
Aug 31 13:04:15 xxx system,info,account user admin logged in from 192.168.0.57 via web
Aug 31 13:05:06 xxx system,info,account user admin logged in from 192.168.0.200 via winbox
Aug 31 13:32:43 xxx system,info router rebooted
Aug 31 13:32:43 xxx interface,info lo link up
Aug 31 13:32:43 xxx system,error,critical kernel failure in previous boot
Brother, I'm talking about x86. There's no wifi-qcom-ac / wave2 package among the x86 extra packages.
Have you give a shot for another wireless packages (i.e. wifi-qcom-ac or wave2)?
Well, i am well aware of the issue. I have search the whole mikrotik forum and it came up with several fellas have the same issue.
Basically:
I'm using x86 and a Compex WLE900VX 7AA (QCA9880, mPCIe).
The only RouterOS versions that make this wifi card fully functional are 7.5 and 7.6. That's it.
In all other versions up to the current 7.16rc, the card's radio appears to be disabled.
The picture below is from the current 7.16rc.
ESXi does not work perfectly fine without LACP. Its internal means of loop prevention breaks the second you start using promiscuous mode. It is a horrible idea and breaks networking, in particular when bridging between two layer-2 networks (in my case it was a layer-2 traffic inspection device).
Only taking that part to respond to: ESXi works perfectly fine without bonding (LACP, etc). It has an internal mechanism in it's switching technology (mainly distributed switches) that handles this for you. At work we have a lot of these hosts connected to regular downlink ports (more or less) and ar enot using LACP. If you face issues it is (in a normal configuration) not ESXi's problem but upstream (switches etc.).
The issue is not ESX, that is just a highly visible victim.
Edited: got complaints to make it a more normal post.. here it is Master Onno ;-)
Sent!
itimo01, bjoerns - Can you please send supout.rif files from your router to support@mikrotik.com?
thanks ... didn't know about that or at least never realized this was there
TIL: there is
/ip/dns/cache/all/print
which shows you really all cache entries. Unlike
/ip/dns/cache/print
I have no supouts available and also cannot reproduce the issue. But logging showed that the router had already rebooted several times after a few hours of uptime before the upgrade; I didn't see it because it recovered that fast... I've downgraded to the stable channel and then back to testing, disabled multi cpu and removed a stale interface from a disabled ipv6 nd config. Will do tests on a different box.
itimo01, bjoerns - Can you please send supout.rif files from your router to support@mikrotik.com?
One of the reasons why I configured a daily auto-reboot about a month ago on 1 cAP ac using the wifi-qcom-ac driver... otherwise it kept crashing somewhere in the 2nd or 3rd day of uptime. Now I at least control when it reboots: when nobody is connected. It's been behaving nicely since then.
There was a possibility that if you use an ARM router with wireless that has 128 MB of RAM and is using wifi-qcom-ac package, not wireless, then simply router could run out of RAM resources causing the router to reboot.
No such reboot issues here. cAP ac is managed by capsman. And strods said this RAM consumption is 7.16 related and only "maybe before". Maybe 7.15 is not affected?
One of the reasons why I configured about a month ago a daily auto-reboot on 1 cap AC using wifi-qcom-ac driver... otherwise it kept crashing somewhere in the 2nd or 3th day of uptime. Now I control at least when it reboots, when nobody is connected. It's behaving nicely since then.
Literally nothing running on that device except for it acting as AP.
uptime: 1w4h10m2s
version: 7.15.3 (stable)
build-time: 2024-07-24 10:39:01
factory-software: 6.44.6
free-memory: 31.6MiB
total-memory: 128.0MiB
cpu: ARM
cpu-count: 4
cpu-frequency: 448MHz
cpu-load: 1%
free-hdd-space: 780.0KiB
total-hdd-space: 15.2MiB
write-sect-since-reboot: 1600
write-sect-total: 27259
architecture-name: arm
board-name: cAP ac
platform: MikroTik
And added a scheduled reboot instead of downgrading to 7.15 again :P
I'm not 100% sure anymore why I moved on to 7.16b/rc channel for that one device.
hAP ac2 has 3 more ethernet ports, so more buffer on switch chip is in use ... perhaps that's a life saver? LOL
... on 1 cap AC using wifi-qcom-ac driver...
Oddly enough, no such issue on ac2 with exact same config (same RAM amount so why ??).
cAP ac and hAP ac are using exactly the same SoC (IPQ-4018) with the same integrated 8327 switch chip inside, the only difference being that on the cAP only two ethernet lines are in use... I have the same OOM issues and I presume that the reason is the number of wifi connections over time and the wifi-qcom-ac driver not releasing memory on disconnects...
hAP ac2 has 3 more ethernet ports, so more buffer on switch chip is in use ... perhaps that's a life saver? LOL
... on 1 cap AC using wifi-qcom-ac driver...
Oddly enough, no such issue on ac2 with exact same config (same RAM amount so why ??).
/log/print detail without-paging
time=19:08:04 topics=system,info message="installed system-7.16rc4"
time=19:08:04 topics=system,info message="installed rose-storage-7.16rc4"
time=19:08:04 topics=system,info message="installed container-7.16rc4"
time=19:08:04 topics=system,info message="installed wireless-7.16rc4"
time=19:08:04 topics=system,info message="installed wifi-qcom-7.16rc4"
time=19:08:04 topics=system,info message="router rebooted"
time=19:08:05 topics=interface,info message="lo link up"
time=19:08:05 topics=interface,info message="VLAN100<->F/W link up"
time=19:08:05 topics=interface,info message="isp<->VLAN666<->f/w link up"
time=19:08:05 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <selecting...> state"
time=19:08:06 topics=bridge,info message=""BR100-dmz" mac address changed to 3A:96:B4:xx:xx:xx"
time=19:08:06 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <stopped> state"
time=19:08:06 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <selecting...> state"
time=19:08:06 topics=interface,info message="VLAN99 link up"
time=19:08:06 topics=bridge,info message=""BR11-lan" mac address changed to 8E:DD:3A:yy:yy:yy"
time=19:08:06 topics=interface,info message="VLAN11 link up"
time=19:08:06 topics=interface,info message="VLAN22 link up"
time=19:08:06 topics=interface,info message="VLAN24 link up"
time=19:08:06 topics=bridge,info message=""BR99-work" mac address changed to 8E:DD:3A:yy:yy:yy"
time=19:08:07 topics=bridge,info message=""BR100-dmz" mac address changed to 8E:DD:3A:yy:yy:yy"
time=19:08:07 topics=bridge,info message=""BR22-wifi-p" mac address changed to 8E:DD:3A:yy:yy:yy"
time=19:08:07 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <stopped> state"
time=19:08:07 topics=bridge,info message=""BR24-wifi-d" mac address changed to 8E:DD:3A:yy:yy:yy"
time=19:08:07 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <selecting...> state"
time=19:08:07 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <stopped> state"
time=19:08:07 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <selecting...> state"
time=19:08:10 topics=interface,info message="1:ISP-Virgin link down"
time=19:08:10 topics=interface,info message="2:LAG3 link down"
time=19:08:10 topics=interface,info message="3:LAG3 link down"
time=19:08:10 topics=interface,info message="4:eth2-vlan11 link down"
time=19:08:10 topics=interface,info message="5:eth1-vlan55 link down"
time=19:08:12 topics=interface,info message="3:LAG3 link up (speed 1G, full duplex)"
time=19:08:12 topics=interface,info message="18:LAG3(P2+P3) link up"
time=19:08:12 topics=bridge,info message=""BR11-lan" mac address changed to 78:9A:18:zz:zz:zz"
time=19:08:12 topics=bridge,info message=""BR99-work" mac address changed to 78:9A:18:zz:zz:zz"
time=19:08:12 topics=bridge,info message=""BR22-wifi-p" mac address changed to 78:9A:18:zz:zz:zz"
time=19:08:12 topics=bridge,info message=""BR24-wifi-d" mac address changed to 78:9A:18:zz:zz:zz"
time=19:08:12 topics=bridge,info message=""BR100-dmz" mac address changed to 78:9A:18:zz:zz:zz"
time=19:08:13 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <stopped> state"
time=19:08:13 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <selecting...> state"
time=19:08:13 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <stopped> state"
time=19:08:13 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <selecting...> state"
time=19:08:13 topics=interface,info message="1:ISP-Virgin link up (speed 1G, full duplex)"
time=19:08:13 topics=interface,info message="2:LAG3 link up (speed 1G, full duplex)"
time=19:08:16 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <requesting...> state"
time=19:08:16 topics=dhcp,info message="dhcp-client on isp<->VLAN666<->f/w got IP address 82.1.1.1"
time=19:08:16 topics=dhcp,debug,state message="dhcp-client on isp<->VLAN666<->f/w entering <bound> state"
time=19:08:20 topics=system,info,account message="user admin logged in from 192.168.31.69 via winbox"
[admin@xxx] /system> reso pr
uptime: 1m6s
version: 7.16rc4 (testing)
build-time: 2024-08-30 06:24:51
free-memory: 274.4MiB
total-memory: 576.0MiB
cpu: ARM64
cpu-count: 2
cpu-load: 1%
free-hdd-space: 67.4MiB
total-hdd-space: 80.7MiB
write-sect-since-reboot: 235
write-sect-total: 235
architecture-name: arm64
board-name: CHR QEMU KVM Virtual Machine
platform: MikroTik
[admin@xxx] /system> check-installation
damaged system package: bad image
If anyone is interested in how reselect.interval affects the wireless connection, this is what I found in my log when connecting two Wi-Fi ac APs using v7.16rc4:
Yes, if a better channel is found, channel switch announcement will be sent, and clients will be disconnected. 6h..8h means that in random interval after 6, but before 8 hours, reselect.interval will perform a background scan to evaluate if there is a better channel available. ...
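As a rough illustration of how a range such as 6h..8h could translate into a randomized delay, here is a minimal Python sketch. It is not RouterOS code: the function names are hypothetical, it only understands single-unit values like 30m or 6h, and the uniform draw is an assumption, since the thread does not say how the point inside the interval is actually picked.

```python
import random
import re

# Seconds per unit for the simple single-unit values handled here.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_duration(text):
    """Convert a value such as '30m' or '6h' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]

def next_reselect_delay(interval, rng=random):
    """Pick a delay in seconds from 'LOW..HIGH', or return a fixed value."""
    low_text, separator, high_text = interval.partition("..")
    low = parse_duration(low_text)
    high = parse_duration(high_text) if separator else low
    if high < low:
        raise ValueError("upper bound is below lower bound")
    # Assumption: any point in the range is equally likely.
    return rng.uniform(low, high)
```

For example, next_reselect_delay("6h..8h") returns some value between 21600 and 28800 seconds, and a test value like "1m..2m" makes a reselect fall due every one to two minutes.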
What exactly does this do? I'm aware it's possible to send and receive SMS either via AT commands over a (virtual) serial port or via MBIM commands. But isn't that already determined by setting the port in "/tool sms" accordingly (usbX for AT or lteX for MBIM)? In which use cases would you need to use this new setting?
*) lte - added "sms-protocol" setting in "/interface lte" menu (CLI only);
Sent by email a few minutes ago, thus no ticket number...
itimo01, bjoerns - Can you please send supout.rif files from your router to support@mikrotik.com?
18:05:54 wireless,info 90:09:DF:**:**:E9@wifi1 disconnected, SA Query timeout, signal strength -51
18:05:55 wireless,info 90:09:DF:**:**:E9@wifi2 connected, signal strength -44
18:06:59 wireless,info 90:09:DF:**:**:E9@wifi2 disconnected, SA Query timeout, signal strength -46
18:07:00 wireless,info 90:09:DF:**:**:E9@wifi2 connected, signal strength -50
18:12:06 wireless,info 90:09:DF:**:**:E9@wifi2 disconnected, SA Query timeout, signal strength -47
18:12:20 wireless,info 90:09:DF:**:**:E9@wifi1 connected, signal strength -51
18:14:12 wireless,info 90:09:DF:**:**:E9@wifi1 disconnected, SA Query timeout, signal strength -58
18:14:12 wireless,info 90:09:DF:**:**:E9@wifi2 connected, signal strength -50
18:14:22 wireless,info 90:09:DF:**:**:E9@wifi2 disconnected, SA Query timeout, signal strength -46
18:14:22 wireless,info 90:09:DF:**:**:E9@wifi1 connected, signal strength -50
18:14:36 system,info,account user ***** logged in from 192.168.11.233 via winbox
18:14:36 system,info,account user ***** logged in from 192.168.11.233 via winbox
18:18:28 wireless,info 90:09:DF:**:**:E9@wifi1 disconnected, SA Query timeout, signal strength -56
18:18:34 wireless,info 90:09:DF:**:**:E9@wifi1 connected, signal strength -51
18:28:42 wireless,info 90:09:DF:**:**:E9@wifi1 disconnected, SA Query timeout, signal strength -56
18:28:42 wireless,info 90:09:DF:**:**:E9@wifi2 connected, signal strength -48
18:28:57 wireless,info 90:09:DF:**:**:E9@wifi2 disconnected, SA Query timeout, signal strength -48
18:29:12 wireless,info 90:09:DF:**:**:E9@wifi1 connected, signal strength -52
Current solution is rollback to 7.14.3
I am facing Wi-Fi connection issues with my hAP ax3, causing the laptop to disconnect during Zoom meetings. As you can see in the logs the laptop gets disconnected from AP despite having the strong signal. This happened in the recent stable version (7.15.3) and still happens in the latest rc version.
The laptop has an Intel(R) Wi-Fi 6 AX203 adapter, and the version of the driver is 23.60.1.2. Anything I can do to fix it myself?
So far the only solution for me was temporarily replacing hAP ax3 with my old hAP ac2 running wifi-qcom-ac drivers, which works pretty well.
netsh wlan show drivers
Interface name: WiFi
Driver : Intel(R) Wi-Fi 6E AX210 160MHz
Vendor : Intel Corporation
Provider : Intel
Date : 24/07/2024
Version : 23.70.2.3
INF file : oem27.inf
Type : Native Wi-Fi Driver
> Will there be an mDNS announce for the MT devices?
> Right now, to use a pretty name you need to set up a static record, which can't be in .local, AND you need to change its address manually (or write weird scripts to update the address automatically).
+1
Details: 11/Jul/24 11:09 AM
Description
Hello,
I have a feature request:
It would be nice to have a column in wifi/registration-table that shows the active security type of the registered client, e.g. wpa2-psk, wpa3-psk, etc. Maybe even more “granular”, as there are many subtypes for WPA and EAP as well: WPA2-PSK+FT and WPA3+FT, to name just two.
Such a column would really help people troubleshoot “problematic” clients, as the authentication type in use is quite often the source of issues with some clients.
Thank you!
> I already requested that feature via SUP-158802.
Great!
>> I already requested that feature via SUP-158802.
> Great! Have they already answered your request?
Yes, they thanked me for making suggestions.
> Interestingly, the same request has been pending at the competitor for at least 3 years...
> I'm starting to wonder if it may be technically difficult or impossible to get that information on the AP side...
That made me curious. I stumbled upon a user called "pe1chl" in a feature request over at the un*f* community forum. lol. Nevertheless, having this feature would be great for debugging and understanding "troublesome" wifi clients better.
> Hope so, sooner rather than later, as I've been having a horrible time with these ax devices for months now and can't get them working properly.
Did you open a new topic? Maybe someone can help you.
> polish behavior with new Winbox 4, but - who knows...
That would not make sense to me. Winbox 4 is an early beta, and it should not be a priority for any stable ROS version right now.
>> Hope so, sooner rather than later, as I've been having a horrible time with these ax devices for months now and can't get them working properly.
> Did you open a new topic? Maybe someone can help you.
No, I've read all I can find here, and everyone has the same, if not similar, problems.
/ip dns set vrf=mgmtvrf
> The documentation (https://help.mikrotik.com/docs/display/ROS/DNS) does not even list the vrf parameter, so who knows!
This help page knows:
> Is there any chance of multicore processing of the following in ROS v7.x:
> 1. MPLS + VPLS
> 2. PPPOE
> 3. VXLAN
+1 for PPPoE. My CCR2116 maxes out at around 900MBps on my pppoe uplink.
09:28:26 bridge,warning "bridge" peer disconnected
09:28:26 bridge,warning "bridge" peer link down
09:28:26 bridge,info "bridge" peer link up
09:28:26 bridge,info "bridge" peer connected
09:28:26 bridge,info "bridge" peer becomes secondary DC:2C:6E:D2:AF:4B
> +1 for PPPoE. My CCR2116 maxes out at around 900MBps on my pppoe uplink.
And/or support for the hardware-accelerated PPPoE that some chips support (even low-end ones)...
> +1 for PPPoE. My CCR2116 maxes out at around 900MBps on my pppoe uplink.
Yes, this is because the entire PPPoE load is processed by a single core. You can observe that one CPU core is fully choked while the other cores are idle.
> I am wondering the same. This cycle really does take long... My guess is that they are doing some internal changes to polish behavior with new Winbox 4, but - who knows...
Hi,
We are not in a hurry xD, we only saw that in the official documentation and were surprised to see v7.17 with green markers.
> v7.16rc4 - DNS VRF does not work.
> When setting:
> /ip dns set vrf=mgmtvrf
> the system always sends DNS queries via the main vrf, regardless of this setting.
Issued SUP-160816 on 2024-07-31, with not a single reaction.
> Is there any chance of multicore processing of the following in ROS v7.x:
> 1. MPLS + VPLS
> 2. PPPOE
> 3. VXLAN
+1 for VXLAN!
>> v7.16rc4 - DNS VRF does not work.
>> When setting:
>> /ip dns set vrf=mgmtvrf
>> the system always sends DNS queries via the main vrf, regardless of this setting.
> Issued SUP-160816 on 2024-07-31, with not a single reaction.
This parameter means that DNS listens for queries from clients in the specified VRF. As far as I understand, you have DNS servers that are also reachable from the VRF, and you expect the router to send DNS queries to those servers in that VRF? If that is the case, then you misunderstood what the parameter does; that feature is not even implemented. It is on the todo list.
had the same idea with a mgmt vrf where i needed DNS resolution ... went the "main VRF only it is then.." route
> Is there any chance of multicore processing of the following in ROS v7.x:
> 1. MPLS + VPLS
> 2. PPPOE
> 3. VXLAN
Multicore MPLS / MPLS HW Offload
> Is there any chance of multicore processing of the following in ROS v7.x:
> 1. MPLS + VPLS
> 2. PPPOE
> 3. VXLAN
+1 for PPPOE multicore. We have thousands of customers using PPPOE and want to start rolling out speeds faster than 1Gb, but can't do so with the single-core limitation.
OT (sorry): we discovered that netflows are not generated for inter-VLAN L3HW-accelerated traffic.
We observed this on a CCR2216, where a spike of (l3hw) traffic on a VLAN was not reported on our netflow collector.
SUP-165456 generated.
>> Issued SUP-160816 on 2024-07-31, with not a single reaction.
> This parameter means that DNS listens for queries from clients in the specified VRF. As far as I understand, you have DNS servers that are also reachable from the VRF, and you expect the router to send DNS queries to those servers in that VRF? If that is the case, then you misunderstood what the parameter does; that feature is not even implemented. It is on the todo list.
So the VRF setting for DNS does not mean that the DNS resolver works in that VRF, but rather that it only listens in that VRF?
had the same idea with a mgmt vrf where i needed DNS resolution ... went the "main VRF only it is then.." route
> is there a possible solution to resolve to upstream dns from e.g. a management VRF?
Yes, it is like in any other configuration with a vrf parameter:
/ip dns set servers=1.1.1.2@management
> Is it also on the todo list to have the service (and other services, e.g. NTP) on more than one VRF?
> (like /ip dns set vrf=vrf1,vrf2,vrf3)
From my point of view, this would be better solved using the concept that Juniper uses.
In that case, if the services were in vrf=management, some kind of route leaking between those VRFs would be mandatory for other services to reach that service.
> Yes, this is because the entire PPPoE load is processed by a single core.
When you talk about PPPoE in a single thread, which case specifically are you talking about?
kentik
> OT (sorry): we discovered that netflows are not generated for inter-VLAN L3HW-accelerated traffic.
> We observed this on a CCR2216, where a spike of (l3hw) traffic on a VLAN was not reported on our netflow collector.
> SUP-165456 generated.
@rpingar
what netflow solution do you use to collect and view the data?
> Is there any chance of multicore processing of the following in ROS v7.x:
> 1. MPLS + VPLS
> 2. PPPOE
> 3. VXLAN
Your request should change a bit!
> we discovered that netflows are not generated for inter-VLAN L3HW-accelerated traffic.
> We observed this on a CCR2216, where a spike of (l3hw) traffic on a VLAN was not reported on our netflow collector.
> SUP-165456 generated.
This is a known and long-standing issue with RouterOS on platforms that make use of L3HW.
> Your request should change a bit!
> Instead of requesting multithreading, request hardware offload for them!
Another thing that needs to be said is that VRF and hardware offload did not work well together on RouterOS.
> we discovered that netflows are not generated for inter-VLAN L3HW-accelerated traffic.
> We observed this on a CCR2216, where a spike of (l3hw) traffic on a VLAN was not reported on our netflow collector.
> SUP-165456 generated.
Just to confirm...
>> polish behavior with new Winbox 4, but - who knows...
> That would not make sense to me. Winbox 4 is an early beta, and it should not be a priority for any stable ROS version right now.
Agreed, we have been hoping for years to get MT v7 stable; the hardware seems promising, though.
> Your request should change a bit!
> Instead of requesting multithreading, request hardware offload for them!
I think it is appropriate to request multithreading for these protocols. MikroTik currently sells a lot of multicore devices that do not have the ability to support L3 hardware offload.
@bbs2web: MLAG peerlink
I have 2 x CRS317 with LACP 802.3ad peerlink, connected via 2 x 10g DAC cables. No issues with the peerlink flapping. Zero.
Since you have already tried replacing the DAC cables without success, maybe try moving the peerlink to two SFP+ (10G) ports with suitable DAC cables. Rule out an issue with the Q+ port(s). Alternatively since you said this occurs frequently, unplug one of the Q+ ports (I know it breaks redundancy, but for the testing period hopefully that is acceptable), and see if the flapping continues. Then plug that Q+ port back and unplug the other Q+ port... Trying to determine if this is the peerlink over Q+ failing, or a bad Q+ port, or a bad Q+ cable.
>> Your request should change a bit!
>> Instead of requesting multithreading, request hardware offload for them!
> I think it is appropriate to request multithreading for these protocols. MikroTik currently sells a lot of multicore devices that do not have the ability to support L3 hardware offload.
This...
> What is that even supposed to be, "software based offload"?
> What we have now is a situation where the PPPoE client is handled in software. The ethernet frames are received just like IP, and then the PPPoE header is stripped off in software.
> It looks like this causes some load on the CPU, limiting the data rate to below 1Gbps on many routers.
> Other manufacturers use the same SoC in their routers and do not have that problem, so apparently there is capability in the SoC to offload part of this work to hardware. But MikroTik does not use it because they use standard Linux network handling instead of the dedicated SDK from each SoC manufacturer (understandable because of the many different SoCs used in different MikroTik routers).
> So now we have to hope that at some point, for the router models that support it and that are most affected by this, this offloading is supported by RouterOS.
> But it would still be hardware offloading.
I think we may be talking about different use cases.
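To make "the PPPoE header is stripped off in software" a bit more concrete, here is a rough Python sketch of that step for a session-stage frame as laid out in RFC 2516. It only illustrates the header layout and is in no way how RouterOS or the Linux kernel implements it; the function name strip_pppoe and the test payload are made up for illustration.

```python
import struct

ETHERTYPE_PPPOE_SESSION = 0x8864  # Ethernet type for the PPPoE session stage (RFC 2516)
PPP_PROTO_IPV4 = 0x0021           # PPP protocol number for IPv4


def strip_pppoe(payload: bytes):
    """Split the payload of a PPPoE session frame (ethertype 0x8864).

    Returns (session_id, ppp_protocol, inner_packet).
    """
    if len(payload) < 8:
        raise ValueError("too short for PPPoE header plus PPP protocol field")
    # 6-byte PPPoE header: version/type (1), code (1), session id (2), length (2)
    ver_type, code, session_id, length = struct.unpack("!BBHH", payload[:6])
    if ver_type != 0x11:
        raise ValueError("unexpected PPPoE version/type")
    if code != 0x00:
        raise ValueError("not a session-stage data frame")
    # LENGTH covers the PPPoE payload, which starts with the 2-byte PPP protocol field.
    ppp = payload[6:6 + length]
    if len(ppp) < 2:
        raise ValueError("truncated PPP payload")
    (ppp_protocol,) = struct.unpack("!H", ppp[:2])
    return session_id, ppp_protocol, ppp[2:]


if __name__ == "__main__":
    # Hypothetical frame payload: session 0x1234 carrying 4 bytes of "IPv4" data.
    demo = struct.pack("!BBHH", 0x11, 0x00, 0x1234, 2 + 4)
    demo += struct.pack("!H", PPP_PROTO_IPV4) + b"abcd"
    print(strip_pppoe(demo))
```

Every received packet of the session goes through this kind of unwrapping before normal IP handling, which is the per-packet work that hardware offload would take off the CPU.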
> I think we may be talking about different use cases.
> Which other manufacturers and products are you comparing MikroTik to?
All those manufacturers that make the routers that ISPs supply with the 1Gbps+ internet connections they deliver today.
> This "software based offload" looks like multithreading to me in this case...
Multithreading is often not useful in these cases because at least a single connection will have to be processed in sequence.
>> This "software based offload" looks like multithreading to me in this case...
> Multithreading is often not useful in these cases because at least a single connection will have to be processed in sequence.
> I.e. when you have a single PPPoE client connected to your ISP via a single network interface, there is not much that can be multithreaded.
> Of course when you have two ISPs, each with a PPPoE client, they can be distributed over different cores, at least when the hardware allows it.
> (unfortunately in today's routers there is often a switch chip connected to the CPU with only a single connection, and all incoming traffic is handled the same, no matter what interface it arrived on)
In programming that's not the case. You can have multiple queues bound to a single I/O. That I/O can be its own thread, and the other queues can be their own threads.
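The point about a single PPPoE session staying on one core can be illustrated with a toy model of hash-based packet steering. This is only a sketch under assumptions: real NICs and kernels use their own hash functions and fields, and the names pick_core, pppoe_flow_key and cores_used are made up for illustration.

```python
import zlib


def pick_core(flow_key: bytes, n_cores: int) -> int:
    """Toy steering: hash the flow key and map it to a core index."""
    return zlib.crc32(flow_key) % n_cores


def pppoe_flow_key(src_mac: bytes, dst_mac: bytes, session_id: int) -> bytes:
    """Assumed flow key when only the outer PPPoE header is visible to the hash."""
    return src_mac + dst_mac + session_id.to_bytes(2, "big")


def cores_used(flow_keys, n_cores: int) -> set:
    """Set of core indexes that would receive at least one packet."""
    return {pick_core(key, n_cores) for key in flow_keys}


if __name__ == "__main__":
    router = bytes.fromhex("020000000001")  # hypothetical MAC addresses
    isp = bytes.fromhex("020000000002")
    # Every packet of one PPPoE session yields the same key, so one core gets them all.
    one_session = [pppoe_flow_key(isp, router, 0x0001)] * 1000
    print(cores_used(one_session, n_cores=4))
```

In this model, spreading the load would require hashing on something that varies within the session, such as the inner IP addresses and ports, which only become visible once the PPPoE header has been parsed.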