
hAP AC2 random boot loop

Posted: Sat Sep 30, 2023 3:45 pm
by harryng9108
Hi all,
I have about 20 hAP AC2 units, and so far 10 of them have randomly failed to boot at least once and required a site visit to recover.

The affected routers seem to fail to boot and restart themselves after about a minute.
Before that, the routers had either been rebooted from Winbox or had their power adapter unplugged.
RouterOS and Firmware: 7.9.2 and 7.10.2
All are less than 7 months old, but the same issue also occurs on a 3-week-old router.
The only way to bring one back to life is NetInstall, which wipes all the logs and supout files that would be useful for troubleshooting.
I am thinking of getting a https://mikrotik.com/product/woobm to see whether anything shows up while the device is stuck in the boot loop.
However, I am a bit skeptical about whether it even works on the hAP ac2.

I used to manage a network with thousands of MikroTik devices, mainly hAP AC lite and hAP AC3, but I haven't seen them fail this way.
It made me think that there might be something wrong with how I use the device. I searched the forum and found a few others having exactly the same issue as mine:

viewtopic.php?t=162083
viewtopic.php?t=197014
https://www.youtube.com/watch?v=ra1eXyypNMI&t=56s

It seems to have something to do with the flash memory, as some have suggested.
My configuration has the main RouterOS package and the zerotier package, about 30 entries in a firewall address list, and graphing and critical logging to disk turned on.
The screenshot in the attachment is from one which I netinstalled a week ago. I recall it had about 2% free space then, and now it has gone down to 0%.
I have already removed all the heavy files under flash...

Has anyone else had a similar issue and can shed some light here? I have other customers using the same configuration on hAP Ax Lite without issue so far.

Re: hAP AC2 random boot loop

Posted: Sat Sep 30, 2023 9:31 pm
by Moba
This is not going to be very helpful, but I have two of these and the only issues I have had (incl. a bricked 3rd unit years ago) have been with v7. They run hot, but they have been rock solid as APs for me. To be fair, I do not need to push them, since I have another router doing all the NAT/VLAN heavy lifting.

The transition to a much newer kernel (v3.x-v5.x is hundreds of changes just for mitigation alone), with all the extra added features in ROS, and combined with the very low specs of this device at this point, is IMO a recipe for instability - I know that not everyone agrees with this, and we all have different requirements...still, your failure rate is horrendous.

There must be a reason why MT upgraded all new devices to have more memory/flash and more recent ARM64 SoCs... Many other companies making low-cost embedded devices (and even very expensive ones, e.g. phones) usually backport security fixes onto the original kernel provided by the SoC manufacturer until the products are EOL (and not without issues either). MikroTik is very generous with its software/firmware policy...but therein lies a big caveat as well. Since this is obviously an enterprise/commercial environment, maybe an upgrade is in order if you need v7. I understand yours are new, but I purchased my first unit back in 2019. Or maybe it's a bug and MT will patch it in a future release...so consider sending your supout files.

Re: hAP AC2 random boot loop

Posted: Sat Sep 30, 2023 11:01 pm
by bpwl
"graphing and critical logging on disk turned on."
Could you move some of these to a USB memory stick? The same goes for backups and other things we like to save in flash.

Re: hAP AC2 random boot loop

Posted: Sun Oct 01, 2023 2:12 pm
by harryng9108
Thank you for your input. I have had good experiences with MikroTik for many years. The only reason I chose v7 is to have the zerotier package. The benefits of having L2/L3 access to any router from any provider are just too good to ignore :D
However, only these hAP AC2 units make me nervous every time we have to reboot to perform updates... There must be something simple that I don't know about.

Re: hAP AC2 random boot loop

Posted: Sun Oct 01, 2023 2:18 pm
by harryng9108
Agreed, it is best not to keep this data locally on the device itself. I am considering using external SNMP and syslog software to pull the information off the device periodically.
However, I have one question here. How much space would graphing use in 3 weeks, and would that be enough to fill up the flash memory?
Also, how can we remove that data? It seems to be hidden somewhere, unlike disk logging.
I tried turning off graphing, but there was no change in the router's memory itself.
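
For what it's worth, the growth rate can be backed out from two free-space readings taken some days apart (for example the free-hdd-space value shown by /system resource print). Below is a minimal Python sketch of that arithmetic. The function names are hypothetical, growth is assumed to be roughly linear, and the readings can be in any unit (bytes or percent) as long as both use the same one.

```python
def daily_flash_growth(free_before: float, free_after: float, days: float) -> float:
    """Average amount of flash consumed per day between two readings."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (free_before - free_after) / days


def days_until_full(free_now: float, growth_per_day: float) -> float:
    """Days left before free space reaches zero at the given rate."""
    if growth_per_day <= 0:
        return float("inf")  # usage is flat or shrinking, so it never fills
    return max(free_now, 0.0) / growth_per_day


# Readings mentioned earlier in this thread: about 2% free after netinstall,
# 0% free one week later.
rate = daily_flash_growth(free_before=2.0, free_after=0.0, days=7)
print(f"about {rate:.2f} percentage points of flash per day")
```

This only says how fast the space is disappearing, not which feature is eating it.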

Also, the biggest question: would 100% full flash memory prevent the Tik from booting up?

Re: hAP AC2 random boot loop

Posted: Tue Oct 03, 2023 3:53 am
by harryng9108
Does anyone have an answer?

Re: hAP AC2 random boot loop

Posted: Tue Oct 03, 2023 4:22 am
by Moba
Logically, any data that needs to be saved during normal operation will have its process fail if no memory is available, which can cause major issues. These routers are nice, but they weren't designed for v7 and that branch has numerous issues yet unresolved. I understand your need for ZeroTier, but it doesn't change the possible limitations of this device using essentially beta firmware. Many users have updated these devices without issues, but everyone has a different use case. I'm not sure what kind of answer you're hoping for...

bpwl is one of the most knowledgeable users here, btw.

Re: hAP AC2 random boot loop

Posted: Tue Oct 31, 2023 10:05 pm
by kristapsesterlins
Hi,

Today I created a Mikrotik support request with the same issue as OP has described.

At the moment I have two separate routers which have had the same issue (internal storage filled up, 0 KB free space).

The current router, a Chateau LTE6 on ROS 7.11.2, had Graphing enabled with the option to save data to internal flash every 24h.
AFAIK there is no way to change the setting on where to store the graphing data even if you have a microSD/USB drive attached.

I also had DoH with Certificates from https://curl.se/docs/caextract.html imported in the router.

I suspect that the prime cause is the Graphing tool. There is some crucial information missing from the current documentation (such as options for storing data to RAM) - https://help.mikrotik.com/docs/display/ROS/Graphing compared to the old - https://wiki.mikrotik.com/wiki/Manual:Tools/Graphing

When the internal flash is full, a reboot of the router will cause it to lock up, and the only way out is to either reset the configuration via the physical button or completely wipe the router via netinstall.
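
Given that, one cautious habit might be to check free flash before any planned reboot or upgrade. Here is a minimal Python sketch of such a check. The class and function names are hypothetical, and the 1 MiB safety margin is an arbitrary assumption, not a limit documented by MikroTik. The two inputs are meant to be the free-hdd-space and total-hdd-space values from /system resource print, converted to bytes.

```python
from dataclasses import dataclass


@dataclass
class FlashStatus:
    free_bytes: int
    total_bytes: int

    @property
    def free_percent(self) -> float:
        """Free flash as a percentage of total flash."""
        if self.total_bytes <= 0:
            raise ValueError("total_bytes must be positive")
        return 100.0 * self.free_bytes / self.total_bytes


def reboot_is_risky(status: FlashStatus, min_free_bytes: int = 1024 * 1024) -> bool:
    """True when free flash is below the assumed safety margin."""
    return status.free_bytes < min_free_bytes


# Example with a completely full 16 MiB flash.
full = FlashStatus(free_bytes=0, total_bytes=16 * 1024 * 1024)
if reboot_is_risky(full):
    print(f"only {full.free_percent:.1f}% flash free, free up space before rebooting")
```

A margin that works on one model or configuration may not be enough on another, so the threshold is something to tune rather than trust.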

My biggest complaint in this situation is that there is no way to access the internals of the built-in storage or to see what is causing it to fill up. I know that the Chateau has only 16 MB of internal storage and that after a clean install it is about 3 MB, but even so, there should be a way to see which section (certificates, graphs, firewall lists, etc.) is consuming space.

Re: hAP AC2 random boot loop

Posted: Wed Nov 01, 2023 12:14 am
by mkx
Firewall address lists can consume quite a bit of permanent storage ... my solution on a hAP ac2 since upgrading to v7 is to not use firewall address lists, at least nothing with more than a few tens of members (I've learned my lesson the hard way - I had to netinstall the device to get out of the death spiral ... twice).

Re: hAP AC2 random boot loop

Posted: Thu Nov 23, 2023 10:35 pm
by sajgan
I have this problem too. I have about 500 KB of free disk space and all writing is disabled. When I turn on DoH, fill the cache a bit (I'm talking about 500 entries) and reboot, the router does not come back up - boot loop.

I reset the router using the button, restore from a backup, turn off DoH, fill the DNS cache, then reboot, reboot, brutal reboot... everything works.
I've been using DoH for a long time, probably since version 6.4x (but I could be wrong, it was a long time ago). I always upgraded from a script, fully automatically - it only sent me emails saying it had done it. These problems started from 7.11.2 onwards.

Re: hAP AC2 random boot loop

Posted: Tue Jul 02, 2024 3:30 pm
by Wyz4k
Looks like this is a common problem. I have a hAP AC2 doing the same thing.

After power-on, the power LED and user LED turn on, then all of the ethernet LEDs. Then the ethernet LEDs turn off and only the connected ethernet LEDs start flashing. They flash for another 20-30 seconds before turning off. At this stage only the power LED is still on.

During initial boot all the ethernet lights work when I move an ethernet device between them, but at the end no ethernet lights turn on when I move cables between them.