RB951G-2HND Reboot issues and system corruption

Posted: Thu Oct 26, 2017 11:37 pm
by BRMateus2
This might be a duplicate, because my first topic doesn't show up...

Hello. I bought the RB951G-2HND in September and it was my first one; before that I had a TP-Link running DD-WRT, which froze after a few days, and its default firmware was even worse.
Everything was perfect: an f* fast router with features that I loved, acting as an advanced home router, until I rebooted it (6.40.4) from the Winbox (3.11) menu after an upgrade from (6.35~). It simply froze at boot, so I did a Netinstall using the method below (for any newcomer) and reconfigured it without restoring a backup. I restarted again and everything was OK: fast, and the rules and QoS seem optimized.

After that, the power went out once and it booted perfectly.
Another day (1 week later), the power went crazy (the fuse on the utility pole blew) and dropped from 127 volts to 80 volts for 1 hour around 2 a.m. (most likely; I have measured it before). The router kept working without reboots.
I rebooted it the next day, as I thought the electricity had gone out rather than just running at a lower voltage. Nothing in the logs gave any warning. It rebooted perfectly.
So after 7 days without any problems, I decided to reboot to refresh all the data, memory and caches, and then it happened again: it froze at boot and I had to do another Netinstall.

Boot-froze: it beeps once (it never gives the second, double beep, meaning it didn't boot properly).

So I did the Netinstall again, and this time I restored my last backup, because I would hate to configure everything again (DDoS deny rules, tarpit, etc.). It restored with issues: after the reboot Winbox was sluggish; I opened the supout tool and started it, and it froze and logged me out. So I decided to reboot again (after the supout bug and the restore with its automatic reboot), and it stopped accepting Winbox logins, so I had to power-cycle it by unplugging the power and plugging it back in. It rebooted and worked: logins OK, everything from the backup OK and restored, supout file 300 kB. Then I decided to make another supout, as the first one could have been corrupted by the forced reboot; it started, reached 100%, and the file size was 504 kB. Before this, I could not get ANY data other than the backup, not even a supout.

Here are the supouts, for bug fixing if possible:
https://drive.google.com/open?id=0BxFQ1 ... 3RZZ3hhaXM


**********Netinstall
For a fresh format using Netinstall:
Set the PC (server) IP to 192.168.88.3, netmask 255.255.255.0 and gateway 192.168.88.1
Set the Netinstall Boot Server client IP to 192.168.88.1
Hold RESET while the MikroTik boots, until it beeps, shows up in Netinstall, or the light changes state.
That's all.
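
A quick way to sanity-check the addresses above before starting: the PC running Netinstall and the address handed to the router have to sit in the same subnet. The sketch below is only an illustration in Python and is not part of Netinstall; the helper name `same_subnet` is made up, and it assumes 255.255.255.0 is the subnet mask of the PC's interface.

```python
import ipaddress


def same_subnet(server_ip: str, client_ip: str, netmask: str) -> bool:
    """Return True if client_ip falls inside the subnet of server_ip/netmask."""
    # strict=False lets us pass a host address instead of the network address
    network = ipaddress.ip_network(f"{server_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(client_ip) in network


# Values taken from the steps above
pc_ip = "192.168.88.3"        # PC running Netinstall
netmask = "255.255.255.0"     # assumed to be the subnet mask
client_ip = "192.168.88.1"    # address Netinstall hands to the router

print(same_subnet(pc_ip, client_ip, netmask))
```

If this prints False, the router will most likely never show up in the Netinstall window, so it is worth checking before holding RESET.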

Re: RB951G-2HND Reboot issues and system corruption  [SOLVED]

Posted: Fri Oct 27, 2017 11:05 am
by pukkita
Looks like your NAND is gone. You'd better write to support, short of netinstalling it again, resetting it with no default configuration, and reconfiguring it to see if it holds up this time.

Better than using a .backup, you could make an export, so that you can just copy & paste the config (as CLI commands) onto this or any other router. This allows you to copy & paste sections individually for better troubleshooting in the event of the setup failing.

Re: RB951G-2HND Reboot issues and system corruption

Posted: Fri Oct 27, 2017 4:48 pm
by BRMateus2
Many thanks for your answer! I sent an email right now to support@mikrotik.com entitled "[Bug Ticket request] RB951G-2HND Reboot issues and system corruption".

I love that router and would like to fix it if that's possible. I just looked at the bad blocks count: it's 0.3% after the netinstall (before it was 0.0%)... pretty strange as it's new; a manufacturing defect?
Sector writes since reboot (12 h uptime): 7 577
Total sector writes: 157 583
Let's hope for the best.

Re: RB951G-2HND Reboot issues and system corruption

Posted: Fri Oct 27, 2017 7:02 pm
by pukkita
Lots of power off/power on cycles and a bad electricity supply can corrupt the NAND format or damage it, especially if you're writing constantly to it (do you have graphs active?).

If you're experiencing such electricity supply instability, you'd better get a UPS, at least for the router. This router can be powered via either its DC jack or PoE in, and there are lots of small, affordable micro UPSes.

As the router supports being powered from 9-30V, even a 9V cell could power it for some time so that you can shut it down, or at least "absorb" the micro blackouts.

For a hassle-free setup, and possibly a wise investment in your situation, I'd get a mUPS.

Re: RB951G-2HND Reboot issues and system corruption

Posted: Fri Oct 27, 2017 7:45 pm
by BRMateus2
Thank you for the suggestion! Yes, I did use Graphing; today I disabled the disk writes and I'm graphing only to RAM.
The electricity acts up every one to three months, so it's not that common; as it's summer here and the rainy season, the MikroTik has suffered only one power cut and one low-voltage episode.

Is there a way to use a USB flash drive as the system partition? The RB has a USB port, which might even work with an external hard disk more reliably than its own NAND filtering circuitry.

Re: RB951G-2HND Reboot issues and system corruption

Posted: Sat Oct 28, 2017 12:50 pm
by pukkita
No, USB external disk cannot be used as system partition, and I'm afraid it cannot be used for graphs storage either.

External storage can be used for web proxy cache, samba sharing, etc.

Re: RB951G-2HND Reboot issues and system corruption

Posted: Sat Oct 28, 2017 3:35 pm
by BRMateus2
So I disabled all writes, even the 24 h DHCP lease storage; I might have had bad luck with the system being stored in bad blocks, because today I tested twice: a reboot, and it booted OK, then a shutdown, and it booted too.

Lessons learned: never write data at an interval shorter than 24 h to "any" NAND, or never write at all. It's a game of luck which you lose with bad NAND.

DHCP lease storage turned off (it was 24 h)
Graphing set from 5 minutes (I thought that was the capture interval too) to 24 h, and I disabled all graphing writes to disk; I hope the setting works, because there is no "never" option for the write interval like there is for DHCP.

I'm graphing to RAM, I hope.

Nothing else was writing to disk. I did a test: I sent a dummy file to the NAND which filled it, got no increase in bad blocks (it stayed at 0.3%), and then deleted the file.

Unless support's conclusion adds anything to this thread, it's solved then.

Edit---(11:50/30/10/2017 or 201710301150 ISO Time GMT -3) 201710301350 UTC
I am editing this because adding a new post is not worth it (why bump this topic again?).
Support answered that there were no logged crashes in supout.rif.

So, after disabling all possible writes to disk and testing some reboots at random intervals:
Uptime: 10:00 h
Sector writes since reboot: 277
Total sector writes: 223050 (remember that I sent a 110 MB file just for testing; it caused about 100000 writes, with no change in the bad block count)
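
To put the write counters in perspective, here is a rough calculation of sector writes per hour before and after disabling the disk writes, using the numbers posted in this thread. It is only a sketch: it assumes the uptimes were exactly 12 h and 10 h, and the function name `writes_per_hour` is made up for illustration.

```python
def writes_per_hour(sector_writes: int, uptime_hours: float) -> float:
    """Average number of sector writes per hour of uptime."""
    return sector_writes / uptime_hours


# Before: graphing and DHCP leases still written to NAND (7 577 writes, 12 h uptime)
before = writes_per_hour(7577, 12)
# After: all disk writes disabled (277 writes, 10 h uptime)
after = writes_per_hour(277, 10)

# Fraction by which the hourly write rate dropped
reduction = 1 - after / before

print(f"before: {before:.1f} writes/h")
print(f"after:  {after:.1f} writes/h")
print(f"reduction: {reduction:.1%}")
```

With those numbers the rate drops from roughly 631 to about 28 sector writes per hour.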

I did a reboot just now to test whether it crashes, and it booted perfectly.
The reasons for this topic and my concerns were:
*Two boot lockups happening for no apparent reason, both after reboots, with nothing in the logs warning of anything and 0.0% bad blocks.
*I did not know what was causing them; it could have been anything, as I had not experienced such symptoms before, nor had I owned powerful devices with flash inside, only hard disks.

Possible solution, which I have tested many times since the original post:
*Disable all writes to disk, even DHCP leases; graph to RAM or don't graph at all.

Possible cause, based on the answers:
*Bad NAND from the factory OR bad luck with system writes landing on undetected bad blocks, since after the Netinstall the bad blocks finally rose to 0.3%, which is within the safe range, but who knows about the rest?