I just sent the following in an e-mail to MikroTik Support (ticket #2014112966000099):
I have stumbled upon a really simple way to very reliably reproduce a crash and reboot scenario on RB1000/1100/1100AH products when using MetaROUTER. This particular scenario only appears to occur on RouterOS 6 (I tested as far back as 6.7); if I try the same thing on 5.26, it doesn't reboot. I know that there were random occurrences of crashes and reboots with MetaROUTER on PowerPC boards in RouterOS 5 that were seemingly never resolved, and it is possible that this particular crash is unrelated and that the underlying reason for this crash is not the only cause of MetaROUTER-related crashes on these boards. Still, I am hopeful that most or all of the reasons for crashes are related somehow, and that by investigating this particular crash and finding a fix for it, you might end up actually fixing the majority of other MetaROUTER-related crashes.
It turns out that triggering a crash on RB1000/1100/1100AH with MetaROUTER on RouterOS 6 is so simple that I'm surprised I haven't run across it earlier. It's also possible that you are already aware of this method. Basically, you just have to be running a MetaROUTER within the first 5 minutes of uptime after booting RouterOS. That's it. As long as you start a MetaROUTER sometime between 0 and 5 minutes after bootup, the router will crash at almost exactly the 5 minute mark, regardless of how long the MetaROUTER has been running: it will crash at around 5m of uptime whether you start the MetaROUTER at boot or at 4m45s of uptime. The CPU doesn't even have to be busy, and the MetaROUTER doesn't even need to have any interfaces added to it. I have tried this on an RB1000, an RB1100, and two RB1100AH boards. They all behave *exactly* the same way.
It appears that something is happening just before the 5 minute uptime threshold is crossed, even when a MetaROUTER is not running. If you boot up an RB1xxx, and then connect to it and start running "/system resource print interval=1" on the console or a Winbox terminal, you will see that at "4m55s" of uptime, the value of "Uptime" suddenly skips ahead by 10 seconds to "5m5s", and then it will stay at "5m5s" for 10 seconds, as if it is waiting for the real, internal uptime clock to catch up. (Winbox gets confused by this, and it starts counting down *backwards* to 5m0s, and then back up again.) At "5m6s" of uptime, after the "Uptime" clock matches reality again, if you are running a MetaROUTER, the RB1xxx will crash and reboot itself. If you are not running a MetaROUTER, it will not crash and reboot.
This strange "uptime skips ahead 10 seconds when it reaches 4m55s" bug does not occur in RouterOS 5 for PowerPC. It *does* occur on virtually every version of PPC RouterOS 6 (including 6.23rcX), and it even happens on other Freescale-based RouterBoards that use the MPC85xx kernel, like the multicore RB1100AHx2 or the RB850Gx2. Of course, you can't run MetaROUTER on these, so they don't reboot themselves, but the uptime clock does the same bizarre skip-10-seconds trick in RouterOS 6 on all of those routers. (Another interesting fact: if I "hack" an RB850Gx2 to run the uniprocessor kernel, and boot a MetaROUTER up on it, even though the uptime clock does the funny skip-ahead-10-seconds thing, the router does not crash and reboot after the uptime clock catches back up to reality! Only RB1xxx boards running MetaROUTERs crash at that point!)
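For anyone who would rather not watch the counter by hand, here is a minimal Python sketch that flags the skip in a list of Uptime values captured once per second. It is only an illustration: the function names (parse_uptime, find_anomalies) are made up for this example, the uptime format is assumed from the values quoted above ("4m55s", "5m5s"), and how the samples get collected from the router is left out entirely.

```python
import re

# Matches uptime strings such as "4m55s" or "1d2h3m4s".
# The format is an assumption based on the values quoted in this post.
_UPTIME_RE = re.compile(
    r"^(?:(\d+)w)?(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$"
)
# Seconds per week, day, hour, minute, second (same order as the regex groups).
_UNIT_SECONDS = (604800, 86400, 3600, 60, 1)


def parse_uptime(text):
    """Convert an uptime string like '5m5s' into a number of seconds."""
    match = _UPTIME_RE.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised uptime: {text!r}")
    return sum(
        int(value) * unit
        for value, unit in zip(match.groups(), _UNIT_SECONDS)
        if value
    )


def find_anomalies(samples, interval=1):
    """Return (previous, current, delta) for every pair of consecutive
    samples whose difference is not exactly `interval` seconds."""
    anomalies = []
    for prev, cur in zip(samples, samples[1:]):
        delta = parse_uptime(cur) - parse_uptime(prev)
        if delta != interval:
            anomalies.append((prev, cur, delta))
    return anomalies


if __name__ == "__main__":
    # Hypothetical capture resembling the behaviour described above.
    captured = ["4m53s", "4m54s", "5m5s", "5m5s", "5m5s", "5m6s"]
    for prev, cur, delta in find_anomalies(captured):
        print(f"{prev} -> {cur}: moved {delta}s instead of 1s")
```

A forward jump shows up as a delta larger than the sampling interval, and the frozen period afterwards shows up as deltas of 0.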
Another interesting thing that may or may not be related: if I run a MetaROUTER on a PPC board, even one that doesn't crash and reboot, I see the following show up in the kernel ring buffer every few minutes:
INFO: task fs-server-1:361 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fs-server-1 D 00000000 0 361 2 0x00000000
Call Trace:
[dd22ddb0] [80164714] blk_finish_plug+0x1c/0x58 (unreliable)
[dd22de70] [80006350] __switch_to+0x7c/0x94
[dd22de80] [8028ce64] __schedule+0x18c/0x374
[dd22dec0] [e1863960] wait_for_data+0x84/0x12b8 [fs-back@0xe1863000]
[dd22df00] [e1863b54] wait_for_data+0x278/0x12b8 [fs-back@0xe1863000]
[dd22dfb0] [80047650] kthread+0x84/0x88
[dd22dff0] [8000b9cc] kernel_thread+0x4c/0x68
The crashes do not coincide with these messages, so they may be completely innocent and normal, but because they include a stack trace, I thought it would be worth mentioning.
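If you want to count how often these warnings appear in a saved copy of the log, a rough Python sketch follows. It assumes the message keeps exactly the wording shown above; the function name find_hung_tasks is invented for this example, and getting the log text off the router is not covered.

```python
import re

# Pattern for the hung-task warning line, based on the wording pasted above.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (task_name, pid, timeout_seconds) tuples,
    one per hung-task warning found in the log text."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "INFO: task fs-server-1:361 blocked for more than 120 seconds.\n"
        "fs-server-1 D 00000000 0 361 2 0x00000000\n"
    )
    print(find_hung_tasks(sample))
```

Comparing the number and timing of these entries against the reboot times is one way to double-check that the warnings and the crashes really are independent.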
So, to sum up, here is how you reproduce a MetaROUTER crash on PowerPC RB1xxx boards:
- Netinstall 6.23rc7 onto an RB1000, RB1100, or RB1100AH, and boot it up.
- Connect with Winbox, bring up System -> Resources, and bring up a Terminal running "/system resource print interval=1"
- Watch the Uptime counter both in the Terminal and in the Resources window. After it shows 4m54s, it will skip straight to 5m5s. The Terminal Uptime number will stay frozen for 10 seconds, while the Winbox/Resources window uptime will count backwards to 5m0s, and then forwards again.
- Create a single MetaROUTER with "/metarouter add"
- Reboot the RB1xxx to reset the system uptime back to 0.
- Connect again with Winbox and bring up System -> Resources.
- Wait until the uptime shows 4m54s. At 4m55s, just like before, the uptime will skip ahead 10 seconds to 5m5s, count backwards to 5m0s, and then count forwards again.
- Because a MetaROUTER is now running, at 5m6s, the router will reboot itself.
Thanks for looking into this.
-- Nathan