DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

hardware issues with rb4xx

Sun Apr 22, 2012 5:26 am

We are having many failures of RB433 and RB411 boards. The boards all run 5.14 with an updated bootloader. Boards boot fine a couple of times, then stop working. The boot loader runs, but the NAND is corrupt, and we have to reflash with Netinstall. It happens with a variety of radios (rb52hn, xr5, rb52h); it does not matter which. We have now had about 10 failures in the last month. Big issue. Power is Ubiquiti 24V PoE.

Anyone have ideas? Customers are pissed. We have tried everything.
 
0ldman
Forum Guru
Posts: 1465
Joined: Thu Jul 27, 2006 5:01 am

Re: hardware issues with rb4xx

Sun Apr 22, 2012 7:58 pm

I had one upgrade to 5.12 with odd problems, downgraded to 5.11 and kernel locked. Netinstalled 4.17, updated firmware, installed 5.11 and it works fine now.

I don't know if these are related or not, but I'm curious.
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Mon Apr 23, 2012 11:46 pm

This is what we see from the boot loader after the board fails. Somehow nand is corrupted. These are brand new boards from factory. All we do is install 5.14 if it is not already on and update boot loader to 2.39. If you pull power a few times, the boards will eventually crash.

Boards are from different build lots. The only thing common is using 5.14. We do not have this issue with other versions of ROS. We have had 10 board failures this month.

Not related to power supply, as we have tried several with the same results. It might be related to higher power miniPCI cards. We see this with XR5, XR3, and RB52Hn, but not with RB52n. At least we have not replicated with lower power board. Might just be lucky.

RouterBOOT booter 2.39

RouterBoard 433AH

CPU frequency: 680 MHz
Memory size: 128 MB

Press any key within 2 seconds to enter setup..
loading kernel from nand... OK
setting up elf image... OK
jumping to kernel code
Starting...
/etc/rc.d/rc.sysinit: 32: cannot create /var/run/utmp: Directory nonexistent
System halted.
 
0ldman
Forum Guru
Posts: 1465
Joined: Thu Jul 27, 2006 5:01 am

Re: hardware issues with rb4xx

Tue Apr 24, 2012 3:11 am

Definitely something wrong there.

Have you contacted support? What happens after a netinstall and several reboots?
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Wed Apr 25, 2012 12:14 am

What we are seeing is an increasing number of bad blocks reported under system resources. It appears that with 5.14 and 5.15 every power-off reboot (pulling power) causes incremental NAND corruption. This seems to continue until it reaches a critical mass of 3 or 4%, and then the system loses some critical files and has to be reimaged using Netinstall.

We thought that maybe it was related to SMB, so we disabled all shares, users, and made sure SMB was off, which it already was. This made no difference.

We thought maybe it was the web proxy store, which is created by default, so we deleted it. This made no difference.

We are not writing any logs to disk.

We have no idea what is causing this, but it is definitely happening. Anyone else watching their bad blocks and seeing them increase under 5.14?
 
Mrhot2000
newbie
Posts: 25
Joined: Mon Oct 17, 2011 6:36 am

Re: hardware issues with rb4xx

Wed Apr 25, 2012 8:46 am

I had similar issues with 3 boards. I netinstalled them a few times until I got 0 bad blocks, and now they are working fine. These boards showed 3 percent bad blocks the day I got them brand new. They crashed, and then I netinstalled, and every time I netinstalled the bad blocks increased and eventually went up to 60 percent. But then, all of a sudden, I netinstalled again and the bad blocks disappeared, and the boards have been running fine since.
 
0ldman
Forum Guru
Posts: 1465
Joined: Thu Jul 27, 2006 5:01 am

Re: hardware issues with rb4xx

Wed Apr 25, 2012 5:17 pm

You do know about the "system check-disk" command?

I've had a few boards have bad blocks, this corrected them.
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Wed Apr 25, 2012 7:18 pm

We know that we can "fix" the problem by reformatting flash or using check-disk, but this is not a solution. We sell a lot of units to end users and we are having failures in the field. We need to root cause this so that we can eliminate the issue, not put a band aid on it.

The boards that are failing all are built on the 94V-0 1111 PCB and use the Samsung K9F1208U0C Flash. We have seen no issues with boards built using the Hynix chips.
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Wed Apr 25, 2012 7:53 pm

We can confirm that, 100% of the time, for boards built with the Samsung flash, a power-off restart of the board causes incremental flash corruption: 0.1% per reset for the first 10 or so resets, then it starts escalating.
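
As a rough sanity check on those numbers, here is a minimal Python sketch of how many power-off resets it would take to reach the 3 to 4% level mentioned earlier in the thread if growth stayed at a constant 0.1% per reset. The constant-rate model and the function name `resets_to_threshold` are assumptions made for illustration. Since the corruption reportedly escalates after about 10 resets, the real figure would be lower than what this computes.

```python
from decimal import Decimal, ROUND_CEILING


def resets_to_threshold(start_pct, threshold_pct, step_pct="0.1"):
    """Count resets needed to go from start_pct to threshold_pct bad blocks.

    Assumes (hypothetically) a constant increase of step_pct per reset.
    Percentages are passed as strings so Decimal keeps them exact.
    """
    start = Decimal(start_pct)
    threshold = Decimal(threshold_pct)
    step = Decimal(step_pct)
    if step <= 0:
        raise ValueError("step_pct must be positive")
    if start >= threshold:
        return 0
    # Round up: a partial step still requires one more reset.
    return int(((threshold - start) / step).to_integral_value(rounding=ROUND_CEILING))


# Upper bounds for a fresh board and for a board already at 0.5%.
for start in ("0", "0.5"):
    for threshold in ("3", "4"):
        count = resets_to_threshold(start, threshold)
        print(f"from {start}% to {threshold}%: at most {count} resets")
```

Under this linear assumption the result is an upper bound only; the thread gives no rate for the escalation phase, so it is not modeled here.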

We are going to try downgrading to 4.17 and check downgrading bootloader.
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Wed Apr 25, 2012 8:34 pm

Tried downgrading to 4.17 and the number of bad blocks held at 11.

One thing I noticed was that the system was continuing to hash a DSA SSH key at every reboot. I don't recall this being the case; I thought the SSH key should only be hashed when the file system is wiped or the system reinstalled.

I upgraded back to 5.15. Now the number of bad blocks is 0. And the DSA SSH key is no longer being hashed. I think that the problem might be related to hashing the SSH key, or perhaps when the drive keeps getting corrupted it is happening where the key is stored. In either case, downgrading then upgrading seemed to fix the problem.

Does this mean that we have to do this for every system we receive in order to ensure that the file system is not going to corrupt? Major pain if that is what we have to do.

MT support has not responded to my trouble ticket. Any ideas?
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Wed Apr 25, 2012 9:35 pm

There may be a correlation between having Winbox open at the time of a power-off reboot and corruption. When the system is only attached via serial and the console is open, there appears to be no corruption on reset. However, if a Winbox session is open at the time of reset, we see a 0.1% increase in bad blocks on each reset.

This only happens with 5.14 and 5.15, not with 4.17.

Seems like something is writing to the file system even if you are not saving anything. We have exhausted our ability to diagnose any further. This happens with all default values.
 
0ldman
Forum Guru
Posts: 1465
Joined: Thu Jul 27, 2006 5:01 am

Re: hardware issues with rb4xx

Wed Apr 25, 2012 9:50 pm

I don't know about the one I had corrupt, but all that I have in my office are Hynix chips.
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Wed Apr 25, 2012 9:58 pm

Here is a repeated set of reboots. Note that each power-off reboot increments bad blocks by 0.1%. It seems that a warm reboot is less likely to corrupt flash than a hard power off.

[admin@MikroTik] > system resource pr
uptime: 1m21s
version: 5.15
free-memory: 113960KiB
total-memory: 127168KiB
cpu: MIPS 24Kc V7.4
cpu-count: 1
cpu-frequency: 680MHz
cpu-load: 0%
free-hdd-space: 23604KiB
total-hdd-space: 61440KiB
write-sect-since-reboot: 192
write-sect-total: 218907
bad-blocks: 0.5%
architecture-name: mipsbe
board-name: RB433AH
platform: MikroTik
[admin@MikroTik] >

REBOOT #1 - Soft reboot

[admin@MikroTik] > system resource pr
uptime: 36s
version: 5.15
free-memory: 113584KiB
total-memory: 127168KiB
cpu: MIPS 24Kc V7.4
cpu-count: 1
cpu-frequency: 680MHz
cpu-load: 0%
free-hdd-space: 23588KiB
total-hdd-space: 61440KiB
write-sect-since-reboot: 153
write-sect-total: 219280
bad-blocks: 0.5%
architecture-name: mipsbe
board-name: RB433AH
platform: MikroTik
[admin@MikroTik] >

REBOOT #2 - Hard reboot - Power Off

[admin@MikroTik] > system resource pr
uptime: 26s
version: 5.15
free-memory: 113464KiB
total-memory: 127168KiB
cpu: MIPS 24Kc V7.4
cpu-count: 1
cpu-frequency: 680MHz
cpu-load: 5%
free-hdd-space: 23552KiB
total-hdd-space: 61440KiB
write-sect-since-reboot: 177
write-sect-total: 219304
bad-blocks: 0.6%
architecture-name: mipsbe
board-name: RB433AH
platform: MikroTik
[admin@MikroTik] >

REBOOT #3 - Hard Reboot - Power Off

[admin@MikroTik] > system resource pr
uptime: 28s
version: 5.15
free-memory: 113456KiB
total-memory: 127168KiB
cpu: MIPS 24Kc V7.4
cpu-count: 1
cpu-frequency: 680MHz
cpu-load: 0%
free-hdd-space: 23456KiB
total-hdd-space: 61440KiB
write-sect-since-reboot: 185
write-sect-total: 219312
bad-blocks: 0.7%
architecture-name: mipsbe
board-name: RB433AH
platform: MikroTik
[admin@MikroTik] >

REBOOT #4 - Hard Reboot - Power Off

[admin@MikroTik] > system resource pr
uptime: 26s
version: 5.15
free-memory: 113444KiB
total-memory: 127168KiB
cpu: MIPS 24Kc V7.4
cpu-count: 1
cpu-frequency: 680MHz
cpu-load: 1%
free-hdd-space: 23392KiB
total-hdd-space: 61440KiB
write-sect-since-reboot: 160
write-sect-total: 219287
bad-blocks: 0.8%
architecture-name: mipsbe
board-name: RB433AH
platform: MikroTik
[admin@MikroTik] >
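
The captures above can be compared mechanically. The following is a minimal Python sketch that pulls the bad-blocks field out of each capture and reports the change per reboot. The `CAPTURES` list is an abbreviated copy of the output above (only two fields kept), and the helper names are made up for this example.

```python
import re
from decimal import Decimal

# Abbreviated copies of the five "system resource print" captures above.
CAPTURES = [
    ("baseline", "write-sect-total: 218907\nbad-blocks: 0.5%"),
    ("soft reboot", "write-sect-total: 219280\nbad-blocks: 0.5%"),
    ("hard reboot", "write-sect-total: 219304\nbad-blocks: 0.6%"),
    ("hard reboot", "write-sect-total: 219312\nbad-blocks: 0.7%"),
    ("hard reboot", "write-sect-total: 219287\nbad-blocks: 0.8%"),
]


def bad_blocks(text):
    """Extract the bad-blocks percentage from one capture as a Decimal."""
    match = re.search(r"^\s*bad-blocks:\s*([\d.]+)%", text, re.MULTILINE)
    if match is None:
        raise ValueError("no bad-blocks field found")
    return Decimal(match.group(1))


def deltas(captures):
    """Return (label, change in bad-blocks) for each capture after the first."""
    readings = [(label, bad_blocks(text)) for label, text in captures]
    return [
        (label, current - previous)
        for (_, previous), (label, current) in zip(readings, readings[1:])
    ]


for label, change in deltas(CAPTURES):
    print(f"{label}: {change:+}%")
```

Decimal is used instead of float so that differences such as 0.6 minus 0.5 compare exactly against 0.1.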
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Fri Apr 27, 2012 1:03 am

We have now confirmed that RB433 built with Samsung NAND and DTC RAM combination are bad. In every single case, regardless of OS (4.17, 5.14 or 5.15), every power reset causes incremental bad blocks. We have replicated this problem on 15 different boards from inventory.

This does not happen with Samsung NAND paired with Hynix or other memory. Memory tests run from the bootloader report no errors.

Once NAND corrupts to critical mass, the board ceases to boot.
 
Lakis
Forum Veteran
Posts: 703
Joined: Wed Sep 23, 2009 7:52 pm

Re: hardware issues with rb4xx

Sat Apr 28, 2012 1:42 am

Yes, I can also confirm this on the 433AH board. I have had more than 20 failures after updating from 5.9.
No matter how I downgrade, the failures continue.
Tomorrow I have to change 2 boards. Every 6-8 hours they stop responding, and this started to happen after updating from 5.9.
 
Devil
Member Candidate
Posts: 170
Joined: Thu Jul 21, 2011 9:13 am

Re: hardware issues with rb4xx

Mon Apr 30, 2012 1:30 pm

This DOES seem like a serious issue, and you've spent quite some time tracking it down. Please contact the support team with that information. This has to be fixed.
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Mon Apr 30, 2012 10:04 pm

We have sent support full documentation of the failures and all details. They say they want more information. We suggested that they sample RB433AH built with DTC memory and Samsung NAND to verify our test results and then, if appropriate, issue a recall on the boards.

This is a very serious manufacturing failure. It is not easy to detect or diagnose. It was only through weeks of trial and error testing that we were able to nail this down to a root cause. It should not be the customer's responsibility to troubleshoot MT boards for them.

While RAM and NAND are commodity items, what this proves is that not all suppliers of commodities are equal. Since I could not find any details on who DTC is, my suspicion is that they are a clone factory in China with poor quality control. My suggestion is to go back to Hynix or other brands which have proven to be reliable.

This is a customer relations disaster for us and we are trying our best to fix the damage done. Not easy. What we need now is for MT to acknowledge the problem and fix it.
 
nicopretorius
Frequent Visitor
Posts: 77
Joined: Mon Nov 15, 2004 9:49 am

Re: hardware issues with rb4xx

Wed May 02, 2012 11:31 am

I suggest you also check the capacitors on these boards for the problem we are experiencing, as per the following post:

http://forum.mikrotik.com/viewtopic.php ... 73#p314973
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Thu May 03, 2012 9:20 pm

This problem is not related to the capacitor problem. Boards that failed had the new 680uF caps. It is the combination of RAM and NAND.

BTW, testing RAM on the failed boards shows no problems.
 
tnakir
Frequent Visitor
Posts: 56
Joined: Thu Aug 19, 2010 4:29 pm

Re: hardware issues with rb4xx

Fri May 25, 2012 12:46 pm

We've had the same problems with 433GL boards.
 
normis
MikroTik Support
Posts: 26912
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: hardware issues with rb4xx

Fri May 25, 2012 12:55 pm

tnakir wrote:
We've had the same problems with 433GL boards.

What do you mean by "same problem"? Please email support with more details; we could use some supout.rif files. Make sure you run RouterOS v5.16 and the latest RouterBOOT firmware.
 
tnakir
Frequent Visitor
Posts: 56
Joined: Thu Aug 19, 2010 4:29 pm

Re: hardware issues with rb4xx

Mon May 28, 2012 11:38 am

We have a box of 433GL that are unstable: rebooting, randomly unable to connect to them via Winbox (MAC/IP). We tried netinstalling to 5.16, upgraded fw to 2.39. Nothing helped.
 
normis
MikroTik Support
Posts: 26912
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: hardware issues with rb4xx

Mon May 28, 2012 11:40 am

tnakir wrote:
We have a box of 433GL that are unstable: rebooting, randomly unable to connect to them via Winbox (MAC/IP). We tried netinstalling to 5.16, upgraded fw to 2.39. Nothing helped.

Please send supout.rif files to support; we might be able to see what happened there.
 
tnakir
Frequent Visitor
Posts: 56
Joined: Thu Aug 19, 2010 4:29 pm

Re: hardware issues with rb4xx

Tue May 29, 2012 10:52 am

We already did that. They told us to return to RMA.
 
normis
MikroTik Support
Posts: 26912
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: hardware issues with rb4xx

Tue May 29, 2012 1:21 pm

tnakir wrote:
We have a box of 433GL that are unstable: rebooting, randomly unable to connect to them via Winbox (MAC/IP). We tried netinstalling to 5.16, upgraded fw to 2.39. Nothing helped.

Regarding specifically the RB433GL, could you please check what CPU frequency they are running at? If at 800MHz, could you try to lower to standard 680MHz?
 
tnakir
Frequent Visitor
Posts: 56
Joined: Thu Aug 19, 2010 4:29 pm

Re: hardware issues with rb4xx

Tue May 29, 2012 3:20 pm

Nope, they all came with the stock frequency of 680 MHz.
We have also started to receive some 711-2HnD boards that are not able to boot; they came from the same box.
I just hope the problem exists only with this shipment :(
 
Montana
Member Candidate
Posts: 196
Joined: Tue Jun 29, 2004 6:24 am
Location: Moscow Idaho

Re: hardware issues with rb4xx

Sat Jun 09, 2012 8:34 am

So what was the final determination of the problem with the 433AH boards? Was it hardware or software?
I have checked most of our AH boards and find some bad blocks, 0.2 to 0.4. One board on 5.14 was showing 13 bad blocks, so I upgraded it to 5.17 and now it shows 0.2. That board had started to reboot randomly, which is when I did the upgrade. The others hadn't started rebooting, but their bad blocks were not that high, only 0.2 to 0.4. Did the software fix the problem, or did it just clean the disk up only to have it start all over again?
 
Dobby
Member
Posts: 399
Joined: Wed Jan 11, 2012 12:07 am
Location: Hogwarts

Re: hardware issues with rb4xx

Sat Jun 09, 2012 12:04 pm

Deleted because not related.
Last edited by Dobby on Mon Mar 11, 2013 3:24 am, edited 1 time in total.
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Sun Jun 10, 2012 4:19 am

Here is the answer from tech support. Clearly there was something wrong when writing to Samsung NAND.

Hello,

Thank you very much for the attached files and your spent time to help us.

Next RouterOS version will include fix for your problem,
What's new in 5.18 (2012-Jun-05 10:50):
*) fix bad block count not to increase on Samsung K9F1208U0C nand;

Regards,
Sergejs
 
frankie
Member Candidate
Posts: 116
Joined: Thu May 08, 2008 9:45 pm

Re: hardware issues with rb4xx

Sun Jun 10, 2012 9:14 pm

I should mention that not only the RB4xx is affected. Two weeks ago I got an RB750G with an interesting symptom: it was impossible to reset the configuration or copy a new ROS to the NAND flash. Netinstalling 5.16 went successfully, but resetting the configuration never loaded the default configuration script. After opening the case, I found the NAND flash is a K9F1208U0C. RB750G purchase date: 2010.10.28. Some RB750 and RB750GL boards also have Samsung NAND.
 
Montana
Member Candidate
Posts: 196
Joined: Tue Jun 29, 2004 6:24 am
Location: Moscow Idaho

Re: hardware issues with rb4xx

Mon Jun 11, 2012 3:53 pm

DogHead wrote:
Next RouterOS version will include fix for your problem,
What's new in 5.18 (2012-Jun-05 10:50):
*) fix bad block count not to increase on Samsung K9F1208U0C nand;

So does that mean they just fixed the counter to stop counting, or the problem that causes the counting and reboots?

Thanks, DogHead, for the effort. Your post helped identify our problems on the new backhaul.
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Thu Jun 14, 2012 8:06 pm

We just received a shipment of 40 RB411GL boards. Out of these we had 3 boards DOA with flaky firmware/reboots/ethernet connect issues.

Not sure if it is related, but all boards used the Samsung NAND and DTC RAM.

All 40 of the boards arrived with 5.11 on them and most would not boot or were unstable until we upgraded to 5.16. Winbox upgrade sometimes worked, but often we had to use Netinstall two or three times to get the update to take. Once a board was upgraded it seemed to be ok with no reported bad blocks.

Editorial: We have been shipping hundreds of RB433AH and RB411U boards over the last two years with great reliability up until the last few months. Of late we have seen an increasing number of DOA boards. Mikrotik's RMA policy for DOA is crazy. We have to get an RMA from MT, then an RMA from the distributor, then send the boards to the distributor, who in turn sends them back to MT. It can take months to get them back. In the meantime we have to buy new replacement boards. I would suggest that MT adopt a separate policy for DOA (boards returned within 1 week of shipment with repair issues): full equal exchange in this case. If the board tests good, then make the customer buy the board.

We now have hundreds (approaching thousands) of dollars of RMA boards that were DOA. We should not have to wait for repair of something that was defective upon receipt.

Clearly there is something defective with the Samsung NAND that will be fixed (we hope) in 5.18 (ETA???). It took us months to diagnose this problem for MT and for them to fix the problem (yet to even admit there was a problem). We have wasted hundreds of manhours on fixing a problem for the manufacturer. The least they could do is step up to the responsibility of fixing DOA boards at their cost, not ours.

This RMA policy really needs to change.
 
tnakir
Frequent Visitor
Posts: 56
Joined: Thu Aug 19, 2010 4:29 pm

Re: hardware issues with rb4xx

Tue Jun 26, 2012 1:52 pm

ROS 5.18 did not help.

Some RB433GL (new from the box) still have issues with not booting, unable to perform netinstall etc etc...
 
tnakir
Frequent Visitor
Posts: 56
Joined: Thu Aug 19, 2010 4:29 pm

Re: hardware issues with rb4xx

Thu Jul 26, 2012 4:28 pm

We received a new shipment from Mikrotik.
Some of the boards are experiencing the same problems:
unstable, rebooting, unable to access.
We tried installing ROS 5.19 and FW 2.41, and it didn't help.
 
sergejs
MikroTik Support
Posts: 6697
Joined: Thu Mar 31, 2005 3:33 pm
Location: Riga, Latvia

Re: hardware issues with rb4xx

Fri Jul 27, 2012 1:26 pm

tnakir, I'm sorry for any inconvenience caused by RB433GL. Please contact MikroTik support with serial numbers of RB433GL, we will try to help you.
 
tnakir
Frequent Visitor
Posts: 56
Joined: Thu Aug 19, 2010 4:29 pm

Re: hardware issues with rb4xx

Fri Jul 27, 2012 1:36 pm

We are sending you the RMA in the next few days.
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Tue Jul 31, 2012 6:32 pm

We just had 4 more RB411GL failures with same type of erratic behavior. Sometimes they work fine, sometimes they randomly reboot, sometimes you can connect, sometimes not.

All came to us with 5.11 OS. We upgraded to 5.18 and 2.39 bootloader. Still issues.

All boards have the same Samsung NAND and DTC RAM. As stated repeatedly before, this combination in particular seems to be faulty. We try to reject any boards coming in with this combination of NAND/RAM but missed these four.

So far any software fixes from Mikrotik have not corrected issues. We are now losing significant credibility with our customers and we cannot trust these boards to perform.

Mikrotik, you have to do something here. This problem is damaging your reputation and ours. Get to your manufacturing facility and immediately replace Samsung NAND and DTC RAM with parts that work. Also you should recall any boards with this combination of parts as defective. This problem has gone on for months now without resolution. :(
 
ToMikaa87
newbie
Posts: 40
Joined: Mon Apr 25, 2011 8:36 pm

Re: hardware issues with rb4xx

Thu Aug 23, 2012 6:17 pm

In addition, many RBs that have the Atheros 8316 switch (like RB450G, RB493G etc.) suffer from ethernet interface failures under heavy load that makes these boards more unreliable. And definitely this issue cannot be fixed by software updates.
 
gsloop
Member Candidate
Posts: 213
Joined: Wed Jan 04, 2012 11:34 pm

Re: hardware issues with rb4xx

Fri Sep 14, 2012 4:31 am

ToMikaa87 wrote:
In addition, many RBs that have the Atheros 8316 switch (like RB450G, RB493G etc.) suffer from ethernet interface failures under heavy load that makes these boards more unreliable. And definitely this issue cannot be fixed by software updates.

I'm interested in more detail on the issue above, as I have numerous RB450Gs in the field. Is there a thread about this somewhere?

I'm also interested in an update on @DogHead's issue.
 
neoslaw
just joined
Posts: 2
Joined: Mon Oct 15, 2012 10:06 pm
Location: Slovakia

Re: hardware issues with rb4xx

Mon Oct 15, 2012 11:26 pm

I can report the same problem with an RB435G (plus one R52Hn card). The LAN cable is 10m long with a 24V power supply. FW was 5.11, upgraded to 5.20 and yesterday to 5.21. The FW upgrade apparently doesn't help.

The bad blocks still remain at 0.1% after each reboot. And a new problem has appeared (or I haven't seen it reported here). Sometimes after the reboot the MT runs at only 100MHz and is hardly reachable with Winbox (CPU load reaches 100%). I need to set the value manually (680MHz) and reboot again. RB FW is lost.

I have noticed during the last few days that the rebooting starts as traffic increases. Up to 17-20 Mbit everything is alright, but when the load gets heavier, I know that a reboot will come in a few minutes.

Now it has rebooted again and is on 100MHz. I didn't change it and it keeps without rebooting for about one hour. Strange...

and clients are really pissed...

any idea what to do?
 
roc-noc.com
Forum Veteran
Posts: 874
Joined: Sun Dec 02, 2007 3:27 am
Location: Rockford, IL USA

Re: hardware issues with rb4xx

Fri Oct 26, 2012 3:06 am

DogHead,

What country are you in? I am just wondering if this problem is localized to the US power supplies.

I can duplicate this problem with all of my RB411GL stock. It seems that the Flash is glitched anytime you remove power. Up until that time, the RouterBoards are running fine. After that, "system check-installation" shows CRC errors.

Many thanks,

Tom
 
DogHead
Member Candidate
Topic Author
Posts: 196
Joined: Thu Jan 03, 2008 9:36 pm
Location: Anywhere you want me to be

Re: hardware issues with rb4xx

Fri Oct 26, 2012 3:26 am

We are in USA, but have customers world wide. The problem with NAND first came up in the USA, but we saw this same issue in UK and France.

What we are seeing is that if the board does not fail in the first 48 hours following numerous power resets, it will not fail.

We think that the problem is a combination of bad sampling by Samsung for their NAND, poor inrush protection on the PoE, and probably other components. Our theory is that the NAND gets hot, a power surge comes in, and it kills enough of the NAND that CRC strategies for moving data to uncorrupted areas just can't cope in ROS.

Our solution has been to do 48 hour burn in testing on 100% of all of our incoming boards with hourly power cycles. We built a test fixture for this. Boards that survive are kept and the rest returned on RMA. Then, when we pull the boards from inventory for build, we will do another 24 hour burn in system test. We are going to be running through about 1000 boards in the next month or two, so we will see how well this works. It is a huge pain in the ass to do all of this quality control for Mikrotik, but we have commitments we have to meet.
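
The pass/fail logic of a burn-in like this could be sketched as follows in Python. This is only an illustration under stated assumptions: the post says boards that survive are kept, so the extra "no bad-block growth" criterion, the `Verdict` type, and the function name are hypothetical, and how the readings are collected from the test fixture is not shown.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Sequence

BURN_IN_HOURS = 48        # from the post: 48 hour burn-in
CYCLE_INTERVAL_HOURS = 1  # from the post: hourly power cycles


@dataclass
class Verdict:
    passed: bool
    reason: str


def evaluate_burn_in(
    readings: Sequence[Optional[Decimal]],
    max_growth: Decimal = Decimal("0"),
) -> Verdict:
    """Judge one board from its bad-blocks reading after each power cycle.

    A reading of None means the board did not boot after that cycle.
    max_growth is an assumed tolerance, not something stated in the thread.
    """
    expected = BURN_IN_HOURS // CYCLE_INTERVAL_HOURS
    if len(readings) < expected:
        return Verdict(False, f"only {len(readings)} of {expected} cycles recorded")
    for cycle, value in enumerate(readings, start=1):
        if value is None:
            return Verdict(False, f"board did not boot after cycle {cycle}")
    growth = readings[-1] - readings[0]
    if growth > max_growth:
        return Verdict(False, f"bad blocks grew by {growth}%")
    return Verdict(True, "survived all cycles")


# Example: a board gaining 0.1% per power cycle is rejected.
degrading = [Decimal("0.5") + Decimal("0.1") * i for i in range(48)]
print(evaluate_burn_in(degrading))
```

Setting `max_growth` above zero would tolerate a small, stable increase; the right tolerance would have to come from experience with known-good boards.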

We have tried looking at using JTAG to catch this, but have not had the time to figure it out. The test points are there (CPLD), but we don't have all the documentation, so it would be a lot of work.

Other things to do to reduce chance of failure include turning off all Stores (user-manager and web-proxy) so that writes are minimized. Don't write any logs out. Turn off SMB and anything else that might use the NAND file system.
 
neoslaw
just joined
Posts: 2
Joined: Mon Oct 15, 2012 10:06 pm
Location: Slovakia

Re: hardware issues with rb4xx

Tue Oct 30, 2012 9:55 pm

roc-noc.com wrote:
What country are you in? I am just wondering if this problem is localized to the US power supplies.

I'm from Slovakia, in the middle of Europe. (not Slovenia :D)
 
grg
newbie
Posts: 44
Joined: Fri Aug 20, 2010 9:51 am
Location: Latvia

Re: hardware issues with rb4xx

Wed Oct 31, 2012 9:36 pm

I recently had the same problem with my RB433AH. It stopped booting and I had to reflash it using Netinstall. It happened the day after I upgraded it with an R52HN wireless adapter. The unit was a couple of years old and I had never had any problems with it before.
 
roc-noc.com
Forum Veteran
Posts: 874
Joined: Sun Dec 02, 2007 3:27 am
Location: Rockford, IL USA

Re: hardware issues with rb4xx

Fri Nov 02, 2012 9:47 pm

grg wrote:
I recently had the same problem with my RB433AH. It stopped booting and I had to reflash it using Netinstall. It happened the day after I upgraded it with an R52HN wireless adapter. The unit was a couple of years old and I had never had any problems with it before.

That is not the same; it is probably just a bad cell or two on your flash that corrupted the ROS kernel. All flash-based equipment can have that problem. Thankfully Mikrotik made Netinstall, which addresses this.

Tom
 
jnygard
just joined
Posts: 22
Joined: Mon Oct 07, 2013 3:05 pm

Re: hardware issues with rb4xx

Thu Oct 17, 2013 9:46 am

Bumping an old thread, but:

I have a recently bought RB411GL that stopped working, and nothing seems to help. Netinstall of 6.4 and 5.26 both complete successfully, but on reboot all board LEDs light up and stay that way until power down. The only way to bypass that is to press reset while powering up and release the reset button as the LEDs turn off (about 3 sec). Then the LAN port activates, but the unit is still unresponsive. It does show up on my RB951-2n under /ip neighbours, but the MAC address is 00:00:00:00:00:00, so I'm not able to MAC telnet into the unit either...

Another case of corrupted HW?
