Winbox makes CPU go to 100%

Posted: Sun Mar 26, 2006 11:30 pm
by jdmarti1
I did a netinstall on a RouterBOARD 532 yesterday and put it back up on the tower. Since then I have found that if I use Winbox, the CPU goes to 100% after a while and stays there, even after I exit Winbox and use only an SSH session. Rebooting the router returns it to normal. Last night I rebooted again and have not used Winbox since, and the CPU usage has remained normal. Has anybody else seen this?

[magicwisp@MagicWISPMT2] system resource> print
uptime: 7h59m47s
version: "2.9.18"
free-memory: 15504kB
total-memory: 30436kB
model: "RouterBOARD 500"
cpu: "MIPS 4Kc V0.10"
cpu-frequency: 333MHz
cpu-load: 100
free-hdd-space: 32676kB
total-hdd-space: 61440kB


[magicwisp@MagicWISPMT2] system resource> print
uptime: 8h6s
version: "2.9.18"
free-memory: 15496kB
total-memory: 30436kB
model: "RouterBOARD 500"
cpu: "MIPS 4Kc V0.10"
cpu-frequency: 333MHz
cpu-load: 100
free-hdd-space: 32688kB
total-hdd-space: 61440kB

Today:

[magicwisp@MagicWISPMT2] system resource> print
uptime: 13h56m30s
version: "2.9.18"
free-memory: 15784kB
total-memory: 30436kB
model: "RouterBOARD 500"
cpu: "MIPS 4Kc V0.10"
cpu-frequency: 333MHz
cpu-load: 23
free-hdd-space: 32568kB
total-hdd-space: 61440kB
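
For anyone comparing snapshots like the ones above, here is a minimal Python sketch that pulls cpu-load out of pasted print output. It assumes the output is plain "key: value" lines exactly as shown in this post; the function names are made up for illustration and are not part of RouterOS.

```python
def parse_resource_print(text):
    """Parse 'key: value' lines from pasted 'system resource print' output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip prompt lines and blanks, which have no colon
        fields[key.strip()] = value.strip().strip('"')
    return fields


def cpu_load(text):
    """Return the cpu-load field as an integer percentage."""
    return int(parse_resource_print(text)["cpu-load"])


# Sample copied from the first snapshot in this post.
SAMPLE = """[magicwisp@MagicWISPMT2] system resource> print
uptime: 7h59m47s
version: "2.9.18"
free-memory: 15504kB
total-memory: 30436kB
model: "RouterBOARD 500"
cpu: "MIPS 4Kc V0.10"
cpu-frequency: 333MHz
cpu-load: 100
free-hdd-space: 32676kB
total-hdd-space: 61440kB
"""

if __name__ == "__main__":
    print(cpu_load(SAMPLE))
```

Running the parser over each saved snapshot makes it easy to see when the load stuck at 100 and when it came back down.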

CPU Frequency

Posted: Mon Mar 27, 2006 3:49 am
by e2346437
I don't know if it matters but my RouterBoard500 with 2.9.18 on it has a CPU Frequency of 264. I can't imagine why yours is faster at 333, or that it would be causing the problem, but I thought I'd point it out.

Eric

Posted: Mon Mar 27, 2006 3:56 am
by jdmarti1
It's set to 333 on purpose. I overclocked it (as allowed by the board) because I found that my routers worked better when they were overclocked. They worked well last summer in the heat of Oklahoma, so I wasn't too worried about anything else. I am not sure that the overclocking would cause the overutilization; if anything, it normally keeps the CPU ticks down.

Posted: Mon Mar 27, 2006 9:52 am
by jarosoup
I upgraded a hotspot router tonight from 2.9.10 to 2.9.18 and am seeing this too. I thought this was a problem with DNS caching in 2.9.10, so I upgraded, but I am still seeing CPU load as soon as I log out (I had never related the CPU usage to being logged out until tonight).

This is with Winbox or SSH. When logged in with either, CPU is never less than 10%, but as soon as I log out, the CPU spikes. We're graphing the CPU on Cacti, so we can see the usage when we log out. Also, initial logins with SSH take a long time now, which is breaking most of our SNMP scripts that log in. Rebooting does not help; only being logged in makes this go away for me.

This is on a geode 266 board with minimal config, hotspot running with 0 users at the moment. No routing, proxy, test-packages.

Posted: Mon Mar 27, 2006 11:40 am
by uldis
Please send us the support output file from your router to support@mikrotik.com

Posted: Mon Mar 27, 2006 12:09 pm
by Beccara
I've seen this as well on a few of our routers. A support file has been sent and I'm awaiting a reply.

This is on x86 PC routers and not the RB though, so it could be another issue altogether.

Still, REQUEST: Show what sub-systems are using the CPU, e.g. Wireless: 20%, PPPoE Server: 50%, etc.

Ticket is 2006032616000049 if that gets it looked at any quicker

Posted: Tue Mar 28, 2006 5:03 am
by jdmarti1
My RouterBOARD did it twice, and each time a reboot cleared the problem. It has not occurred since. I'm not sure what happened, but if it comes back I will send the file.

Posted: Wed Jun 28, 2006 6:55 pm
by jarosoup
I wanted to revisit this post as I am still noticing this problem on a different router now.

As for the router I originally was posting about, the problem magically went away after a number of reboots and months of running.

I've got another router with a very similar config that is starting to do the same thing. I'm noticing a pattern here, as this takes time (months) to "ramp up" to this state. This particular router was originally set up with 2.9.17. Then it was upgraded to 2.9.24 when that version was released. Now I'm on 2.9.26 in the hope that the DNS bugs are fixed. Versions and reboots don't seem to affect this problem, but at some point it just goes away.

Here's how both of these routers have started to do this:

- This only begins after the router has had an uptime of more than a month or so.

- Once this starts, we notice random CPU spikes, but they are not that noticeable, except via our SNMP monitoring, which starts to show missing data randomly in the graphs once or twice a day.

- Once we start to notice a few random spikes, within another month or so they start to increase. At this point, we also notice that when winboxed into the router, opening a terminal window can take some time, and the CPU usage hits 100% during this delay. We start to get more and more misses on our SNMP graphs, as we've got scripts that log in to these routers every 5 minutes. These start to time out because the login process takes 5-10 seconds, which is too long, resulting in no data.

- After 2-3 months, we notice a lot of CPU spikes: every second or two we are hitting 50% or greater even with the router completely idle. Opening the terminal from within Winbox or a direct SSH connection takes at least 5 seconds (it used to be much faster), and typically longer.
At this point, our SNMP graphs have more missing data than graphed data, sometimes hours of nothing. This is *not* our network connection or SNMP server causing this; the login scripts are simply timing out.

- The base memory usage after a reboot steadily increases over time. The router now starts up with 10MB or more additional memory in use compared with what it used to show after a fresh reboot. Again, this number slowly climbs over months, as does the baseline CPU usage at idle.

I'm beginning to wonder if this problem is caused by our SNMP scripts logging into these routers every 5 minutes for months on end, leaving behind some memory or something as we never see this problem with wireless access points that don't have login scripts hitting them or a hotspot package running on them. Like I said, at some point (within 4 or 5 months maybe) this problem can completely disappear, but it gets bad before this happens.

I'm sending a supout today from our current problematic router. I wanted to post here in the event anyone else has continued to notice this, or any of the previous posters in this thread have seen anything relating to this since last posting here....or if there has been a workaround discovered.
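
To put numbers on the missing graph data described above, a rough Python sketch like the following could list the stretches where polls were lost. It assumes you can export the timestamps (in seconds) of the successful polls and that the poll interval is 5 minutes (300 seconds) as mentioned in this post; the function name and the slack factor are hypothetical choices, not anything from Cacti or RouterOS.

```python
def find_gaps(timestamps, interval=300, slack=0.5):
    """Return (last_good, next_good, missed_polls) for each hole in the data.

    A hole is any pair of consecutive successful polls spaced more than
    interval * (1 + slack) seconds apart.
    """
    gaps = []
    ordered = sorted(timestamps)
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta > interval * (1 + slack):
            # Estimate how many polls should have landed inside the hole.
            missed = round(delta / interval) - 1
            gaps.append((prev, cur, missed))
    return gaps


if __name__ == "__main__":
    # Polls at 0, 5 and 10 minutes, then nothing until 25 minutes.
    samples = [0, 300, 600, 1500, 1800]
    for start, end, missed in find_gaps(samples):
        print(f"no data between {start}s and {end}s ({missed} polls missed)")
```

Counting missed polls per day or per week would show whether the timeouts really ramp up over months the way the graphs suggest.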

Posted: Wed Jun 28, 2006 8:31 pm
by changeip
jarosoup wrote: "I'm beginning to wonder if this problem is caused by our SNMP scripts logging into these routers every 5 minutes for months on end, leaving behind some memory or something as we never see this problem with wireless access points that don't have login scripts hitting them or a hotspot package running on them. Like I said, at some point (within 4 or 5 months maybe) this problem can completely disappear, but it gets bad before this happens."
That was my first thought ... I noticed that when ssh'ing into the router to run a simple command, it would hit 100% CPU for a short time (whereas running from a local terminal it didn't). Maybe there is a memory leak or a leftover process that's not dying. Tough one to troubleshoot, that's for sure : )

Posted: Wed Jun 28, 2006 8:49 pm
by jarosoup
Yes, hard to troubleshoot indeed :?

Here's some new info about my problem. I've just discovered that if I leave a Winbox session connected to this router and keep a terminal window open inside Winbox, this problem goes away. If I then close Winbox, reopen it and don't open a terminal window, it comes back. This has got to have something to do with terminal sessions somehow. Not sure if the supout I sent is worth anything, as I generated it while opening up a terminal window from Winbox.