I've had this happen too on a couple of these but it's always been in a production/hurry situation where I haven't been able to troubleshoot without rebooting. Super pain and sounds similar. Lights are on but no ssh, no GUI, doesn't pass traffic.
I have two of these in production, each with one 10Gb fiber link, one VLAN trunk port and about 4 VLANs. Rock solid, with uptimes of 56 and 62 days. True, the traffic is light - but even when stress testing I didn't get problems.
Maybe some specific configuration, triggering a bug?
Or could it be the power? Some transient, some spike? Noise on the line, maybe? They run quite cool, so I don't think it would be temperature related.
I'm using firmware 2.8.
What firmware version are you using?
Are there any special configurations?
RSTP? ...
[edit] oops my mistake ... the unit I purchased is the CRS326 and not the CSS326.
I just got one of these and under SWOS it states upgrade 2.9 available, but when I upgrade and it reboots automatically, the upgrade does NOT take. Tried this a few times with the exact same behavior. Not a good experience.
Yes, I see this too. I don't know where it comes from, but there is only 2.8.
It could be a defective device. I have two CSS326. Only one of them has had these hangs so far. The well-functioning switch has an uptime of ~38 days now. It could also be because it has much less traffic than the "defective" one.
Then it's either wrong configurations or faulty units. I thought there was something wrong with the firmware. I was on 2.8, now downgraded to 2.7 just in case. I will test it and see but probably will have to wait 6-10 days.
Did you receive 2.9 RC7?
Hi Everyone,
I just purchased a CSS326-24G-2S+RM to evaluate. I went ahead and upgraded to the 2.10 firmware. I'm going to test carefully with many GigE connections and both SFP+ ports active at 10GigE. One thing I've noticed right away: under the "System" tab, "Health" section, the temperature shows about 60C. I took the top cover off and confirmed that the switch chip (with the smaller heatsink) is HOT! What temperature are you all seeing on your switches?
That's about right. I have two of them - both running about 61 C.
This just happened to my CSS326-24G-2S+ running 2.10. It started balking after 17 days of uptime. Pings were fine, but any serious traffic would hang after a packet or two.
Wow, it seems I'm not alone. My problem though is a little bit specific. There is no problem with wired clients, but if I connect an access point (RB951) to it, the symptoms are like yours. Ping works, web browsing in general works, but big file transfers get stuck, sometimes after a few minutes, sometimes right at the beginning. At the same time, wired clients attached to the AP's ethernet ports work fine. Quite mysterious. The switch is configured as a plain dumb switch, with only a password, static IP and identity assigned. The same AP connected to another switch, or directly to the router, is OK.
We are working on the issue, but because of its rare appearance, it is extremely difficult to reproduce the problem. It might be related to Flow Control or switch congestion controls. For now, try to disable Flow Control for all interfaces under the "Link" menu in SwOS. Also, try to verify that other devices connected to the switch are not using any Flow Control settings. Keep an eye on any counters in the "Errors" menu. Let us know whether the switch still fails after this.
Disabled flow control for all ports, nothing changes. No errors.
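Worth adding for anyone following that advice: the switch ports are only half of the flow-control picture, since the attached devices can keep negotiating pause frames on their side. On a Linux host this can be checked and cleared with ethtool; a small sketch only, where eth0 is just an example interface name:

$ ethtool -a eth0                                  # show current pause (flow control) settings
$ sudo ethtool -A eth0 autoneg off rx off tx off   # stop this NIC from using pause frames

The -A change is not persistent across reboots on most distributions, so it would need to go into the host's network configuration if it turns out to matter.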
My CRS328 is still passing packets. Powering devices. But it is gone from the DHCP server. Doesn't show up in a network scan. But Winbox finds it at 192.168.88.1. The web page is not reachable at 192.168.88.1.
Anything I can do to get diagnostic info to Mikrotik?
I have the very same problem.
Hello,
Just to say that another CRS328-24P-4S+ got the illness too...
In your case, you can switch to RouterOS and forget about the problem like a bad dream. :)
I installed RouterOS 6.44.6 and since then, 0 issues.
Does anyone that is having this issue have Flow Control turned off on all ports?
I tried this, no effect, the switch hangs anyway. Every switch that hung on my network had a port with a 10Mbps connection, or 100Mbps at half duplex; maybe that is what triggers the trouble.
#!/bin/sh
# Watchdog for the SwOS lockup: when a switch hits the bug it still answers
# small pings but drops larger packets, so compare a 1300-byte ping with a
# 50-byte ping and reboot the switch through its web interface when only
# the small one gets through.

HOSTS="10.29.2.12 10.29.2.13 10.29.2.14 10.29.2.21 10.29.2.22 10.29.2.23"

log() {
    # Debug logging is enabled by creating /tmp/wdebug
    if [ -e /tmp/wdebug ]; then
        echo "$1" >> /tmp/wdebug.txt
    fi
}

log "watch service started..."
while true; do
    for h1 in $HOSTS; do
        log "ping $h1 ..."
        if ping -c1 -s 1300 "$h1" > /dev/null; then
            log "$h1 is alive..."
        elif ping -c1 -s 50 "$h1" > /dev/null; then
            # Large ping failed but the small one worked: the switch has hit the bug
            log "$h1 has BUG, needs reboot..."
            wget --post-data="" --http-user=admin --http-password="secret" "http://$h1/reboot"
        else
            log "switch $h1 unavailable..."
        fi
    done
    sleep 10
done
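In case it is useful to anyone: one way to keep the script above running in the background on a Linux box. The file name swos-watchdog.sh is just an example, use whatever you saved it as:

$ chmod +x swos-watchdog.sh
$ touch /tmp/wdebug                        # optional, enables logging to /tmp/wdebug.txt
$ nohup ./swos-watchdog.sh >/dev/null 2>&1 &

A cron @reboot entry or a small systemd unit would do the same job if you want it to survive reboots of the monitoring host.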
$ wget --post-data="" --http-user=admin --http-password="secret" http://192.168.88.1/reboot
--2020-05-08 18:26:14-- http://192.168.88.1/reboot
Connecting to 192.168.88.1:80... connected.
HTTP request sent, awaiting response... 501 Not Implemented
2020-05-08 18:26:14 ERROR 501: Not Implemented.
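For what it's worth, the same request can be replayed with verbose output to see exactly how the device answers; a quick sketch using curl, with the address and credentials from the transcript above:

$ curl -v -u admin:secret --data "" http://192.168.88.1/reboot

The -v output shows the full request and response headers, which might help pin down whether that particular firmware simply does not implement the reboot URL.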
Yes, running RouterOS. Initial post amended to include versions of firmware and rOS. The device is in a T3 datacenter with power and cooling protection. It's operating at a constant 24.6V and 29C. CPU fluctuates between 15% and 35% but never any higher.
Some notes on the first failure:
- The switch was receiving its management IP from a DHCP server (also rOS) on the network. The first sign of trouble was that this stopped working and the switch lost its IP. (I know, awful practice, fixed now!)
- Could not log in to the router using either normal or MAC telnet, yet it responded to pings
- Switch failed to pass any traffic through it, but still responded to pings and returned "Login:" via MAC telnet, so was partially responsive
- Could log in to the router using a console cable, but could not see any faults in the log. Statically assigned an IP, still wouldn't allow winbox access. Tried a few other things and while it accepted the commands, nothing changed its failed state
- Power cycle via console command fixed the issue
A week later, the same thing happened again in that no traffic could pass through the device. However this time I was able to log in via winbox and reboot the switch. This fixed the problem immediately.
I now have the switch under strict monitoring with debug logging to file. If it happens again I will post my findings here.
SW01-2020may10.rsc
What has worked for me is downgrading the switch to version 2.2. Running stable for a week and no issues so far...
https://download2.mikrotik.com/swos2/cs ... 26-2.2.bin
I have had issues with the switch before, but I still attempted to put it in service again, since I suspected the previous problems were related to high temperature. But now, even in a cool environment, it starts to drop sessions randomly... Running 2.11, will try to downgrade to 2.7...
But, is this a crappy product or crappy SW?
2.11 was problematic for me. Switches would lock up daily, but as mentioned earlier in this thread they'd still pass some traffic, which meant that pings would go through, and if you have a ping monitor it won't always detect an issue. Downgraded to 2.2 and no issues yet. As I mentioned previously, I ran these switches for over a year without any issues, and the issues started when I did a firmware update.
Yup. I reverted to 2.7, which was the oldest version I had available. netpro25 reverted to 2.2, for some reason. Basically, I don't know which SwOS version is stable on the CSS326. Maybe even 2.10 would be OK? For now, 2.7 is stable for me...
There are reports of this problem with V2.10, mine included. V2.11 was our hope for a solution, but...
So, would it be a better and more stable solution, when buying a new switch, to go for a CRS326 instead?
I'm wanting to know this too, as I can return the CSS and pay the extra for CRS features I don't need if it means it is reliable...
I meant: would a CRS326 running RouterOS be a more stable solution?
Changing vendors would be "more stable".
I also have a CSS326-24G-2S+ and ran into this issue tonight. I was running 2.10 when I encountered the issue but have upgraded to 2.11 now. The switch was passing DHCP traffic sometimes (my computer would get an IP address, but not reliably). ICMP traffic seemed flawless. Nothing TCP-based that I tried worked (HTTPS, SSH, etc.).
Okay, exact same thing happened on 2.11 today. It's a real pain because, since ICMP traffic passes okay, I keep assuming it's a firewall issue when in fact the switch "just" needs a reboot. Is there a known safe firmware version that is newer than 2.2 and/or any possibility for Mikrotik to fix this issue?
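For anyone else hitting this and second-guessing their firewall: a quick client-side check that matches the symptom pattern reported in this thread (small pings keep working while larger packets do not). The target address below is just a placeholder for any host that should be reachable through the switch:

$ ping -c 3 192.168.88.10              # default 56-byte payload - usually still answered
$ ping -c 3 -s 1400 192.168.88.10      # large payload - times out when the switch is wedged

If the small ping succeeds but the large one gets no replies, it is almost certainly the switch lockup rather than a firewall rule, and a reboot of the switch should clear it.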
Did the 2.2 downgrade resolve this issue? I have two of these switches (CSS326-24G-2S+) in a data center and they are having this exact issue. The funny thing is they were running for about 1 year without an issue and just started having issues recently.
I've kept downgrading our CSS326 from 2.11 to 2.10 to 2.9 and so on; the issues stopped appearing on 2.2. It has been running for over two months without a problem.
For some strange reason, I can easily reproduce the problem by adopting and upgrading Unifi APs. Yesterday I had to adopt 6 to my controller and the switch hung on almost all of them.
I can also knock my CSS326-24G-2S+RM offline by playing around with my Unifi gear!
This is quite interesting. I've never used Ubiquiti, so I have no idea what "adopt a device" is. Some kind of discovery, for management purposes? That would explain why my two CSS326 didn't get this problem.
@Nom
Thanks for sharing.
It's not just updating the Unifi gear that knocks the switch, but adopting; I can reproduce this nearly every time I adopt a Unifi device. It's a shame if this problem does not get fixed, as I bought this switch to try out and really hoped to start using Mikrotik switches.
I have two of them (SwOS 2.11) connected to a CRS328 (RouterOS) through 10Gb fiber. Both switches are CSS326-24G-2S+RM with 2.11 firmware. A few months ago I enabled IPv6 in one of the routers and almost immediately, both switches locked up. This was documented in detail in my post on 6 April 2020.
Anyone running the new firmware notice any issues? How's it working?
My two CSS326-24G-2S+RM are working fine on 2.12 - so far.
P.S. As far as I know, you are the first person to report this lockup under ROS. This entire thread has been about SwOS.
I ran into that issue twice.
CRS326-24G-2S+
Current Firmware: ROS v6.44.5
No, he is not the first person... use search and you will find my posts in this thread about the same problem. Our problem was solved with a newer ROS.
Please start a new thread guys - this one is about the CSS326-24G-2S+RM, which is a SwOS-only device.
Had this problem several times with 2.11, I think. But now again on 2.13. The switch is not pingable, but it is seen in IP Neighbors, and there is a little traffic going through it. If the switch could somehow be auto power-cycled on malfunction, that would be wonderful. 24 students without internet is not good for our reputation and brings a lot of dissatisfaction. MIKROTIK, you can make it better! Waiting for a BUGFREE 2.14! We believe in you guys! :)
The bug that existed for several versions would allow pings and other small packets through, but anything over a couple hundred bytes (as I recall) would not work. That was solved with either 2.12 or 2.13 (I don't remember which offhand). One of my CSS326 switches that exhibited the bug now has an uptime of 405 days...
@jfreak53: if you can pinpoint the problem to certain packet contents, then you'd do MT (and humanity) a favour if you could sniff those frames and send MT the capture file. IMO this is the only way to allow MT to actually fix it. Unless they see those packets and analyze which combination of bits actually upsets the switch chip, they cannot fix the bug.
How do I do that on SwOS? I know the ports those units are on, so it would be easy - they also have their own VLAN - but how do I do that?
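SwOS has no packet sniffer of its own, so the capture has to happen off the switch. If your SwOS version offers port mirroring (or if you can simply capture on the Unifi controller host while re-adopting an AP), a Linux machine on the mirror or controller side can record the full frames with tcpdump. A rough sketch only - the interface name, output file and MAC address are placeholders for your own values:

$ sudo tcpdump -i eth0 -s 0 -w unifi-adopt.pcap ether host aa:bb:cc:dd:ee:ff

Run it while reproducing the hang (e.g. during the adoption), stop it with Ctrl-C, and attach the resulting .pcap to the support ticket so Mikrotik can see the exact frames involved.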