I'm going to give this a try, thanks a lot.
Yes, we have downgraded to 1000 MHz and have not had any more unexpected reboots.
Same symptom as I reported here. Check the PSU.
Hi berlo, likewise: when we ran the CPU at 1200 MHz we suffered much more frequent reboots. Now our two 1072s run at 1000 MHz and only one of them still reboots, and it is the router with the lighter workload! So it is a contradiction.
Hi,
At the moment we have 21 CCR1072s on 6.41rc44, all up for 17 days without issue. We do BGP + filtering + OSPF, nothing else. Try upgrading to this release; if you still experience reboots, you can exclude these services as the cause.
We experienced reboots with the CPU raised to 1200 MHz; at 1000 MHz we never experienced them.
Have you tried disabling the watchdog and keeping a serial console open?
You should see the error.
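For anyone wanting to try this, a minimal sketch of the setting in question, assuming RouterOS v6 and its standard System > Watchdog menu. Keep in mind that with the watchdog timer off, a hung router stays hung until someone power-cycles it.

```
# Show the current watchdog settings first.
/system watchdog print
# Disable the watchdog timer so a hang is not masked by an automatic reboot.
/system watchdog set watchdog-timer=no
# Re-enable it once the serial console output has been captured.
/system watchdog set watchdog-timer=yes
```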
It is still working without any issue. What is the uptime of your CCR1072 now?
nov/29/2017 10:35:40 system,error,critical router was rebooted without proper shutdown by watchdog timer
nov/28/2017 10:33:45 system,error,critical router was rebooted without proper shutdown, probably kernel failure
nov/28/2017 10:33:45 system,error,critical kernel failure in previous boot
Hi,
We have several CCR1072s in our core and in the last 2 days we have had 2 watchdog reboots, one with the router on 6.38.5 and the other with the router on 6.39.2. What could be causing it, and what should I do to prevent it in the future?
If I am reading this right, is fastpath only reducing the CPU by 10%?
Yes, and we have now grown to 28 CCRs across Europe. All are working fine and we never experienced random reboots again. We also saw better performance on routers with more than 1 million (1kk) routes installed after disabling the route cache. You lose some CPU, about 10% more, but you will not experience packet loss or stopped forwarding when the router forwards more than 2 Mpps.
Hi @berio,
No, it is on the fly. All changes can be made without a reboot. The only issue is dummy rules, which are not removed automatically; you need to reboot to reactivate fast path.
Hi berio,
Have you tried disabling the watchdog and keeping a serial console open?
You should see the error.
Hi Berlo,
Hi,
The route cache should not cause a reboot, only a stop in packet forwarding. If your device has enough performance to forward traffic in the slow path, you can try disabling the cache. You will see CPU usage increase.
If you get a kernel panic you need to hard-reboot the router, so you need a managed PDU or someone who can power-cycle it.
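A minimal sketch of disabling the route cache, assuming RouterOS v6, where it is a global setting in the IP > Settings menu. Watch CPU load afterwards, since traffic then takes the slow path.

```
# Check the current value of the route cache setting.
/ip settings print
# Disable the route cache (RouterOS v6 assumed).
/ip settings set route-cache=no
# Watch CPU usage after the change.
/system resource print
```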
Well, is it solved now? By disconnecting one PSU?
I was also having my CCR1072 reboot, but mine was "without proper shutdown" instead of "by watchdog".
I sent an e-mail to support@mikrotik and they said they think the hardware is faulty.
But then another CCR1072 I have, about 300 km away, showed the same problem.
My solution for both was disconnecting one power supply.
Before unplugging, I noticed under System > Health that while it was reading 12.1 V for both PSU1 and PSU2, only one was outputting current.
I still can't test the supposedly bad PSU I removed because I don't have a spare 1072 at the moment.
I have a third 1072 working fine where both PSUs share the output current.
Right now I'm using 6.41.2. I've tried all the firmware from 6.41.2, and MikroTik also gave me 3.42.5 and 6.99.
Upgrade to v6.41.2.
Then upgrade the bootloader to v6.41.2:
/system routerboard upgrade
And reboot twice.
This should fix kernel crashes that were previously thought to be hardware failures. It could also fix other abnormal router behavior.
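A short sketch of how the bootloader step can be checked, assuming the standard System > RouterBOARD menu. The upgrade command only stages the new RouterBOOT; it takes effect on the next reboot.

```
# Compare the current-firmware and upgrade-firmware fields.
/system routerboard print
# Stage the bootloader upgrade (asks for confirmation on the console).
/system routerboard upgrade
# Reboot so the staged firmware is applied, then print again to verify.
/system reboot
```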
Have you upgraded your RouterBOARD firmware to the latest RouterOS version?
Could you find a fix? After 20 days my CCR rebooted out of the blue again...
So it becoming stable after removing one PSU was just a coincidence. Put the PSU back in place; better to have it rebooting than to risk it stopping.
/system routerboard upgrade
Yup, I did.
(not just RouterOS)
/system routerboard upgrade
Unless you're in real need. I have three, one in each city; the reboots aren't awfully frequent, but they happen.
We will deploy a CCR1072 in our NOC soon.
Do all CCR1072s have the same problem?
Should we postpone the deployment?
Disable the route cache, it fixed it for me.
The one I have that doesn't reboot has a bug with the ping tool: sometimes it stops working entirely for some hours.
Unfortunately they are not! Neither in ROS 6.42.2 nor in any other ROS version!
Hey, guys
After upgrading CCR1072 and CRS317-1G-16+ to version 6.42.2, the kernel crashes have stopped. I hope they have really solved this great problem.
Best Regards,
Vagner Felipe Becker
No, the device rebooted again with a kernel failure error.
That is probably conntrack clearing the connection table after the PPPoE disconnections. It can be very disruptive indeed.
I use 6.38.3 without problems, except high CPU on PPPoE disconnections (no NAT, no masquerade, no reason...).
That is precisely why I don't update, for fear of what I read is happening...
I second this; otherwise I'm just going to downgrade to 6.38.7 and firewall off the security issues.
Can we get a MikroTik response, please? The 1072 is ideal with its redundant PSUs, but I cannot be rebooting it once or more a week.
At the very least, like Doush said, give us a patched v6.38.xx specifically for the CCR1072.
No problem here.
uptime: 13w2d23h16m46s
version: 6.43 (stable)
build-time: Sep/06/2018 12:44:56
free-memory: 14.8GiB
total-memory: 15.8GiB
cpu: tilegx
cpu-count: 72
cpu-frequency: 1000MHz
cpu-load: 2%
free-hdd-space: 76.5MiB
total-hdd-space: 128.0MiB
architecture-name: tile
board-name: CCR1072-1G-8S+
platform: MikroTik
Did you end up doing this, and has it restarted since? I'm about to do this. I had upgraded to 6.43.4 and it was fine for about 28 days, then it restarted twice in the space of 2 days. I simply can't turn off the watchdog to get the required log details: the data centre this is in is a 30-minute drive away, and our customers wouldn't be happy at all with that amount of downtime.
So... you went from a $2,000 USD router to a $35,000 USD router?
SOLVED
After a lot of work and trouble, we finally solved the problem: the CCR1072 no longer restarts or freezes.
Solving it was relatively easy, after more than 3 months of waiting for the MikroTik team to take a position on the problem.
We traded the CCR1072 for a Juniper MX80.
An MX80 is not a $35,000 router! Where the hell do you buy your gear from?
I have just removed all NAT and mangle rules, which means that connection tracking (CT) is not operating on the device at all. It's behind a transparent FortiGate firewall, so that shouldn't be an issue anyway. I have moved the NAT and PAT services to a Cisco 3925 router, and haven't yet seen a reboot. I wonder whether simply turning CT off would have been the answer, rather than having to install another router for NAT/PAT. A bit annoying. Do you think it would have been OK if I had just turned off CT? How have things been since you did this?
I've just completed removing any connection tracking from ours. We had some DDoS issues, and from what I've read connection tracking can be a big problem in that case. So I've rearranged my firewall rules, adapted things to run in the raw table as much as possible, and turned off connection tracking. So far things seem good, but there has been no real test to speak of yet. We really didn't need connection tracking on this router; it was just not something I had considered much until the DDoS stuff started.
Where are you buying them from? Distributor list prices all show $19,000+ USD. I have a friend who works for a larger ISP and they can get them for around $6,800 directly from Juniper.
Looks like I'm having a similar issue...
The CCR1072 is not capable of handling 1 Gbps of bidirectional IMIX traffic and 25k sessions with connection tracking enabled, due to watchdog timer reboots!
Devices five to ten times cheaper, like the CCR1009 and RB1100AHx4, work fine under the same test conditions!
The worst thing is that they continue to sell this faulty device!
Contacting MikroTik support is a waste of time!
We've just downgraded to the latest long-term release; I'll update in 48 hrs.
Can anyone confirm that version 6.44.5 is stable with the CCR1072?
One to multiple times in a 24-hour window.
What reboot frequency did you have previously?
Before that, I suggest that you upgrade to the latest "stable" version (if there is an actual bug, then it might be already fixed). For example, v6.45.6 fixes a Watchdog reboot caused by h323 firewall helper. If your router did process voice call traffic, then the issue might be already resolved.
Can you record a video of this? I think if you do, MikroTik will respond to the topic.
I wrote it before. It is clear:
Reboot min requirements:
- connection-tracking activated
- clear routing between two interfaces
- 1G bidirectional traffic, 600B packets (each interface handles 1G in each direction)
If the packets are bigger (1500B):
- ~2,5G bidirectional traffic
If the traffic is mostly unidirectional:
- ~2G unidirectional traffic, 600B packets
- ~5G unidirectional traffic, 1500B packets
Everyone who has a spare CCR device could do the test by using traffic generator.
Please refer to #103.
I wouldn't say that. We have a lot of clients that use it successfully, and when it first came out we were able to sustain 80 Gbps of iperf traffic without issue.
The only solution is to NOT BUY this model; it is a scam.
+++
So, could someone who has a working CCR1072 tell me what is wrong with the configuration in the video:
https://youtu.be/TAWQRaplnsM
It is not normal that a €3000 router cannot have connection tracking enabled, right?
Disable connection tracking.
Don't go crazy, it's not you, it's the router that is a scam.
So, connection tracking ON is a wrong configuration?
Absolutely the same with 6.38.3. I tested again. There is no working ROS version!
Go back to 6.38.3 and all will work fine... (close service ports to avoid vulnerabilities)
FastTrack just bypasses conntrack. It is clear that without conntrack the device performs much better.
Please show the firewall section.
Did you enable FastTrack?
I use these settings and uptime is 39 days.
I changed the connection tracking timeouts, and now uptime is 15 days with 6.46.1.
enabled: auto
tcp-syn-sent-timeout: 2s
tcp-syn-received-timeout: 2s
tcp-established-timeout: 20s
tcp-fin-wait-timeout: 5s
tcp-close-wait-timeout: 5s
tcp-last-ack-timeout: 5s
tcp-time-wait-timeout: 5s
tcp-close-timeout: 5s
tcp-max-retrans-timeout: 1m
tcp-unacked-timeout: 1m
loose-tcp-tracking: yes
udp-timeout: 5s
udp-stream-timeout: 1m
icmp-timeout: 3s
generic-timeout: 1m
max-entries: 1048576
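For convenience, the printed values above could be applied in one command. This is only a sketch built from the property names in that printout; `max-entries` is left out on the assumption that it is a read-only value reported by the router. Be aware that a 20s established timeout can drop idle long-lived TCP sessions that pass through NAT.

```
# Apply the shortened connection tracking timeouts listed above.
/ip firewall connection tracking set enabled=auto \
    tcp-syn-sent-timeout=2s tcp-syn-received-timeout=2s \
    tcp-established-timeout=20s tcp-fin-wait-timeout=5s \
    tcp-close-wait-timeout=5s tcp-last-ack-timeout=5s \
    tcp-time-wait-timeout=5s tcp-close-timeout=5s \
    tcp-max-retrans-timeout=1m tcp-unacked-timeout=1m \
    loose-tcp-tracking=yes udp-timeout=5s udp-stream-timeout=1m \
    icmp-timeout=3s generic-timeout=1m
# Confirm the result.
/ip firewall connection tracking print
```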
We use it, it works very well. Before these settings, the CCR1072 would restart every one or two days.
For those of you who have disabled connection tracking or tweaked the connection tracking settings:
Was it successful?
Welcome to the club!
Just FYI,
Brand new one we just turned up.. Same issues as all 4 before it.. Watchdog reboots..
CG-NAT/firewall/10+gb traffic = reboot..
Sent a supout.. probably get the same answer as everyone else..
R
Better not to waste your time contacting support. I think your only chance is to try ROS 7, if it works for you. If ROS 7 doesn't work for you, you will have to change the router.
Check post #127.
CCR1072 -> Doing a total of 5G throughput with a lot of connections and 1 DHCP server + IPv6 + DNS server + SNMP + 33 VLANs + NAT (106 rules) + firewall (46 rules) + raw firewall (66 rules) + routing (2776 routes) + 7 BGP peers and 2 instances + OSPF + watchdog enabled; no queuing, no discovery, no cloud. Only a few tunnels, about 10 L2TP with PPPoE. Bridge filtering is active.
Never had an issue with stable version 6.45.9. Now on 6.46.6 with 122 days of uptime, again no issues. CPU is at 1000 MHz.
Probably your traffic consists mostly of big packets and is distributed over more than two interfaces.
In my opinion, activating more features (some of them) in fact improves the device's condition, because the traffic is distributed over more cores. For some MikroTik devices, I have seen better test results in NAT mode than in plain routing.
Good luck!
That is interesting. You may have a newer hardware version. Could you check your revision (/system routerboard print)?
Traffic exists on a single 10G SFP port, with a VLAN tag on top of this.
Traffic consists mostly of small packets; TX/RX 65-127 is actually the largest bucket of them all.
The cause of the watchdog reboots is probably some specific features/items indeed.
Hi guys!
It is not about specific features, just connection-tracking. See the video below.
https://www.youtube.com/watch?v=J5arAJnI62I
You should open a ticket and send them the supout.rif file generated automatically after the watchdog rebooted your router. I don't think you will get any help from this topic, only guesses.
Guys, I experienced the WATCHDOG reboot on a CCR1072.
1. Used only for CGNAT (514 NAT rule entries for 65000 connections using netmap) and PBR
2. Not using any routing protocols.
It is running v6.46.7.
It ran successfully for 4 days carrying 5.2 Gbps with 48% CPU load.
All of a sudden it was rebooted by the watchdog, even though the connection tracking entries were at around 900000.
I would like to know the impact of setting tcp-established-timeout=15m.
Tik guys, kindly respond to this thread; it has been active for some years awaiting an answer.
Unfortunately, the support file is not created after the reboot.
Do you have "Automatic Supout" enabled in the System > Watchdog menu? If you do have that enabled and it still does not generate a file, you can do it via System > Scheduler by adding a new script which is executed at boot.
Did you have these reboots before? Is it possible that they started after a firmware upgrade and everything was OK with the previous firmware?
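A rough sketch of both suggestions, assuming RouterOS v6. The scheduler entry name `supout-at-boot` and the file name `boot-supout.rif` are made up for illustration. Note that a supout taken at boot describes the router after the restart, so it may not contain the state that led to the hang.

```
# Let the watchdog create a supout.rif automatically after it reboots the router.
/system watchdog set automatic-supout=yes
# Hypothetical fallback: generate a support file on every boot via the scheduler.
/system scheduler add name=supout-at-boot start-time=startup \
    on-event="/system sup-output name=boot-supout.rif"
```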
Wow, this is insane. You probably won't be able to do that because your router will be hanging. Awesome support. 5 stars.
No errors in the supout.rif, since it is considered a normal boot. I need to create the supout.rif while it is frozen, and I need to disable the watchdog for that.
Guys, please advise me on protecting a CGNAT-enabled router from DDoS etc.
They just can't fix it and they don't want to admit that they sell buggy equipment. I guess their sales are going OK, and it is not happening to everybody, so they can just ignore it. We replaced one of the CCR1072s with a Juniper MX, and now the CCR is serving a small office doing some DHCP, NAT, firewall and other things typical for a very small office, and believe me, it is working perfectly (showing 0% load). Well, you will think something like: WHAT!? A 3k router for 15 people? And the answer is yes, because the other option was using it to replace a broken leg of a storage shelf.
Isn't it weird that MT is completely silent about this issue? :)
Well
I have 1036 with about 5Gig connections, and we do conntrack + fasttrack and we are about at 20% at peak time.
Are you sure that fasttrack is enabled correctly ?
Can you print your config with hide-sensitive? Or you can send privately in a private message?
My skype id: abdulrazaq.a@hotmail.com
I am about to buy a 1072 and I want to be sure that everything is fine.
Try disabling connection tracking.
Hello, good afternoon. I have a CCR1072 router that has been working for a couple of months; in the last few days it has started to restart every 3 or 4 days. The router is used as a BGP edge router and CPU usage is between 7 and 10%. Does anyone know how to fix it, or is the equipment bad?
"router was rebooted without proper shutdown by watchdog timer"
Kind regards from Chile.
and enable:
IP > Cloud > Update Time
Since then, I have gone from 1 or 2 watchdog reboots a day to 6+ days and counting.
System > SNTP Client
I love this part
Re unwanted WatchDog reboots.
I do not use the ROS WatchDog settings/function.
Instead, I use the NetWatch feature to trigger a script if/when a NetWatch ping fails.
The script then enters a count-down loop which is something like this:
#1 - Log date/time & message & Loop-Count variable to logs
#2 - If ping successful - then exit ( quit the script )
#3 - Add a 1 to a Loop-Count variable
#4 - Sleep 15 seconds
#5 - If Loop-Count variable = 30 ( 7.5 minutes ) then reboot this Mikrotik
#6 - If Loop-Count variable = 20 , then do a site-survey and save scan results to file
#7 - loop ( go to ) #1
I use a NetWatch WatchDog Reboot script similar to this on over 1-thousand Mikrotiks connected to my networks.
I test-ping to a special IP address in my server room ( 192.0.2.254 ).
Note - Sometimes a remote client site-survey ( scan ) at #6 is all that is necessary to get a remote client to connect/re-connect if the client did not receive a DHCP IP address.
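The loop described above could look roughly like this as a Netwatch down-script. This is an untested sketch in RouterOS v6 scripting syntax, not the poster's actual script; the variable names are made up, and step #6 (the site survey at loop 20) is left out because it depends on the wireless setup.

```
# Hypothetical Netwatch down-script: retry for 7.5 minutes, then reboot.
:local target 192.0.2.254
:local loopCount 0
:local recovered false
:while (!$recovered) do={
    # Step 1: log the attempt and the current loop count.
    :log warning "netwatch-reboot: no reply from $target, loop $loopCount"
    # Step 2: stop as soon as a ping succeeds.
    :if ([/ping $target count=1] > 0) do={
        :set recovered true
    } else={
        # Steps 3 and 4: count the failure and wait 15 seconds.
        :set loopCount ($loopCount + 1)
        :delay 15s
        # Step 5: after 30 loops (30 x 15 s = 7.5 minutes) reboot the router.
        :if ($loopCount = 30) do={ /system reboot }
    }
}
```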
Working with MikroTik is a combination of reboots and software updates. Stability is just a dream.
"reboot this Mikrotik"
I absolutely agree with you.
We had the same problem on our BGP 1072. Just turn off connection tracking and you won't experience any more reboots. It's due to hardware limitations: not one MikroTik router can smoothly handle 2.5 Gbps of bidirectional traffic at once; connection tracking causes a spike in CPU usage and then a crash.
If you don't have NAT or the like, just turn off connection tracking: ip > firewall > connection tracking > set enabled=no
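As a console command, the menu path above corresponds to the following. Keep in mind that with connection tracking disabled, NAT and any filter rules that match on connection state stop working.

```
# Turn connection tracking off entirely.
/ip firewall connection tracking set enabled=no
# Verify the setting.
/ip firewall connection tracking print
```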
And me too.
/ip firewall raw
add action=notrack chain=prerouting comment="Notrack http in" dst-address-list=ConntrackNo dst-port=80,443 protocol=tcp
add action=notrack chain=prerouting comment="Notrack http out" protocol=tcp src-address-list=ConntrackNo src-port=80,443
Hello,
@kos: Hello there!
1. Could you please share exact configuration of your traffic generator (I'm guessing you're using MikroTik /tool traffic-generator) so I can reproduce it in my lab?
2. How long does it take for the CCR to reboot? In the video I see an uptime of around 17 minutes. Is that an average, or could it take days before the reboot?
3. Have you tried lowering tcp-established-timeout from 1d to 20s and testing again?
/ip firewall connection tracking set tcp-established-timeout=20s
Thank you!
Hi kos,
I appreciate your prompt response.
Just to confirm:
- In your original testing, were you using the MikroTik /tool traffic-generator to generate the traffic, or something else?
- What packet size were you using for the "1G bidirectional traffic" test?
- Have you tried the same test against a CCR1036-8G-2S+ or CCR1036-12G-4S? Was it fine, or was it the same as the CCR1072?
- Were you monitoring the number of connections during the test?
- Am I right that the configuration you used in your testing looks like the following, and that you were generating UDP "1G bidirectional traffic" between the 10.55.0.0 and 10.66.0.0 networks?
/ip firewall connection tracking
set enabled=yes
/ip address
add address=10.16.0.1/24 interface=sfp-sfpplus1 network=10.16.0.0
add address=10.15.0.1/24 interface=sfp-sfpplus2 network=10.15.0.0
/ip route
add distance=1 dst-address=10.55.0.0/16 gateway=10.15.0.2
add distance=1 dst-address=10.66.0.0/16 gateway=10.16.0.2
Here is a snippet from one of your previous messages:
After a few weeks of meaningless tests, the answer was - DUT is overloaded, you can't see it but it is.
With 1G?????? That is not even close to your test results published on your web site?????!!!!
- Our tests are performed with connection-tracking disabled!
That's all! You are ******!
Well, the response is interesting, especially this part: "Our tests are performed with connection-tracking disabled!". If we go to the official "Test results" page https://mikrotik.com/product/CCR1072-1G ... estresults we can see a section for a CCR1072 with default CPU and DRAM clock settings (CPU 1000 MHz, DDR 1600), 64-byte packet size and a configuration with "25 ip filter rules". If I understand it correctly, "25 ip filter rules" implies that connection tracking is enabled. Would you agree?
Test results for 64-byte packets are 2,724.4 Mbps and 5,321.0 kpps, in other words 2.7 Gbps half-duplex or 1.35 Gbps full-duplex, and 5.3 Mpps (million packets per second). Your test was "1G bidirectional traffic, 600B packets". Could you please explain what "600B packets" means? It's not "600 billion packets per second"; it's something else, right?
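As a quick sanity check of those figures, here is the arithmetic under two assumptions that are not confirmed in the thread: that throughput is counted on packet bytes only (no preamble or inter-frame gap), and that "600B" means 600-byte packets.

```latex
\begin{align*}
5321.0\ \text{kpps} \times 64\ \text{B} \times 8\ \tfrac{\text{bit}}{\text{B}}
  &= 2\,724\,352\ \text{kbit/s} \approx 2724.4\ \text{Mbps} \\
\frac{10^{9}\ \text{bit/s}}{600\ \text{B} \times 8\ \tfrac{\text{bit}}{\text{B}}}
  &\approx 208.3\ \text{kpps per direction}
\end{align*}
```

The first line reproduces the published 64-byte result exactly. Under the same assumptions, 1 Gbps of 600-byte packets works out to roughly 0.2 Mpps in each direction, far below the 5.3 Mpps quoted for the 64-byte test.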
Have you got more detailed explanation rather than just "DUT is overloaded, you can't see it but it is."? What metric (number of connections, packets per second, etc) overloaded the router?
And finally, have you got anything useful from MikroTik support to mitigate the issue or any workaround?
Thank you!
Yes, I'm looking forward to hearing some news! ROS 7 seems to be very promising.
Has anyone tested whether the 1072 with 7.1.1 still reboots?
We have two brand new 1072s that I am afraid to put in production...
Thank you
Hello guys, we have followed these settings and have not experienced a watchdog reboot again; uptime is now about 3 days with ROS 6.47.9 (long term).
This issue has been the bane of my life for the last year and a half, or so. I thought I'd post here with my experience as it's always useful for others to be able to relate.
We were previously running 1036s but "upgraded" to 1072s as a future proofing measure. These were put in with identical configuration, with the exception of the following:
- ROS upgrade
- Slight change to VRRP priority
We never saw a single unexplained reboot on the 1036 platform and almost immediately after switching to the 1072s, we began to see sporadic watchdog reboots. These could be anywhere from a few hours to a few months apart, so it was really difficult to identify a cause. One trait we did identify was that these reboots would only occur during times of network use (we didn't see a single OoH reboot).
A case was opened with support, but predictably was unresolved, with suggestions being almost verbatim what others have posted here (seems like they've seen this enough to have a copy/paste response ready). Things that were suggested (and tried):
- Provide supout.
- Upgrade to the latest Long Term firmware.
- Disable watchdog reboot and see if anything is echoed to the serial console.
- Upgrade again to the now latest Long Term firmware
- Abandon the Long Term tree and install the latest Stable firmware
It was obvious they had no idea what the issue was and no in-depth investigation was going to be taking place, so pretty much gave up on getting a solution from support.
Off my own back, I tried the following:
- Rolled back firmware to the version we were previously running on the 1036s before we had experienced any reboots
- Reverted VRRP priority changes that were made when installing the 1072s
- Setup verbose logging for various firewall features.
- Minimised SNMP monitoring on device as I could see a lot of activity in the verbose logs and wasn't sure if SNMP was single threaded, leading to a single core becoming overloaded
- Various other config tweaks in the desperate attempt to find something that was causing the behaviour
Having subsequently discovered this thread I'm certain that this is a 1072 platform issue, rather than a configuration/load issue (we are running connection tracking, but also were previously). This was suspected from the beginning, but we had no facility to validate this without reverting to the 1036s which, given subsequent network expansion wasn't the preferred option.
We've purchased some 2216s as a potential "solution" to this problem, however step one is to get to ROS7. We've reluctantly upgraded to 7.5 and the plan now is to evaluate stability on the 1072s as we'd like to use them if they'll remain stable. If we see another reboot, we'll then be swapping them out for the 2216s.
Initial stability and performance seem good, however it's early days. CPU usage appears enormously improved in 7.5. Comparing one day to the next, with similar traffic profiles across the days, CPU usage is hovering at around 5% vs around 30% pre-upgrade.
Screenshot 2022-09-23 185148.png
I'll update on this thread again when we have some more long-term information on stability.
I see the line
system,error,critical router was rebooted without proper shutdown by watchdog timer
I'm assuming this is still the same issue and just a change in phrasing in the logging between the builds, but can't be sure.
system,error,critical router was rebooted without proper shutdown, probably kernel failure
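To tally the two phrasings across exported logs, something like the following Python sketch might help. It only knows the two messages quoted in this thread (other builds may word them differently), and the function names are made up for illustration; the sample lines are the ones posted earlier in the thread.

```python
import re
from collections import Counter

# Phrasings quoted in this thread; other RouterOS builds may word them differently.
REBOOT_PATTERNS = {
    "watchdog": re.compile(r"rebooted without proper shutdown by watchdog timer"),
    "kernel_failure": re.compile(r"rebooted without proper shutdown, probably kernel failure"),
}

def classify_reboot(line):
    """Return a label for a reboot log line, or None if it is not one."""
    for label, pattern in REBOOT_PATTERNS.items():
        if pattern.search(line):
            return label
    return None

def count_reboots(lines):
    """Count reboot log lines per label, ignoring unrelated lines."""
    labels = (classify_reboot(line) for line in lines)
    return Counter(label for label in labels if label is not None)

# Log lines as posted earlier in this thread.
sample = [
    "nov/29/2017 10:35:40 system,error,critical router was rebooted without proper shutdown by watchdog timer",
    "nov/28/2017 10:33:45 system,error,critical router was rebooted without proper shutdown, probably kernel failure",
    "nov/28/2017 10:33:45 system,error,critical kernel failure in previous boot",
]
print(count_reboots(sample))
```

This doesn't tell you whether the two messages share a root cause, only how often each one shows up per device or per firmware build.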
Quick follow-up to this. After reverting back to our CCR1036s 6 months ago, the issue has subsided completely.
There's clearly a hardware issue with the CCR1072 platform, but obviously MT are never going to admit this.
Yes, absolutely. They will just keep making new products and funny youtube videos )))
+1
Yeah, you are absolutely right.
Why keep rolling out new products instead of fixing major flaws in past designs like the 1072?
The enterprise router market has Cisco in the lead, followed by Huawei and Juniper, just to name some. MT was not even in the top 10, which is no wonder given the lack of dependability in the enterprise market segment.
If I were in MT's R&D department, I would halt new products and redirect the effort to finding the hardware problem with the 1072, and offer my customers one of two options:
- Replace that 1072 with something else capable of the same processing or more
- Send a revised main board for it (1072 V5)
But never leave customers hanging with neither support nor a response. Meanwhile MT minions keep rolling out toy devices to compete for home market share instead of supporting the enterprise client base.
In either case, and whatever route MT chooses to take, they must acknowledge the problem and offer a fair solution to the end customer.
Many angry clients have gone to Juniper or something else because of the lack of support and attention. Others downgraded to a smaller MT model with ROS 7.
MT, if you are reading this:
We are talking about a 2500+ business-class device. It's not a sub-200-bucks toy.
Do not take this 1072 incident lightly; this was just the check engine light on your dashboard.
Give a resolution to it.
Do not let down your customer base.
And if you ever want to be in the top 10 of the world router market share, do not roll out half-baked products. And if you unintentionally did, patch the flaws (like Microsoft, Google, Apple...).
It's completely out of proportion...
It's fundamentally wrong to compare MikroTik with the biggest vendors.
It's a 1000x proportional difference.
If you need a bigger, more expensive product, just buy it; buy it with all the profits obtained from having an affordable platform for small networks: MikroTik.
The situation is simple: some networks have grown faster than MikroTik has scaled, so they need another kind of product.
You think it is easy to make a router for terabit scale? What are you waiting for? There are plenty of open source DIY options.
The only thing expected from MikroTik is that they admit and fix the problems of their platform. It's not about capacity or traffic, it's about fixing the problems that their customers experience.
This isn't about the capability of the device. There's a fundamental flaw in the 1072 platform that causes it to reboot. A 1036 in exactly the same scenario performs worse, but doesn't randomly reboot. I would expect even the most basic mikrotik device to have stability in use, provided it wasn't being overutilised.
It's great that you've never run into any issues, however your examples are purely anecdotal. Just because you've never experienced the issue, doesn't mean it doesn't exist.
As has been observed through this thread, the issue appears to be related to connection tracking being enabled. Perhaps you aren't using connection tracking, or pushing enough traffic through the device to observe the issue. Perhaps this doesn't affect all 1072s - No-one knows, because MT aren't being transparent about the issue.
As far as I can tell, pretty much no effort is being put in from their side to identify and resolve the problem. When I had a ticket open with them, it was just a constant loop of "Can you update ROS?", which is a complete fob-off and totally unacceptable given the number of people who have observed this issue.
To reiterate, we ran multiple 1072s which all exhibited this issue. We were originally running 1036s and only started seeing the issue after they were swapped out for 1072s. After swapping them back to 1036s, we once again don't see the issue. All using the same config.
Yes, we had exactly the same problem! We also moved from 1036 to 1072 and were disappointed. We ended up turning off connection tracking, but the router still randomly rebooted once a year or so. That one is probably fixed in newer versions, but honestly, we are scared to upgrade. We upgraded our 1036 once to a newer firmware and the same afternoon it was not able to process all the traffic. We reverted and the router performed OK. Since then, we upgrade very carefully.
So maybe my experience can shed some more light on it, with 3 CCR-1072's we have had 2 that worked for 2 years solid without a random reboot and/or issue. The third one experienced these watchdog timer reboots. I can confirm that we had connection tracking disabled and it was only doing about 900Mbps through the device during peak times (so not a load issue). After getting frustrated with Mikrotik support regarding this issue we eventually solved it by moving to 6Wind VSR (runs DPDK on supermicro / intel hardware) and haven't looked back since.