It has been hard to see a pattern in what happened. Disconnects occurred at very irregular intervals, and then suddenly everything was fine for days or even a week. Each time I believed that rebooting various RouterBOARDs, and even the SIAE 18GHz microwave link (which all the disconnecting customers are connected through), had done the trick, but the problem returned.
Finally I had the chance to dig deeper, and while watching my main router (CCR-1036) for a while I saw the CPU load spike (to about 50% versus the normal 10-15%) for some seconds, causing several of my Netwatch checks to stall. Because of the long keepalive timeout (30s) on the PPPoE servers, the PPPoE connections stayed up, but the CPU spikes pointed me towards some kind of hostile attack.
My theory is that the high CPU load (100% on many cores, I believe) somehow makes the EoIP tunnels and/or PPPoE connections stall, leading to disconnects.
The fact that my Netwatches report "down" for hosts that are running perfectly makes me believe this can be the reason.
Based on postings by Chupaka and others, and also on http://adminlog.eu/mikrotik-ddos/, I have composed the following firewall rules:
/ip firewall filter
add action=jump chain=input comment="Jump to detect-syn chain" \
    connection-limit=200,32 connection-state=new jump-target=detect-syn \
    protocol=tcp tcp-flags=syn
add action=jump chain=forward comment="SYN: Jump to detect-syn chain" \
    connection-state=new in-interface=sfp-sfpplus1 jump-target=detect-syn \
    protocol=tcp tcp-flags=syn
add action=tarpit chain=forward comment=\
    "Tarpit new SYN connections from IPs in syn-flooders address list" \
    protocol=tcp src-address-list=syn-flooders
add action=jump chain=forward comment="Jump to DDOS detection chain" \
    connection-state=new jump-target=detect-ddos
add action=drop chain=forward comment=\
    "DDOS: Drop new connections from/to blacklisted IPs" connection-state=new \
    dst-address-list=ddosed src-address-list=ddoser
add action=return chain=detect-ddos comment="Accept connections within limit" \
    dst-limit=10000,32,src-and-dst-addresses/10s
add action=add-dst-to-address-list address-list=ddosed address-list-timeout=10m \
    chain=detect-ddos comment=DDOS
add action=add-src-to-address-list address-list=ddoser address-list-timeout=10m \
    chain=detect-ddos comment=DDOS
add action=return chain=detect-syn dst-limit=1000,100,dst-address-and-port/10s
add action=add-src-to-address-list address-list=syn-flooders \
    address-list-timeout=10m chain=detect-syn connection-state=new protocol=tcp \
    tcp-flags=syn
I had to play a bit with the parameters to avoid legitimate traffic being caught, and finally I came up with a configuration that seems to block at least some traffic.
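For anyone who wants to reproduce the tuning, this is roughly how the effect of a change can be checked. It is only a sketch using the standard print commands, with the list and chain names taken from the rules above:

```
# Show which addresses the detection chains have flagged so far
/ip firewall address-list print where list=syn-flooders
/ip firewall address-list print where list=ddoser
/ip firewall address-list print where list=ddosed

# Show packet and byte counters per filter rule, to see which rules actually match
/ip firewall filter print stats
```

If known-good customer addresses show up in ddoser or syn-flooders, the corresponding limit is presumably still too low for that traffic.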
Most of the SYN flooder IPs are in 141.212.122.x, which is listed as belonging to the University of Michigan, but I guess the attackers are using spoofed source IPs.
For both DDoS and SYN I had to increase the limits far beyond the examples I found, in order not to trap real traffic. For DDoS, for instance, Chupaka uses a dst-limit of 50,50, while I had to move to 10000 to keep a lot of customers from being blacklisted.
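For reference, this is my current reading of the dst-limit fields, based on the description in the MikroTik firewall documentation. It may well be wrong or incomplete. The two rules are the same ones as in the export above, repeated here only so the comments can sit next to them. They are not meant to be added a second time.

```
# dst-limit=<count>[/<time>],<burst>,<mode>[/<expire>]
#   count/time : packets allowed per interval for each flow (interval is 1s if omitted)
#   burst      : number of packets a flow may send at once before the rate applies
#   mode       : which fields define a "flow" (each flow gets its own counter)
#   expire     : idle time after which a flow's counter may be discarded
#
# Unlike dst-limit, the plain "limit" matcher (count[/time],burst) keeps a single
# counter for the whole rule rather than one per flow.

# 10000 packets/s per source+destination address pair, burst 32, expire 10s.
# Only connection-state=new packets are jumped into detect-ddos, so in practice
# this counts new connections per address pair.
add action=return chain=detect-ddos dst-limit=10000,32,src-and-dst-addresses/10s

# 1000 packets/s per destination address and port, burst 100, expire 10s.
# Only new TCP SYN packets reach detect-syn.
add action=return chain=detect-syn dst-limit=1000,100,dst-address-and-port/10s
```

Read this way, a flow that stays under its rate hits the return rule and leaves the chain. A flow that exceeds it falls through to the add-to-address-list rules that follow.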
Also I have a little trouble understanding the parameters.
So I have the following questions:
1. Can anyone give me, or point me to, a good explanation of the parameters for limit and dst-limit?
2. Which values do you guys use in your production routers (ISP)? Why are mine so far off?
3. I have seen implementations of SYN flooding detection based on both "limit" and "dst-limit". Which one is better?
4. Is my attempt to combine SYN flooding and DDOS detection based on src- and dst-address pairs a dead end?
5. Which approach is better for SYN: Tarpit or Drop?