SIP client cannot re-register with the SIP server after switching ISP (different NAT).
Description:
In our setup we have two ISPs and a SIP client with a private IP. We use NAT (a different NAT rule for each ISP) with SIP ALG translation, aka the SIP NAT helper.
When the default route changes from one ISP to the other (manually, or because the ISP link goes down), the Mikrotik applies the wrong NAT rule. Because of this, the SIP REGISTER messages cannot reach the SIP server and the SIP connection drops.
If we clear the NAT table or reboot the router, everything works again.
Versions affected:
6.38 (mipsbe), 6.38.5 (chr), 6.39.3 (mipsbe), 6.41 (chr)
Note: We tested on real Mikrotik hardware (mipsbe) and ran simulations in GNS3 with routerosx86 Mikrotik virtual machines (chr).
How to reproduce:
- Connect a router to two different ISPs (each one giving you a different real IP) and to an internal network;
- Create proper NAT rules for each ISP;
- Create proper default routes (static routes) for each ISP (the first ISP with the smaller distance);
- Set up a SIP client (in the internal network) to register with an external SIP server and perform the registration;
- Change the distance of the default routes so that the second ISP becomes the active route (smaller distance);
- Try to re-register the SIP client with the SIP server and you will see that no SIP reply comes back and the re-registration fails;
- Check the NAT table and run a sniffer on the router and you will see that the router is routing the packet via the second ISP but still applying the old NAT rule (for the first ISP) instead of the correct one.
Network setup and detailed how to reproduce:
My production setup is somewhat complicated. However, I reproduced the problem in a much simpler setup using GNS3, so I'm showing the simplified setup here.
A screenshot of my GNS3 setup is shown below:
The isp1 and isp2 nodes simulate the two different ISPs. They connect to the Internet via GNS3 NAT nodes (if you don't know how GNS3 works, just consider that the isp1 and isp2 nodes behave as real ISP routers).
Our router (router) is connected to both ISPs and also to the sip-client node (an Ubuntu 14.04 docker node that simulates a SIP client).
A more detailed diagram is shown below:
The configuration of the router node is:
/ip address
add address=10.10.1.2/24 interface=ether1 network=10.10.1.0
add address=10.10.2.2/24 interface=ether2 network=10.10.2.0
add address=192.168.0.1/24 interface=ether3 network=192.168.0.0
/ip firewall nat
add action=src-nat chain=srcnat out-interface=ether1 to-addresses=10.10.1.2
add action=src-nat chain=srcnat out-interface=ether2 to-addresses=10.10.2.2
/ip route
add distance=1 gateway=10.10.1.1
add distance=2 gateway=10.10.2.1
/system identity
set name=router
/ip firewall connection tracking
set icmp-timeout=1h
The network configuration of the sip-client node (/etc/network/interfaces) is:
auto eth0
iface eth0 inet static
address 192.168.0.100
netmask 255.255.255.0
gateway 192.168.0.1
up echo nameserver 192.168.0.1 > /etc/resolv.conf
The SIP helper (service port) on the router:
[admin@router] > /ip firewall service-port print where name=sip
Flags: X - disabled, I - invalid
# NAME PORTS
0 sip 5060
5061
Initial state: after the client pings the SIP server and registers, both connections are NATed to the first ISP's address (10.10.1.2), and the default route via the first ISP is active:
[admin@router] > /ip firewall connection print detail where protocol=icmp
Flags: E - expected, S - seen-reply, A - assured, C - confirmed, D - dying,
F - fasttrack, s - srcnat, d - dstnat
0 S C s protocol=icmp src-address=192.168.0.100 dst-address=199.87.121.233
reply-src-address=199.87.121.233 reply-dst-address=10.10.1.2
icmp-type=8 icmp-code=0 icmp-id=521 timeout=58m16s orig-packets=4
orig-bytes=336 orig-fasttrack-packets=0 orig-fasttrack-bytes=0
repl-packets=3 repl-bytes=252 repl-fasttrack-packets=0
repl-fasttrack-bytes=0 orig-rate=0bps repl-rate=0bps
[admin@router] > /ip firewall connection print detail where connection-type=sip
Flags: E - expected, S - seen-reply, A - assured, C - confirmed, D - dying,
F - fasttrack, s - srcnat, d - dstnat
0 SAC s protocol=udp src-address=192.168.0.100:5060
dst-address=199.87.121.233:5060
reply-src-address=199.87.121.233:5060
reply-dst-address=10.10.1.2:5060 connection-type="sip"
timeout=58m21s orig-packets=3 orig-bytes=1 347
orig-fasttrack-packets=0 orig-fasttrack-bytes=0 repl-packets=3
repl-bytes=927 repl-fasttrack-packets=0 repl-fasttrack-bytes=0
orig-rate=0bps repl-rate=0bps
[admin@router] > /ip route print where dst-address=0.0.0.0/0
Flags: X - disabled, A - active, D - dynamic,
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme,
B - blackhole, U - unreachable, P - prohibit
# DST-ADDRESS PREF-SRC GATEWAY DISTANCE
0 A S 0.0.0.0/0 10.10.1.1 1
1 S 0.0.0.0/0 10.10.2.1 2
Now we raise the distance of the first default route so that the second ISP becomes the active route:
[admin@router] > /ip route set [find static gateway=10.10.1.1] distance=100
[admin@router] > /ip route print where dst-address=0.0.0.0/0
Flags: X - disabled, A - active, D - dynamic,
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme,
B - blackhole, U - unreachable, P - prohibit
# DST-ADDRESS PREF-SRC GATEWAY DISTANCE
0 A S 0.0.0.0/0 10.10.2.1 2
1 S 0.0.0.0/0 10.10.1.1 100
Now, on the client, we ping (ICMP) the SIP server and send a SIP message to the SIP server again. The ICMP ping works, but the SIP message gets no reply.
The reason is that the new ICMP packets create a new NAT entry (via the second ISP), while the SIP connection still uses the NAT entry for the first ISP (the router NATs the packet to the IP of ether1 although it sends the packet out via ether2).
[admin@router] > /ip firewall connection print detail where protocol=icmp
Flags: E - expected, S - seen-reply, A - assured, C - confirmed, D - dying,
F - fasttrack, s - srcnat, d - dstnat
0 S C s protocol=icmp src-address=192.168.0.100 dst-address=199.87.121.233
reply-src-address=199.87.121.233 reply-dst-address=10.10.1.2
icmp-type=8 icmp-code=0 icmp-id=521 timeout=49m38s orig-packets=4
orig-bytes=336 orig-fasttrack-packets=0 orig-fasttrack-bytes=0
repl-packets=3 repl-bytes=252 repl-fasttrack-packets=0
repl-fasttrack-bytes=0 orig-rate=0bps repl-rate=0bps
1 S C s protocol=icmp src-address=192.168.0.100 dst-address=199.87.121.233
reply-src-address=199.87.121.233 reply-dst-address=10.10.2.2
icmp-type=8 icmp-code=0 icmp-id=522 timeout=59m36s orig-packets=2
orig-bytes=168 orig-fasttrack-packets=0 orig-fasttrack-bytes=0
repl-packets=2 repl-bytes=168 repl-fasttrack-packets=0
repl-fasttrack-bytes=0 orig-rate=0bps repl-rate=0bps
[admin@router] > /ip firewall connection print detail where connection-type=sip
Flags: E - expected, S - seen-reply, A - assured, C - confirmed, D - dying,
F - fasttrack, s - srcnat, d - dstnat
0 SAC s protocol=udp src-address=192.168.0.100:5060
dst-address=199.87.121.233:5060
reply-src-address=199.87.121.233:5060
reply-dst-address=10.10.1.2:5060 connection-type="sip"
timeout=59m38s orig-packets=5 orig-bytes=2 245
orig-fasttrack-packets=0 orig-fasttrack-bytes=0 repl-packets=3
repl-bytes=927 repl-fasttrack-packets=0 repl-fasttrack-bytes=0
orig-rate=0bps repl-rate=0bps
The sniffer shows the SIP packet arriving on ether3 and leaving via ether2:
[admin@router] > /tool sniffer quick port=5060
INTERFACE TIME NUM DI SRC-MAC DST-MAC VLAN
ether3 12.427 1 <- B2:C2:B4:18:98:07 00:6B:0E:68:7F:02
ether2 12.427 2 -> 00:6B:0E:68:7F:01 00:6B:0E:6B:12:01
-- [Q quit|D dump|C-z pause]
Some questions:
- Is this a bug?
- Has anyone seen this problem in another setup, for example with another SIP helper or with plain UDP NAT?
- What is the expected behavior?
- If it is a bug, how can I report it to the Mikrotik support team?
Known workaround:
We are currently using the following workaround:
- The router checks periodically (via a script that runs in /system scheduler) whether the default gateway has changed;
- If it detects a change, it runs the following command:
/ip firewall connection remove [find where connection-type=sip]
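The check described above could look roughly like the following script body. This is an untested sketch for RouterOS v6, not our exact production script: the global variable name sipLastGw is made up for illustration, and it assumes there is exactly one active default route at any time, as in the setup shown earlier.

```
# Hypothetical sketch: flush SIP connection tracking entries when the
# active default gateway changes. Names are illustrative only.
:global sipLastGw

# Gateway of the currently active default route, converted to a string.
# Assumes exactly one active 0.0.0.0/0 route exists.
:local gw [:tostr [/ip route get [find dst-address=0.0.0.0/0 active=yes] gateway]]

:if ($gw != $sipLastGw) do={
    # Remove the stale SIP entries so the next REGISTER is NATed again
    # according to the route that is active now.
    /ip firewall connection remove [find where connection-type=sip]
    :set sipLastGw $gw
}
```

If this body is saved as a script (say, under the hypothetical name sip-flush-on-gw-change), a scheduler entry such as "/system scheduler add name=sip-flush-check interval=10s on-event=sip-flush-on-gw-change" would run it every 10 seconds. Note that the first run after a reboot also flushes the SIP entries once, because the global variable is still unset at that point.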
Disabling SIP ALG is not an option:
Please note that there is nothing wrong with using SIP ALG (as long as it is implemented without bugs). Actually, our case is exactly the case SIP ALG was created for.
Moreover, our server requires SIP ALG to call the SIP client when necessary.