SIP client cannot re-register in the SIP server after switching ISP (different NAT)
Posted: Wed Dec 27, 2017 6:49 pm
Issue:
SIP client cannot re-register in the SIP server after switching ISP (different NAT).
Description:
In our setup we have two ISPs, a SIP client with a private IP, and a separate NAT rule for each ISP, with SIP ALG translation (aka the SIP NAT helper) enabled.
When the default route changes from one ISP to the other (manually, or because an ISP link goes down), the Mikrotik keeps applying the wrong NAT rule. Because of this, the SIP REGISTER messages cannot reach the SIP server and the SIP connection drops.
If we clear the NAT (connection tracking) table or reboot the router, everything works again.
Versions affected:
6.38 (mipsbe), 6.38.5 (chr), 6.39.3 (mipsbe), 6.41 (chr)
Note: We tested some real Mikrotiks (mipsbe) and ran some simulations in GNS3 with RouterOS x86 virtual machines (chr).
How to reproduce:
- Connect a router to two different ISPs (each one giving you a different public IP) and to an internal network;
- Create a NAT rule for each ISP;
- Create a static default route for each ISP (the first ISP with the smaller distance);
- Set up a SIP client (in the internal network) to register with an external SIP server, and let it register;
- Change the distances of the default routes so that the second ISP becomes the active route (smaller distance);
- Try to re-register the SIP client with the SIP server: no SIP reply comes back and the re-register fails;
- Check the connection table and run a sniffer on the router: the router routes the packet via the second ISP but still applies the old NAT rule (for the first ISP) instead of the correct one.
Network setup and detailed how to reproduce:
My production setup is somewhat complicated. However, I reproduced the problem in a much simpler setup using GNS3, so that is the setup I am showing here.
Screenshot of my GNS3 setup:
The isp1 and isp2 nodes simulate the two different ISPs. They connect to the Internet via GNS3 NAT nodes (if you don't know how GNS3 works, just consider that the isp1 and isp2 nodes behave like real ISP routers).
Our router (router) is connected to both ISPs and also to the sip-client node (an Ubuntu 14.04 docker node that simulates a SIP client).
A more detailed diagram:
The implementation of the router node is:
Code:
/ip address
add address=10.10.1.2/24 interface=ether1 network=10.10.1.0
add address=10.10.2.2/24 interface=ether2 network=10.10.2.0
add address=192.168.0.1/24 interface=ether3 network=192.168.0.0
/ip firewall nat
add action=src-nat chain=srcnat out-interface=ether1 to-addresses=10.10.1.2
add action=src-nat chain=srcnat out-interface=ether2 to-addresses=10.10.2.2
/ip route
add distance=1 gateway=10.10.1.1
add distance=2 gateway=10.10.2.1
/system identity
set name=router
To make NAT tests easier, we have also increased the NAT ICMP timeout:
Code:
/ip firewall connection tracking
set icmp-timeout=1h
And this is the network configuration of the SIP client (/etc/network/interfaces):
Code:
auto eth0
iface eth0 inet static
address 192.168.0.100
netmask 255.255.255.0
gateway 192.168.0.1
up echo nameserver 192.168.0.1 > /etc/resolv.conf
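Our SIP client connects to the SIP server through NAT with the help of the SIP ALG translation, aka the SIP NAT helper:
Code: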
[admin@router] > /ip firewall service-port print where name=sip
Flags: X - disabled, I - invalid
 #   NAME   PORTS
 0   sip    5060
            5061
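Now, on the client, we ping (ICMP) the SIP server and also send a SIP message to it. After this, we have the following entries in the firewall connection table:
Code: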
[admin@router] > /ip firewall connection print detail where protocol=icmp
Flags: E - expected, S - seen-reply, A - assured, C - confirmed, D - dying,
F - fasttrack, s - srcnat, d - dstnat
0 S C s protocol=icmp src-address=192.168.0.100 dst-address=199.87.121.233
reply-src-address=199.87.121.233 reply-dst-address=10.10.1.2
icmp-type=8 icmp-code=0 icmp-id=521 timeout=58m16s orig-packets=4
orig-bytes=336 orig-fasttrack-packets=0 orig-fasttrack-bytes=0
repl-packets=3 repl-bytes=252 repl-fasttrack-packets=0
repl-fasttrack-bytes=0 orig-rate=0bps repl-rate=0bps
[admin@router] > /ip firewall connection print detail where connection-type=sip
Flags: E - expected, S - seen-reply, A - assured, C - confirmed, D - dying,
F - fasttrack, s - srcnat, d - dstnat
0 SAC s protocol=udp src-address=192.168.0.100:5060
dst-address=199.87.121.233:5060
reply-src-address=199.87.121.233:5060
reply-dst-address=10.10.1.2:5060 connection-type="sip"
timeout=58m21s orig-packets=3 orig-bytes=1 347
orig-fasttrack-packets=0 orig-fasttrack-bytes=0 repl-packets=3
repl-bytes=927 repl-fasttrack-packets=0 repl-fasttrack-bytes=0
orig-rate=0bps repl-rate=0bps
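The post does not say which tool was used to send the SIP message from the client. As a minimal sketch for repeating the test, the Python snippet below sends a bare SIP OPTIONS request from UDP port 5060. Note the assumptions: OPTIONS is used here instead of REGISTER because it needs no account, the function names and the `probe` user are hypothetical, and whether the server answers OPTIONS at all depends on its configuration.

```python
import socket
import uuid


def build_sip_options(server_ip: str, client_ip: str, client_port: int = 5060) -> bytes:
    """Build a minimal SIP OPTIONS request with the RFC 3261 mandatory headers."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 magic cookie + unique part
    lines = [
        f"OPTIONS sip:{server_ip} SIP/2.0",
        f"Via: SIP/2.0/UDP {client_ip}:{client_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:probe@{client_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{server_ip}>",
        f"Call-ID: {uuid.uuid4().hex}@{client_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{client_ip}:{client_port}>",
        "Content-Length: 0",
        "",
        "",  # the two empty items produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def send_probe(server_ip: str, client_ip: str, timeout: float = 3.0) -> bool:
    """Send the probe from UDP port 5060 and report whether any reply arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((client_ip, 5060))  # fixed source port, so the 5-tuple stays the same
        sock.settimeout(timeout)
        sock.sendto(build_sip_options(server_ip, client_ip), (server_ip, 5060))
        try:
            sock.recvfrom(4096)
            return True
        except socket.timeout:
            return False
```

On the client of this lab, `send_probe("199.87.121.233", "192.168.0.100")` would be the call to run before and after the route switch. Binding to port 5060 every time matters, because it keeps the packet inside the same connection tracking entry.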
To reproduce the problem, let's change the active default route to the second ISP:
Code:
[admin@router] > /ip route print where dst-address=0.0.0.0/0
Flags: X - disabled, A - active, D - dynamic,
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme,
B - blackhole, U - unreachable, P - prohibit
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 A S  0.0.0.0/0                          10.10.1.1                 1
 1   S  0.0.0.0/0                          10.10.2.1                 2
[admin@router] > /ip route set [find static gateway=10.10.1.1] distance=100
[admin@router] > /ip route print where dst-address=0.0.0.0/0
Flags: X - disabled, A - active, D - dynamic,
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme,
B - blackhole, U - unreachable, P - prohibit
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 A S  0.0.0.0/0                          10.10.2.1                 2
 1   S  0.0.0.0/0                          10.10.1.1               100
The new scenario is shown in the image below:
Now, on the client, we ping (ICMP) the SIP server and send a SIP message to it again. The ICMP ping works, but the SIP message gets no reply.
The reason is that the new ICMP packets create a new NAT entry (via the second ISP), while the SIP traffic still uses the NAT entry created via the first ISP (the router NATs the packet with the IP of ether1 although it sends the packet out via ether2).
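To make that reasoning concrete, here is a toy Python model of connection tracking. It is only a sketch under one assumption: that RouterOS, like Linux netfilter, picks the NAT mapping once when a flow is first seen and reuses it for every later packet of that flow. The `ConnTrack` and `Entry` names are made up for the illustration; the addresses and ICMP ids are the ones from this lab.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    nat_src: str      # source address chosen when the entry was created
    packets: int = 0


class ConnTrack:
    """Toy model: the NAT decision is made once, when a flow is first seen."""

    def __init__(self, wan_addr_by_iface):
        self.wan_addr_by_iface = wan_addr_by_iface
        self.table = {}

    def forward(self, flow, out_iface):
        """Return (out_iface, source address used) for one outgoing packet."""
        entry = self.table.get(flow)
        if entry is None:
            # New flow: the srcnat rule for the current out-interface is evaluated.
            entry = Entry(nat_src=self.wan_addr_by_iface[out_iface])
            self.table[flow] = entry
        # Existing flow: the stored mapping is reused, whatever the route says now.
        entry.packets += 1
        return out_iface, entry.nat_src


ct = ConnTrack({"ether1": "10.10.1.2", "ether2": "10.10.2.2"})
sip = ("udp", "192.168.0.100", 5060, "199.87.121.233", 5060)

# Before the switch: default route via ether1.
before = ct.forward(sip, "ether1")
ping_before = ct.forward(("icmp", "192.168.0.100", "199.87.121.233", 521), "ether1")

# After the switch: default route via ether2. SIP keeps the same 5-tuple,
# while a new ping run carries a new ICMP id and so creates a new entry.
after = ct.forward(sip, "ether2")
ping_after = ct.forward(("icmp", "192.168.0.100", "199.87.121.233", 522), "ether2")

print(before)      # ('ether1', '10.10.1.2')
print(after)       # ('ether2', '10.10.1.2')  <- old source address, new interface
print(ping_after)  # ('ether2', '10.10.2.2')
```

In this model the SIP flow never gets a second entry because its 5-tuple (fixed port 5060 on both sides) never changes, which matches the single SIP entry with a growing packet counter in the router output.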
Code:
[admin@router] > /ip firewall connection print detail where protocol=icmp
Flags: E - expected, S - seen-reply, A - assured, C - confirmed, D - dying,
F - fasttrack, s - srcnat, d - dstnat
0 S C s protocol=icmp src-address=192.168.0.100 dst-address=199.87.121.233
reply-src-address=199.87.121.233 reply-dst-address=10.10.1.2
icmp-type=8 icmp-code=0 icmp-id=521 timeout=49m38s orig-packets=4
orig-bytes=336 orig-fasttrack-packets=0 orig-fasttrack-bytes=0
repl-packets=3 repl-bytes=252 repl-fasttrack-packets=0
repl-fasttrack-bytes=0 orig-rate=0bps repl-rate=0bps
1 S C s protocol=icmp src-address=192.168.0.100 dst-address=199.87.121.233
reply-src-address=199.87.121.233 reply-dst-address=10.10.2.2
icmp-type=8 icmp-code=0 icmp-id=522 timeout=59m36s orig-packets=2
orig-bytes=168 orig-fasttrack-packets=0 orig-fasttrack-bytes=0
repl-packets=2 repl-bytes=168 repl-fasttrack-packets=0
repl-fasttrack-bytes=0 orig-rate=0bps repl-rate=0bps
[admin@router] > /ip firewall connection print detail where connection-type=sip
Flags: E - expected, S - seen-reply, A - assured, C - confirmed, D - dying,
F - fasttrack, s - srcnat, d - dstnat
0 SAC s protocol=udp src-address=192.168.0.100:5060
dst-address=199.87.121.233:5060
reply-src-address=199.87.121.233:5060
reply-dst-address=10.10.1.2:5060 connection-type="sip"
timeout=59m38s orig-packets=5 orig-bytes=2 245
orig-fasttrack-packets=0 orig-fasttrack-bytes=0 repl-packets=3
repl-bytes=927 repl-fasttrack-packets=0 repl-fasttrack-bytes=0
orig-rate=0bps repl-rate=0bps
[admin@router] > /tool sniffer quick port=5060
INTERFACE TIME NUM DI SRC-MAC DST-MAC VLAN
ether3 12.427 1 <- B2:C2:B4:18:98:07 00:6B:0E:68:7F:02
ether2 12.427 2 -> 00:6B:0E:68:7F:01 00:6B:0E:6B:12:01
-- [Q quit|D dump|C-z pause]
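The stale mapping in the output above (reply-dst-address=10.10.1.2 while the active route uses ether2, whose address is 10.10.2.2) can also be spotted mechanically. The following Python sketch parses the text of one `/ip firewall connection print detail` command and lists the entries whose reply address is not the active WAN address. The function names are hypothetical, the parser only handles IPv4, and it assumes the key=value layout shown in this post.

```python
import re

# An entry starts on a line that begins with its index number.
ENTRY_START = re.compile(r"^[ \t]*\d+[ \t]", re.MULTILINE)
KEY_VALUE = re.compile(r'([\w-]+)=("[^"]*"|\S+)')


def parse_connections(output: str):
    """Split `print detail` output into one dict of key=value fields per entry."""
    starts = [m.start() for m in ENTRY_START.finditer(output)]
    chunks = [output[a:b] for a, b in zip(starts, starts[1:] + [len(output)])]
    return [{k: v.strip('"') for k, v in KEY_VALUE.findall(c)} for c in chunks]


def stale_entries(output: str, active_wan_ip: str):
    """Return entries whose replies are expected on a non-active WAN address."""
    stale = []
    for entry in parse_connections(output):
        reply_dst = entry.get("reply-dst-address", "")
        ip = reply_dst.split(":")[0]  # drop the port, if any (IPv4 only)
        if ip and ip != active_wan_ip:
            stale.append(entry)
    return stale


sample = """\
 0 SAC s protocol=udp src-address=192.168.0.100:5060
 dst-address=199.87.121.233:5060
 reply-src-address=199.87.121.233:5060
 reply-dst-address=10.10.1.2:5060 connection-type="sip"
 timeout=59m38s orig-packets=5
"""

print([e["connection-type"] for e in stale_entries(sample, "10.10.2.2")])  # ['sip']
```

Feed it the output of a single command without the prompt lines, since the `where connection-type=sip` filter in a prompt line would otherwise be read as a field of the last entry.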
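I don't know whether this is a bug or whether the behavior really makes sense, but I would expect the Mikrotik router, on receiving the new SIP packets, to create a new SIP NAT entry with the new reply-dst-address, just as it does for the ICMP messages (because the packets are now sent through a different interface, ether2, which has a different NAT rule).
Some questions:
- Is it a bug?
- Has anyone already seen this problem in another setup, for example with another SIP helper or with plain UDP NAT?
- What is the expected behavior?
- If it is a bug, how can I report it to the Mikrotik support team?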
Known workaround:
We are currently using the following workaround:
- The router checks periodically (via a script run by /system scheduler) whether the default gateway has changed;
- If it detects a change, it runs the following command:
Code:
/ip firewall connection remove [find where connection-type=sip]
After this, all SIP connections started working again.
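The scheduler script itself is not shown in this post. The decision logic of the workaround can be sketched in Python as below. This is only an illustration of the logic, not the script that runs on the router: `GatewayWatcher` and `poll` are hypothetical names, and the only RouterOS command in it is the one quoted above.

```python
FLUSH_SIP = "/ip firewall connection remove [find where connection-type=sip]"


class GatewayWatcher:
    """Remembers the last seen default gateway between periodic checks."""

    def __init__(self):
        self.last_gateway = None

    def poll(self, current_gateway: str):
        """Return the commands to run for this check (empty if nothing changed)."""
        changed = self.last_gateway is not None and current_gateway != self.last_gateway
        self.last_gateway = current_gateway
        return [FLUSH_SIP] if changed else []


watcher = GatewayWatcher()
print(watcher.poll("10.10.1.1"))  # [] (first run only records the gateway)
print(watcher.poll("10.10.1.1"))  # [] (unchanged)
print(watcher.poll("10.10.2.1"))  # one flush command (default route moved to ISP 2)
```

The first check deliberately flushes nothing, so that starting the watcher does not drop working SIP sessions. Only the SIP entries are removed, which leaves other tracked connections untouched.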
Disabling SIP ALG is not an option:
Please note that there is nothing wrong with using SIP ALG (as long as it is implemented without bugs). Actually, our case is exactly the case SIP ALG was created for.
Moreover, our server requires SIP ALG in order to call the SIP client when necessary.