Infrequent SA Key Mismatches Between strongSwan and RouterOS

Posted: Tue Aug 21, 2018 6:33 pm
by dorian
Hi,

We have set up a site-to-site VPN connecting our main office (running strongSwan 5.5.1 on Debian Stretch) to a branch office (using an RB2011iL with RouterOS 6.40.8). It's an IKEv2 tunnel using a PSK and the default key lifetime of 1 hour.

Everything works great, but every now and then (maybe around once a week), the rekeying of the tunnel goes wrong: after the rekey, the two sides are no longer in agreement about which keys to use.

After such a rekey, the SAs on the strongSwan side look like this:
src <STRONGSWAN_IP> dst <MIKROTIK_IP>
  proto esp spi 0x09890c39 reqid 2 mode tunnel
  replay-window 0 flag af-unspec
  auth-trunc hmac(sha256) 0x511b4b8066dd9f91e028411ac41edd0a0c2ca686191d4201d20cf699a88fc4da 128
  enc cbc(aes) 0x74a144af92c8b3be372a04dac3c19c8c
  anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

src <MIKROTIK_IP> dst <STRONGSWAN_IP>
  proto esp spi 0xcff9f231 reqid 2 mode tunnel
  replay-window 32 flag af-unspec
  auth-trunc hmac(sha256) 0x162e38a241b3c7fe6f18a61f9f8a45ffd005b4e4e801e3e502462b9f31e3c5f9 128
  enc cbc(aes) 0x06ef2900e20f793e87b535e56c80a2a5
  anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

While the Mikrotik side uses the following SAs:
1  E spi=0x9890C39 src-address=<STRONGSWAN_IP> dst-address=<MIKROTIK_IP> state=mature 
    auth-algorithm=sha256 enc-algorithm=aes-cbc enc-key-size=128 
    auth-key="3d5138d833aa55ceaee9801707c48a4dfb80e29fdc899cba6e39b188a03bc83b" 
    enc-key="c30de4b999ba413af5da8d2d8e024001" add-lifetime=48m2s/1h3s replay=128 

2  E spi=0xCFF9F231 src-address=<MIKROTIK_IP> dst-address=<STRONGSWAN_IP> state=mature 
    auth-algorithm=sha256 enc-algorithm=aes-cbc enc-key-size=128 
    auth-key="e9d1eefcb8250c7b5795c381e7c9eb7b44e67064fb1a90334335d372f5efadb3" 
    enc-key="a856120a0824bb2fc60aa565482d303e" addtime=aug/21/2018 10:18:10 expires-in=51m18s 
    add-lifetime=48m2s/1h3s current-bytes=35703 current-packets=144 replay=128 
From that output it appears that both sides use different auth and encryption keys for the same SPI. As a consequence, any ESP packets belonging to the connection are just being dropped on both ends.
Restarting the connection immediately fixes this until the next time that a key mismatch happens after a rekey.
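The manual comparison above can be automated. The following is a minimal sketch, assuming the strongSwan side is dumped in the `ip xfrm state` text format and the RouterOS side in the installed-SA print format, both exactly as pasted above. The function names (`parse_xfrm_state`, `parse_routeros_sas`, `compare_sas`) are made up for this illustration, and the sample strings are trimmed copies of the two outputs in this post.

```python
import re

# Trimmed copies of the two outputs shown in the post (addresses are placeholders).
XFRM_STATE = """\
src <STRONGSWAN_IP> dst <MIKROTIK_IP>
  proto esp spi 0x09890c39 reqid 2 mode tunnel
  auth-trunc hmac(sha256) 0x511b4b8066dd9f91e028411ac41edd0a0c2ca686191d4201d20cf699a88fc4da 128
  enc cbc(aes) 0x74a144af92c8b3be372a04dac3c19c8c
src <MIKROTIK_IP> dst <STRONGSWAN_IP>
  proto esp spi 0xcff9f231 reqid 2 mode tunnel
  auth-trunc hmac(sha256) 0x162e38a241b3c7fe6f18a61f9f8a45ffd005b4e4e801e3e502462b9f31e3c5f9 128
  enc cbc(aes) 0x06ef2900e20f793e87b535e56c80a2a5
"""

ROUTEROS_SAS = """\
1  E spi=0x9890C39 src-address=<STRONGSWAN_IP> dst-address=<MIKROTIK_IP> state=mature
    auth-algorithm=sha256 enc-algorithm=aes-cbc enc-key-size=128
    auth-key="3d5138d833aa55ceaee9801707c48a4dfb80e29fdc899cba6e39b188a03bc83b"
    enc-key="c30de4b999ba413af5da8d2d8e024001" add-lifetime=48m2s/1h3s replay=128
2  E spi=0xCFF9F231 src-address=<MIKROTIK_IP> dst-address=<STRONGSWAN_IP> state=mature
    auth-algorithm=sha256 enc-algorithm=aes-cbc enc-key-size=128
    auth-key="e9d1eefcb8250c7b5795c381e7c9eb7b44e67064fb1a90334335d372f5efadb3"
    enc-key="a856120a0824bb2fc60aa565482d303e" add-lifetime=48m2s/1h3s replay=128
"""


def parse_xfrm_state(text):
    """Return {spi_as_int: {"auth": hex, "enc": hex}} from xfrm-style output."""
    sas, spi = {}, None
    for line in text.splitlines():
        m = re.search(r"\bspi (0x[0-9a-fA-F]+)", line)
        if m:
            spi = int(m.group(1), 16)
            sas[spi] = {}
            continue
        m = re.match(r"\s*(auth-trunc|auth|enc) \S+ 0x([0-9a-fA-F]+)", line)
        if m and spi is not None:
            kind = "enc" if m.group(1) == "enc" else "auth"
            sas[spi][kind] = m.group(2).lower()
    return sas


def parse_routeros_sas(text):
    """Return the same structure from RouterOS-style key=value output."""
    sas, spi = {}, None
    pattern = r'\b(spi|auth-key|enc-key)=(?:0x|")?([0-9a-fA-F]+)'
    for field, value in re.findall(pattern, text):
        if field == "spi":
            # RouterOS drops leading zeros (0x9890C39), so compare as integers.
            spi = int(value, 16)
            sas[spi] = {}
        elif spi is not None:
            sas[spi]["auth" if field == "auth-key" else "enc"] = value.lower()
    return sas


def compare_sas(left, right):
    """For every SPI known to both sides, report whether each key matches."""
    return {
        spi: {k: left[spi].get(k) == right[spi].get(k) for k in ("auth", "enc")}
        for spi in sorted(set(left) & set(right))
    }


if __name__ == "__main__":
    report = compare_sas(parse_xfrm_state(XFRM_STATE), parse_routeros_sas(ROUTEROS_SAS))
    for spi, result in report.items():
        print(f"{spi:#010x}", result)
```

Parsing the SPI as an integer matters here because the two devices print the same SPI differently (`0x09890c39` versus `0x9890C39`).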

The log entries of the rekeying on the strongSwan side don't look very suspicious to me:
2018-08-21T10:18:09.616057+02:00  09[IKE] <conn_name|1> queueing CHILD_REKEY task
2018-08-21T10:18:09.616674+02:00  09[IKE] <conn_name|1> activating new tasks
2018-08-21T10:18:09.617223+02:00  09[IKE] <conn_name|1>   activating CHILD_REKEY task
2018-08-21T10:18:09.617782+02:00  09[IKE] <conn_name|1> establishing CHILD_SA conn_name{2}
2018-08-21T10:18:09.618394+02:00  09[CFG] <conn_name|1> proposing traffic selectors for us:
2018-08-21T10:18:09.618955+02:00  09[CFG] <conn_name|1>  10.11.0.0/16
2018-08-21T10:18:09.619505+02:00  09[CFG] <conn_name|1>  10.10.0.0/16
2018-08-21T10:18:09.620064+02:00  09[CFG] <conn_name|1> proposing traffic selectors for other:
2018-08-21T10:18:09.620697+02:00  09[CFG] <conn_name|1>  10.50.0.0/16
2018-08-21T10:18:09.621381+02:00  09[CFG] <conn_name|1> configured proposals: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_2048/NO_EXT_SEQ
2018-08-21T10:18:09.636466+02:00  09[ENC] <conn_name|1> generating CREATE_CHILD_SA request 30 [ N(REKEY_SA) SA No KE TSi TSr ]
2018-08-21T10:18:09.637105+02:00  09[NET] <conn_name|1> sending packet: from <STRONGSWAN_IP>[4500] to <MIKROTIK_IP>[4500] (496 bytes)
2018-08-21T10:18:10.571371+02:00  12[NET] <conn_name|1> received packet: from <MIKROTIK_IP>[4500] to <STRONGSWAN_IP>[4500] (480 bytes)
2018-08-21T10:18:10.572847+02:00  12[ENC] <conn_name|1> parsed CREATE_CHILD_SA response 30 [ No KE TSi TSr SA ]
2018-08-21T10:18:10.590638+02:00  12[CFG] <conn_name|1> selecting proposal:
2018-08-21T10:18:10.591675+02:00  12[CFG] <conn_name|1>   proposal matches
2018-08-21T10:18:10.592406+02:00  12[CFG] <conn_name|1> received proposals: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_2048/NO_EXT_SEQ
2018-08-21T10:18:10.593170+02:00  12[CFG] <conn_name|1> configured proposals: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_2048/NO_EXT_SEQ
2018-08-21T10:18:10.593884+02:00  12[CFG] <conn_name|1> selected proposal: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_2048/NO_EXT_SEQ
2018-08-21T10:18:10.594612+02:00  12[CFG] <conn_name|1> selecting traffic selectors for us:
2018-08-21T10:18:10.595126+02:00  12[CFG] <conn_name|1>  config: 10.11.0.0/16, received: 10.10.0.0/16 => no match
2018-08-21T10:18:10.595605+02:00  12[CFG] <conn_name|1>  config: 10.10.0.0/16, received: 10.10.0.0/16 => match: 10.10.0.0/16
2018-08-21T10:18:10.596273+02:00  12[CFG] <conn_name|1> selecting traffic selectors for other:
2018-08-21T10:18:10.596780+02:00  12[CFG] <conn_name|1>  config: 10.50.0.0/16, received: 10.50.0.0/16 => match: 10.50.0.0/16
2018-08-21T10:18:10.597258+02:00  12[CHD] <conn_name|1>   using AES_CBC for encryption
2018-08-21T10:18:10.597731+02:00  12[CHD] <conn_name|1>   using HMAC_SHA2_256_128 for integrity
2018-08-21T10:18:10.598220+02:00  12[CHD] <conn_name|1> adding inbound ESP SA
2018-08-21T10:18:10.598695+02:00  12[CHD] <conn_name|1>   SPI 0xcff9f231, src <MIKROTIK_IP> dst <STRONGSWAN_IP>
2018-08-21T10:18:10.599168+02:00  12[CHD] <conn_name|1> adding outbound ESP SA
2018-08-21T10:18:10.599642+02:00  12[CHD] <conn_name|1>   SPI 0x09890c39, src <STRONGSWAN_IP> dst <MIKROTIK_IP>
2018-08-21T10:18:10.600129+02:00  12[IKE] <conn_name|1> CHILD_SA conn_name{26} established with SPIs cff9f231_i 09890c39_o and TS 10.10.0.0/16 === 10.50.0.0/16
2018-08-21T10:18:10.600670+02:00  12[IKE] <conn_name|1> reinitiating already active tasks
2018-08-21T10:18:10.601294+02:00  12[IKE] <conn_name|1>   CHILD_REKEY task
2018-08-21T10:18:10.601800+02:00  12[IKE] <conn_name|1> closing CHILD_SA conn_name{21} with SPIs cca08482_i (29988 bytes) 0f456b61_o (47112 bytes) and TS 10.10.0.0/16 === 10.50.0.0/16
2018-08-21T10:18:10.602274+02:00  12[IKE] <conn_name|1> sending DELETE for ESP CHILD_SA with SPI cca08482
2018-08-21T10:18:10.602744+02:00  12[ENC] <conn_name|1> generating INFORMATIONAL request 31 [ D ]
2018-08-21T10:18:10.603211+02:00  12[NET] <conn_name|1> sending packet: from <STRONGSWAN_IP>[4500] to <MIKROTIK_IP>[4500] (80 bytes)
2018-08-21T10:18:10.605572+02:00  08[NET] <conn_name|1> received packet: from <MIKROTIK_IP>[4500] to <STRONGSWAN_IP>[4500] (96 bytes)
2018-08-21T10:18:10.606048+02:00  08[ENC] <conn_name|1> parsed INFORMATIONAL response 31 [ ]
2018-08-21T10:18:10.606520+02:00  08[IKE] <conn_name|1> CHILD_SA closed
2018-08-21T10:18:10.607076+02:00  08[IKE] <conn_name|1> activating new tasks
2018-08-21T10:18:10.607624+02:00  08[IKE] <conn_name|1> nothing to initiate
Unfortunately I don't have any logs from RouterOS at the moment.

Has anyone experienced something similar to this before? As mentioned, the issue is somewhat hard to debug as it only occurs rarely and I haven't found a way to reproduce it. Any pointers are greatly appreciated.

Re: Infrequent SA Key Mismatches Between strongSwan and RouterOS

Posted: Wed Aug 29, 2018 5:51 pm
by dorian
Bumping this as it actually seems to occur more frequently now that we've updated to v6.40.9 :(

Re: Infrequent SA Key Mismatches Between strongSwan and RouterOS  [SOLVED]

Posted: Wed Aug 29, 2018 6:22 pm
by sindy
There is a support ticket open on it and Mikrotik is actively working on the issue. Complaints on the forum do not help to localize and identify an issue; Mikrotik needs detailed information about the circumstances to be able to analyze and resolve it.

A particular release is unlikely to affect the frequency. I am continuously monitoring occurrences of this issue on a link between two lab devices, and the interval between occurrences is random: sometimes I see an occurrence three hours after the previous one and sometimes it takes longer than a day (I'm testing 10 policies in parallel to increase the probability).

Re: Infrequent SA Key Mismatches Between strongSwan and RouterOS

Posted: Thu Aug 30, 2018 2:51 pm
by dorian
Thanks for your response. I don't really understand how my post could be seen as a complaint, but if you think that way, I'm sorry about it. It's good to hear that this is being actively worked on.

Re: Infrequent SA Key Mismatches Between strongSwan and RouterOS

Posted: Thu Aug 30, 2018 3:33 pm
by sindy
First, sorry if the word "complaint" sounded offensive. English is not my native language, so I may not feel all the subtle shades properly.

What I had in mind was that this is a user forum, so not the proper channel to report issues to Mikrotik. Gents of Mikrotik staff are actively present here but it seems they do not react to every single topic, so if you believe you've found an issue, the right thing to do is to send the information along with the supout.rif file to support@mikrotik.com. Which does not mean that you should not post it here; I just say that posting it here should not be the only thing to do.

The basic purpose of the forum is to get advice from fellow users on configuration etc. I have no idea how many people set up IPsec sessions between Mikrotik and strongSwan (I have a few but haven't noticed the issue on them). But even if you feel that the issue may be on the non-Mikrotik side, sending the information to support is still the right thing to do: compatibility is an important feature of the product, so even if the issue turned out to be on the strongSwan side (which is clearly not the case here), it would be important for Mikrotik to be aware of it and, e.g., mention it in the documentation.

Re: Infrequent SA Key Mismatches Between strongSwan and RouterOS

Posted: Thu Aug 30, 2018 5:32 pm
by dorian
Thank you for your feedback, I appreciate it!

The reason I didn't reach out to Mikrotik support is that their support page mentions that email support is only available for 30 days after the purchase of a RouterOS licence or product. But seeing as this is a verified bug, I'll make sure to contact them directly.

As a side-note, is there any list of known issues such as this one? I couldn't find anything in the release notes.

Re: Infrequent SA Key Mismatches Between strongSwan and RouterOS

Posted: Thu Aug 30, 2018 11:45 pm
by sindy
My private understanding is that the purpose of the statement you refer to is to prevent people from sending trivial beginners' questions to support because Mikrotik doesn't have enough staff to handle the resulting volume. Issues like this are a different thing.

Re: Infrequent SA Key Mismatches Between strongSwan and RouterOS

Posted: Tue Sep 11, 2018 11:14 pm
by sindy
From the 6.44 beta topic:

What's new in 6.44beta6 (2018-Sep-11 08:52):
...
*) ike2 - fixed rare authentication and encryption key mismatches after rekey with PFS enabled;

So if you can afford to dedicate a device for testing the 6.44beta6 against strongswan, it would be great. I'll do that on a pair of Mikrotiks in a few days, currently they are busy testing something else.

The sooner the fix is proven to be stable, the sooner it can be backported to 6.43.x.
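For readers wondering why PFS is relevant to the keys at all: per RFC 7296 (section 2.17), when a CHILD_SA is created or rekeyed with a fresh Diffie-Hellman exchange, the ESP keys are cut from KEYMAT = prf+(SK_d, g^ir (new) | Ni | Nr). Below is a minimal sketch of that derivation, assuming HMAC-SHA256 as the PRF (the PRF actually negotiated for the IKE SA is not shown in this thread) and the AES-128 / HMAC-SHA256 key sizes from the logs above. All input values are made-up test data, and the "stripped leading zero" case is only an illustration of how sensitive the result is to the exact bytes fed in; it is not a claim about what the RouterOS bug was.

```python
import hashlib
import hmac


def prf(key, data):
    # Assumption: PRF_HMAC_SHA2_256; the thread does not state the negotiated PRF.
    return hmac.new(key, data, hashlib.sha256).digest()


def prf_plus(key, seed, length):
    """prf+ from RFC 7296: T1 = prf(K, S | 0x01), Tn = prf(K, Tn-1 | S | n)."""
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = prf(key, block + seed + bytes([counter]))
        out += block
        counter += 1
    return out[:length]


def child_sa_keys(sk_d, dh_secret, ni, nr, enc_len=16, integ_len=32):
    """Split KEYMAT into ESP keys: initiator-to-responder first, encryption before integrity."""
    keymat = prf_plus(sk_d, dh_secret + ni + nr, 2 * (enc_len + integ_len))
    names = ("enc_i2r", "integ_i2r", "enc_r2i", "integ_r2i")
    sizes = (enc_len, integ_len, enc_len, integ_len)
    keys, offset = {}, 0
    for name, size in zip(names, sizes):
        keys[name] = keymat[offset:offset + size]
        offset += size
    return keys


# Made-up inputs: a 256-byte (modp2048-sized) shared secret that happens to
# start with a zero byte, plus dummy SK_d and nonces.
SK_D = bytes(32)
NI, NR = b"\x11" * 32, b"\x22" * 32
SECRET = b"\x00" + b"\x5a" * 255

padded = child_sa_keys(SK_D, SECRET, NI, NR)
stripped = child_sa_keys(SK_D, SECRET.lstrip(b"\x00"), NI, NR)

if __name__ == "__main__":
    print("same keys from identical inputs:", padded == child_sa_keys(SK_D, SECRET, NI, NR))
    print("same keys if one peer drops the leading zero:", padded == stripped)
```

Because every input goes through the PRF, a single differing byte on one peer yields completely unrelated encryption and integrity keys on both SAs, while the SPIs (which are not derived from KEYMAT) still match.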

Re: Infrequent SA Key Mismatches Between strongSwan and RouterOS

Posted: Wed Nov 28, 2018 1:30 pm
by sergeyk
Looks like this issue isn't really fixed, I've tested 6.43.4 and 6.44beta28
IKEv2 in transport mode
proposals: add auth-algorithms=sha256 enc-algorithms=aes-256-cbc lifetime=1h pfs-group=modp4096
profile: dh-group=modp4096 dpd-interval=10s dpd-maximum-failures=3 enc-algorithm=aes-256 hash-algorithm=sha256 lifetime=2h
connected to strongswan 5.7.1 on centos7

Everything works normally for some time, rekeying every hour, but after 2-3 days the situation ends up looking like the one in the first post:
both sides have the correct SPI numbers, but the auth and enc keys are completely different, so all encrypted traffic is dropped by both peers. At this point
both peers are sending DPD to each other because they can't see any activity on the link, but DPD doesn't use ESP and these keys,
so the link looks alive to both sides and is not re-authenticated.
After one more hour, at the next rekey, the keys are in sync again and traffic starts to flow.
I've installed 6.44beta39 today and I'll see how it goes, but I can't see any rekeying changes in the changelog, so most likely it'll work exactly the same.
This is very annoying.
And of course it starts working if reauth is forced.
Does anyone have any ideas what can cause this?
Or maybe I should submit this bug somewhere else?
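Since DPD runs over the IKE SA and therefore cannot notice that the ESP keys are out of sync, one possible stopgap is a probe that sends real traffic through the tunnel and forces a restart after several consecutive failures. The following is only a sketch under assumptions: `TunnelWatchdog` and `ping_once` are hypothetical names, the ping flags assume Linux iputils `ping`, and the restart action is left as a callable because the right command depends on the setup (on a strongSwan host it could wrap something like `ipsec down <name>` followed by `ipsec up <name>`).

```python
import subprocess


def ping_once(host, timeout_s=2):
    """Send one ICMP echo to a host behind the tunnel; True if a reply arrived.

    Assumes Linux iputils ping (-c count, -W reply timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


class TunnelWatchdog:
    """Calls restart() once the probe has failed `threshold` times in a row."""

    def __init__(self, probe, restart, threshold=3):
        self.probe = probe          # callable returning True when traffic passes
        self.restart = restart      # callable that re-establishes the connection
        self.threshold = threshold
        self.failures = 0

    def tick(self):
        """Run one probe; return True if a restart was triggered."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.restart()
            self.failures = 0       # give the new SAs a fresh start
            return True
        return False


# Example wiring (not executed here); 10.50.0.1 is a hypothetical host in the
# remote subnet, and restart_connection is whatever restarts your tunnel:
#
#   watchdog = TunnelWatchdog(lambda: ping_once("10.50.0.1"), restart_connection)
#   while True:
#       watchdog.tick()
#       time.sleep(30)
```

Requiring several consecutive failures avoids restarting the tunnel on a single lost packet, at the cost of a slightly longer outage before recovery.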

Re: Infrequent SA Key Mismatches Between strongSwan and RouterOS

Posted: Wed Nov 28, 2018 9:00 pm
by sindy
Or maybe I should submit this bug somewhere else?
Please do submit it as a bug to support@mikrotik.com, this forum is not the right channel. I haven't experienced this bug ever since 6.43.2, but all my IKEv2 sessions run either between two Mikrotiks or between a Mikrotik and a Windows machine where pfs is not supported by the Windows side, so it is theoretically possible that against strongswan it behaves in a different way.

Re: Infrequent SA Key Mismatches Between strongSwan and RouterOS

Posted: Wed Apr 08, 2020 11:49 am
by sergeyk
1.5 years passed and this issue still exists)

Re: Infrequent SA Key Mismatches Between strongSwan and RouterOS

Posted: Wed Apr 08, 2020 12:15 pm
by sindy
1.5 years passed and this issue still exists)
Have you opened a trouble ticket at support@mikrotik.com as I suggested 1.5 years ago? In your setup, does the issue appear between two Mikrotiks or between a Mikrotik on one end and some other IKEv2 implementation on the other? I haven't noticed a single occurrence of that issue during all that time on about 20 IKEv2 links (which doesn't necessarily mean there were none, but on most of the links a 30-minute outage would raise an alert), many of them using multiple SAs in parallel. So it may be related to your hardware platform, to interworking with the other IKEv2 implementation... Mikrotik developers need all these details to be able to simulate the issue in their lab and fix it.

Re: Infrequent SA Key Mismatches Between strongSwan and RouterOS

Posted: Wed Apr 08, 2020 12:36 pm
by sergeyk
Of course I opened a ticket 1.5 years ago)
It's a problem between Mikrotik and strongSwan when using PFS.
Disabling PFS is a bad fix for this.
IIRC, when I was investigating the issue I also tried another implementation.