Experimenting further with this... Legend:
site_A - main site (acts as a server, accepting connections from 0.0.0.0/0)
site_B - the remote site I have a problem with
site_C - the test router (I configured it in the same way as site_B, but for some reason this site connects to site_A without any issues)
I'm trying to analyze and compare IPsec logs from site_B (not working) and site_C (working fine).
Both logs start with the same messages about certificates, computing DH, the CERT payload, etc. They match up to the point where the "working router" (site_C) shows this:
May 13 18:16:31 10.10.3.1 ipsec,debug encryption(aes)
May 13 18:16:31 10.10.3.1 ipsec,debug with key:
May 13 18:16:31 10.10.3.1 ipsec,debug 39ac2d6f 81c86cfe 61e9f4f6 0594d280
May 13 18:16:31 10.10.3.1 ipsec,debug encrypted payload by IV:
May 13 18:16:31 10.10.3.1 ipsec,debug a7bc82de be675f36 5c4e1f7d 032083ff
May 13 18:16:31 10.10.3.1 ipsec,debug save IV for next:
May 13 18:16:31 10.10.3.1 ipsec,debug 438abaa7 ebcebe2d 6254f02c c41838c6
May 13 18:16:31 10.10.3.1 ipsec,debug encrypted.
May 13 18:16:31 10.10.3.1 ipsec,debug 1660 bytes from <site_C_IP>[500] to <site_A_IP>[500]
May 13 18:16:31 10.10.3.1 ipsec,debug 1 times of 1660 bytes message will be sent to <site_A_IP>[500]
and the "problem router" (site_B) shows this:
May 13 18:04:51 10.10.2.1 ipsec,debug encryption(aes)
May 13 18:04:51 10.10.2.1 ipsec,debug with key:
May 13 18:04:51 10.10.2.1 ipsec,debug f8d5b610 eb55b7d5 0a586406 1f43c305
May 13 18:04:51 10.10.2.1 ipsec,debug encrypted payload by IV:
May 13 18:04:51 10.10.2.1 ipsec,debug a8d1eb68 c34e9d7a 70ff007e ad8b23df
May 13 18:04:51 10.10.2.1 ipsec,debug save IV for next:
May 13 18:04:51 10.10.2.1 ipsec,debug 18cbd0db 7f56f37b 2578e900 74aceac8
May 13 18:04:51 10.10.2.1 ipsec,debug encrypted.
May 13 18:04:51 10.10.2.1 ipsec,debug Adding NON-ESP marker
May 13 18:04:51 10.10.2.1 ipsec,debug 1664 bytes from <site_B_IP>[4500] to <site_A_IP>[4500]
May 13 18:04:51 10.10.2.1 ipsec,debug 1 times of 1664 bytes message will be sent to <site_A_IP>[4500]
Note the "Adding NON-ESP marker" message and that the port suddenly changes to 4500 (instead of 500).
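For what it's worth, the 4-byte size difference (1660 vs. 1664) seems to match the marker itself: as far as I understand RFC 3948, IKE packets sent over UDP 4500 are prefixed with four zero bytes (the Non-ESP marker) so the receiver can tell them apart from UDP-encapsulated ESP. Here is a minimal Python sketch of that framing. The function names are my own and purely illustrative, not anything from RouterOS, and the payload is dummy data:

```python
# Non-ESP marker: four zero bytes placed in front of an IKE message
# when it travels over UDP 4500 (my reading of RFC 3948).
NON_ESP_MARKER = b"\x00\x00\x00\x00"


def frame_ike_for_port(ike_message: bytes, port: int) -> bytes:
    """Return the UDP payload for an IKE message (hypothetical helper)."""
    if port == 4500:
        return NON_ESP_MARKER + ike_message
    return ike_message  # on port 500 the message is sent unchanged


def looks_like_ike_on_4500(udp_payload: bytes) -> bool:
    """True if a port-4500 payload starts with the Non-ESP marker."""
    return udp_payload[:4] == NON_ESP_MARKER


dummy_ike = b"\x01" * 1660  # stand-in for the 1660-byte encrypted message
print(len(frame_ike_for_port(dummy_ike, 500)))
print(len(frame_ike_for_port(dummy_ike, 4500)))
```

So the two routers appear to be sending the same-sized IKE message; site_B's is just wrapped for port 4500.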
Then the site_B router also reports receiving a retransmitted packet from site_A:
May 13 18:05:00 10.10.2.1 ipsec,info the packet is retransmitted by <site_A_IP>[500].
May 13 18:05:00 10.10.2.1 ipsec,debug KA: <site_B_IP>[4500]-><site_A_IP>[4500]
That's where it basically times out: site_B keeps sending the same packet on port 4500, and the site_A router keeps retransmitting its request.
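To compare the two logs without reading them line by line, I put together a rough Python sketch that pulls out the destination ports, the Non-ESP marker lines, and the retransmission notices. The regular expression and the `summarize` helper are my own and assume the log line format shown above. The sample lines are copied from the site_B log, with the addresses still masked:

```python
import re

# Matches lines such as:
#   "1664 bytes from <site_B_IP>[4500] to <site_A_IP>[4500]"
SEND_RE = re.compile(r"(\d+) bytes from (\S+)\[(\d+)\] to (\S+)\[(\d+)\]")


def summarize(log_lines):
    """Collect the facts that differ between the two routers' logs."""
    summary = {"dst_ports": [], "non_esp_marker": 0, "retransmits": 0}
    for line in log_lines:
        match = SEND_RE.search(line)
        if match:
            dst_port = int(match.group(5))
            if dst_port not in summary["dst_ports"]:
                summary["dst_ports"].append(dst_port)
        if "Adding NON-ESP marker" in line:
            summary["non_esp_marker"] += 1
        if "is retransmitted by" in line:
            summary["retransmits"] += 1
    return summary


site_b_log = [
    "May 13 18:04:51 10.10.2.1 ipsec,debug Adding NON-ESP marker",
    "May 13 18:04:51 10.10.2.1 ipsec,debug 1664 bytes from <site_B_IP>[4500] to <site_A_IP>[4500]",
    "May 13 18:05:00 10.10.2.1 ipsec,info the packet is retransmitted by <site_A_IP>[500].",
]
print(summarize(site_b_log))
```

Running the same function over the full site_C log should make it obvious at which message the two routers start to diverge.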
As far as I can tell, the configurations of site_B and site_C are identical.
What am I missing? Why does site_B show this "Adding NON-ESP marker" message and then send on port 4500, while the site_C router keeps using port 500 (and then succeeds in establishing the connection)?