
ARP for hosts that migrate across (non-MTik) WiFi access points?

Posted: Sun Dec 06, 2020 6:58 pm
by libove
Hi MikroTik users,
This three-year-old thread: https://forum.mikrotik.com/viewtopic.php?t=121618
.. talked of problems where ARP entries would hang around for too long. There was also a lot in there about placeholder 00:00:00:00:00:00 incomplete/no-answer ARP entries, but I'm focused on the few people who reported that a device's MAC address would change and the ARP entry would stick with the previous MAC address for minutes or longer.
I've lately begun having an issue that might be similar, though I'm still very much in the head-scratching phase:

I recently had to replace the fibre ONT-router (home/small office stuff from a major Spanish ISP) as the old device had begun to get flaky.
After replacing it, I re-jiggered the cables a bit for two reasons:
1. the new device had an unchangeable client isolation setting, which prevented some devices on the LAN from seeing other devices on the LAN whenever the traffic between them had to go through that ISP's p.o.s. device.
2. any time that device would reboot (which is under the control of the ISP) it would take down the LAN, if its built-in switch was in use.

So, the new layout has the ISP's fibre ONT-router at the edge of the LAN, where the main part of the LAN is a Mikrotik Routerboard 951Ui-2HnD and a dumb switch.
When I first did this, I noticed that I seemed to have partitioned the network, and after a bunch of farting around I realized that not all of the Routerboard's Ethernet ports were set to ARP at all. Now, all Ethernet ports are set to ARP=enabled.
The network now is not permanently partitioned. Heh...
But:

There are two WiFi Access Points on the network: one on the ISP's router, and the other at the far end of a Powerline Ethernet Controller extension. They have the same SSID.
When my mobile phone connects to one of the two WiFi APs and then roams to the other (basically, I walk from one end of the house to the other), the phone ceases to be able to perform DNS lookups (DNS is on the MikroTik) for "some time" (more than the 30-second ARP timeout, possibly minutes).
[i]This only happens in one direction - roaming from the WiFi AP at the far end of the PLC network to the WiFi AP on the ISP router.[/i]
Momentarily shutting off WiFi on the phone and turning it back on, or manually reconnecting to a WiFi AP will resolve the problem.

What I'll see, when the problem is occurring, in a packet capture on the MikroTik, is a pattern like this:
[quote] 0 time=2.623 num=1 direction=rx src-mac=5C:17:CF:79:A8:29 dst-mac=CC:2D:E0:13:1A:CF interface=bridge
src-address=192.168.254.31:31030 dst-address=192.168.254.3:53 (dns) protocol=ip ip-protocol=udp size=80 cpu=0 fp=no
ip-packet-size=66 ip-header-size=20 dscp=0 identification=9434 fragment-offset=0 ttl=64

1 time=2.678 num=2 direction=tx src-mac=CC:2D:E0:13:1A:CF dst-mac=5C:17:CF:79:A8:29 interface=bridge
src-address=192.168.254.3:53 (dns) dst-address=192.168.254.31:31030 protocol=ip ip-protocol=udp size=175 cpu=0 fp=no
ip-packet-size=161 ip-header-size=20 dscp=0 identification=41433 fragment-offset=0 ttl=64

2 time=3.143 num=3 direction=rx src-mac=5C:17:CF:79:A8:29 dst-mac=CC:2D:E0:13:1A:CF interface=bridge
src-address=192.168.254.31:43639 dst-address=192.168.254.3:53 (dns) protocol=ip ip-protocol=udp size=80 cpu=0 fp=no
ip-packet-size=66 ip-header-size=20 dscp=0 identification=9456 fragment-offset=0 ttl=64

3 time=3.145 num=4 direction=tx src-mac=CC:2D:E0:13:1A:CF dst-mac=5C:17:CF:79:A8:29 interface=bridge
src-address=192.168.254.3:53 (dns) dst-address=192.168.254.31:43639 protocol=ip ip-protocol=udp size=175 cpu=0 fp=no
ip-packet-size=161 ip-header-size=20 dscp=0 identification=41434 fragment-offset=0 ttl=64

4 time=3.53 num=5 direction=rx src-mac=5C:17:CF:79:A8:29 dst-mac=CC:2D:E0:13:1A:CF interface=bridge
src-address=192.168.254.31:52825 dst-address=192.168.254.3:53 (dns) protocol=ip ip-protocol=udp size=80 cpu=0 fp=no
ip-packet-size=66 ip-header-size=20 dscp=0 identification=9506 fragment-offset=0 ttl=64

5 time=3.531 num=6 direction=tx src-mac=CC:2D:E0:13:1A:CF dst-mac=5C:17:CF:79:A8:29 interface=bridge
src-address=192.168.254.3:53 (dns) dst-address=192.168.254.31:52825 protocol=ip ip-protocol=udp size=175 cpu=0 fp=no
ip-packet-size=161 ip-header-size=20 dscp=0 identification=41435 fragment-offset=0 ttl=64

6 time=4.297 num=7 direction=rx src-mac=5C:17:CF:79:A8:29 dst-mac=CC:2D:E0:13:1A:CF interface=bridge
src-address=192.168.254.31:4612 dst-address=192.168.254.3:53 (dns) protocol=ip ip-protocol=udp size=80 cpu=0 fp=no
ip-packet-size=66 ip-header-size=20 dscp=0 identification=9590 fragment-offset=0 ttl=64

7 time=4.299 num=8 direction=tx src-mac=CC:2D:E0:13:1A:CF dst-mac=5C:17:CF:79:A8:29 interface=bridge
src-address=192.168.254.3:53 (dns) dst-address=192.168.254.31:4612 protocol=ip ip-protocol=udp size=175 cpu=0 fp=no
ip-packet-size=161 ip-header-size=20 dscp=0 identification=41436 fragment-offset=0 ttl=64

8 time=4.635 num=9 direction=rx src-mac=5C:17:CF:79:A8:29 dst-mac=CC:2D:E0:13:1A:CF interface=bridge
src-address=192.168.254.31:52751 dst-address=192.168.254.3:53 (dns) protocol=ip ip-protocol=udp size=96 cpu=0 fp=no
ip-packet-size=82 ip-header-size=20 dscp=0 identification=9618 fragment-offset=0 ttl=64

9 time=4.635 num=10 direction=rx src-mac=5C:17:CF:79:A8:29 dst-mac=CC:2D:E0:13:1A:CF interface=bridge
src-address=192.168.254.31:42543 dst-address=192.168.254.3:53 (dns) protocol=ip ip-protocol=udp size=81 cpu=0 fp=no
ip-packet-size=67 ip-header-size=20 dscp=0 identification=9619 fragment-offset=0 ttl=64

10 time=4.636 num=11 direction=tx src-mac=CC:2D:E0:13:1A:CF dst-mac=5C:17:CF:79:A8:29 interface=bridge
src-address=192.168.254.3:53 (dns) dst-address=192.168.254.31:52751 protocol=ip ip-protocol=udp size=96 cpu=0 fp=no
ip-packet-size=82 ip-header-size=20 dscp=0 identification=41437 fragment-offset=0 ttl=64

11 time=4.679 num=12 direction=tx src-mac=CC:2D:E0:13:1A:CF dst-mac=5C:17:CF:79:A8:29 interface=bridge
src-address=192.168.254.3:53 (dns) dst-address=192.168.254.31:42543 protocol=ip ip-protocol=udp size=229 cpu=0 fp=no
ip-packet-size=215 ip-header-size=20 dscp=0 identification=41438 fragment-offset=0 ttl=64 [/quote]

.. where the mobile phone does successfully get the DNS query to the MikroTik and the MikroTik immediately answers, but apparently the answer doesn't get to the mobile phone.
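To check a longer capture for the same pattern without reading it by eye, something like the following could pair each DNS query with its reply. This is only a rough sketch: the function names `parse_records` and `unanswered_queries` are made up for illustration, not part of any MikroTik tool, and it assumes the sniffer text output has the key=value layout shown above, with a blank line between records.

```python
import re

# Two records copied from the capture above (one query, one reply).
CAPTURE = """\
0 time=2.623 num=1 direction=rx src-mac=5C:17:CF:79:A8:29 dst-mac=CC:2D:E0:13:1A:CF interface=bridge
src-address=192.168.254.31:31030 dst-address=192.168.254.3:53 (dns) protocol=ip ip-protocol=udp size=80 cpu=0 fp=no
ip-packet-size=66 ip-header-size=20 dscp=0 identification=9434 fragment-offset=0 ttl=64

1 time=2.678 num=2 direction=tx src-mac=CC:2D:E0:13:1A:CF dst-mac=5C:17:CF:79:A8:29 interface=bridge
src-address=192.168.254.3:53 (dns) dst-address=192.168.254.31:31030 protocol=ip ip-protocol=udp size=175 cpu=0 fp=no
ip-packet-size=161 ip-header-size=20 dscp=0 identification=41433 fragment-offset=0 ttl=64
"""


def parse_records(text):
    """Split the capture on blank lines and collect the key=value pairs of each record."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = dict(re.findall(r"([\w-]+)=(\S+)", block))
        if fields:
            records.append(fields)
    return records


def unanswered_queries(records, dns_server="192.168.254.3:53"):
    """Return the client ip:port endpoints whose query never got a tx reply."""
    pending = set()
    for r in records:
        if r["direction"] == "rx" and r["dst-address"] == dns_server:
            pending.add(r["src-address"])        # query received from a client
        elif r["direction"] == "tx" and r["src-address"] == dns_server:
            pending.discard(r["dst-address"])    # reply sent back to that client
    return pending


records = parse_records(CAPTURE)
print(len(records), unanswered_queries(records))
```

An empty set only says that the MikroTik sent a reply for every query it received; it says nothing about whether the reply arrived at the client.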

I realise, just now, as I wrote this, that it's possible that this still is an ARP table issue on the ISP router, being that one of the two WiFi APs is on that router itself.
That ISP router is an Askey RTF8115VW (hardware version "REV5-Mentech-MB374-45-N4-GK-BW-Y", software version "ES_g11.8_RTF_TEF001_V6.28_V008"). There are no ARP settings on that ISP router, only the ability to display the current ARP table contents.

BTW the DHCP server is on the MikroTik, and there is a static assignment for my mobile phone, so it will get the same IP address no matter what AP it joins.

On the ISP router the ARP table shows only a correct association between my mobile phone's MAC address and its IP address, and the associated interface as the only interface that the ISP router has towards the rest of my LAN.
On the MikroTik it's the same (correct MAC <-> IP association), and shows the associated Interface as "bridge".

(Ok, so, doesn't seem like it's an ARP-on-ISP-router problem, or even necessarily an ARP problem?
And, the other bridge, which is cabled directly to an Ethernet port on the MikroTik, is dumb - no routing functionality, and no way to view its MAC-to-port association table).

Apologies if I'm being dense (probably I am)...
What could be causing this inability of my mobile phone to reach the MikroTik's DNS server in some interval after the phone has roamed from the WiFi AP at the far end of the PLC network to the WiFi AP on the ISP router?

many thanks,
-Jay

Re: ARP for hosts that migrate across (non-MTik) WiFi access points?

Posted: Sun Dec 06, 2020 8:28 pm
by jvanhambelgium
Roaming is a client-side "decision", but there are some settings on the AP that can "assist" in this in asking a connected client to move off the AP
(eg. RSSI-values, certain "Fast Roaming" settings)

I have 2 APs in the house (Ubiquiti Networks) that are connected to a Mikrotik 3011 and I have never seen any issues with this in years.
Also my 3011 is providing DHCP-services to all the clients, DNS is provided by some Pi-hole services on the network.
My roaming is near "instant", only missing a single "ping". I even make conf-calls and roam without even noticing it (both with an old Samsung S7 phone and with Teams on a Dell laptop).

What settings do you have on the APs that you can tweak? Or are these not under your control? (Perhaps your ISP manages these?)

Re: ARP for hosts that migrate across (non-MTik) WiFi access points?

Posted: Sun Dec 06, 2020 8:32 pm
by sindy
I'm afraid you may be mixing together two things (at least terminologically):
  • a translation table of IP addresses to MAC addresses is usually called "ARP table" and is populated on demand, using the ARP protocol
  • a table linking MAC addresses to bridge/switch ports, which is called a "MAC address table", or "(L2) forwarding table", or (in Mikrotik) "bridge hosts table", and is populated automatically each time a frame with a given MAC address as source one arrives to a given bridge/switch port.
So the ARP table needs no change when the phone migrates from one AP to another, as the mapping of the IP address to the MAC address remains the same and bridge port is not stored in the ARP table (the bridge as a whole is an L3 interface).
The MAC address table should be updated as soon as the frame carrying the DNS query arrives through the new Ethernet interface. So either it is not being updated due to a bug, or there is some other issue.
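To make the difference between the two tables concrete, here is a toy model in Python. It is only an illustrative sketch, not how RouterOS implements its bridge: the class `LearningBridge` and the port names `port_plc`, `port_isp` and `cpu` are hypothetical, and only the addresses are taken from the capture in the first post.

```python
class LearningBridge:
    """Toy learning bridge: a MAC address -> port table learned from source MACs."""

    def __init__(self):
        self.hosts = {}  # the MAC address table ("bridge hosts" table)

    def receive(self, src_mac, dst_mac, in_port):
        # Learn (or relearn) the sender's location from the frame's source MAC.
        self.hosts[src_mac] = in_port
        out_port = self.hosts.get(dst_mac)
        if out_port is None:
            return "flood"    # unknown destination: send out of all other ports
        if out_port == in_port:
            return "filter"   # destination sits on the ingress port: do not forward
        return out_port


PHONE = "5C:17:CF:79:A8:29"
ROUTER = "CC:2D:E0:13:1A:CF"

# ARP table: IP -> MAC. Roaming between APs does not change this mapping.
arp_table = {"192.168.254.31": PHONE}

bridge = LearningBridge()
bridge.receive(ROUTER, "FF:FF:FF:FF:FF:FF", "cpu")  # the router's own MAC, learned on the CPU port
bridge.receive(PHONE, ROUTER, "port_plc")           # phone associated to the AP behind the PLC
before = bridge.hosts[PHONE]
bridge.receive(PHONE, ROUTER, "port_isp")           # phone roamed to the AP on the ISP router
after = bridge.hosts[PHONE]
print(before, after, arp_table["192.168.254.31"])
```

The first frame received from the phone after the roam moves its MAC table entry to the new port, while the ARP entry stays exactly as it was.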

From your description it seems that the LAN of the ISP modem is bridged through the Mikrotik and the dumb switch to the powerline Ethernet terminal. Can you exclude the dumb switch from the path between the APs, i.e. connect one Ethernet port of the Mikrotik's bridge directly to the ISP gear, and another one directly to the powerline Ethernet terminal? Next, can you sniff with the IP address of the phone alone, but not specifying any interface? It should show you through which Ethernet port the DNS response is leaving. Also, /interface bridge host print will show you the current association of the phone's MAC address with an Ethernet port.

Re: ARP for hosts that migrate across (non-MTik) WiFi access points?

Posted: Mon Dec 07, 2020 12:49 pm
by libove
@sindy, You're right, once upon a time I had this stuff clear - "of all the things I've lost, I miss my mind the most"...
@jvanhambelgium, thank you - see below for AP information.

So, I modified the connectivity of the network to:

ISP fibre -> ISP router
ISP router -> MikroTik [Ether3]
MikroTik [Ether4] -> PLC
MikroTik [Ether1] -> dumb switch

(I observe that this cannot be my permanent configuration, as the MikroTik's Ethernet ports are 100Mb/s and my Internet connection is 600Mb, but it's ok to test. In the end, a Gigabit switch has to be the core.).

Interestingly, this did NOT resolve the problem.
Roaming from the PLC WiFi back to the ISP WiFi still provokes the problem of my mobile phone being unable to see the DNS replies from the MikroTik.
The mobile phone remains connected-to-WiFi but unable-to-talk-to-MikroTik for however long it takes until the mobile phone figures that "this WiFi isn't going to give me Internet connectivity" and it switches to a different WiFi network (there is also a 2.4GHz WiFi expressed by the ISP's router WiFi AP). [If I then manually switch back to the ISP's router WiFi 5GHz AP the mobile phone WILL remain able to talk to the MikroTik].

Here's a representative sample /tool sniffer packet capture:
[admin@MikroTik2] /tool sniffer> print
                     only-headers: no
                     memory-limit: 100KiB
                    memory-scroll: yes
                        file-name: 
                       file-limit: 1000KiB
                streaming-enabled: no
                 streaming-server: 0.0.0.0:37008
                    filter-stream: no
                 filter-interface: 
               filter-mac-address: 
              filter-mac-protocol: 
                filter-ip-address: 192.168.254.31/32
              filter-ipv6-address: 
               filter-ip-protocol: 
                      filter-port: 
                       filter-cpu: 
                      filter-size: 
                 filter-direction: any
  filter-operator-between-entries: and
                          running: no
                          
135 time=8.318 num=136 direction=rx src-mac=5C:17:CF:79:A8:29 dst-mac=CC:2D:E0:13:1A:CF interface=ether3 
   src-address=192.168.254.31:41824 dst-address=192.168.254.3:53 (dns) protocol=ip ip-protocol=udp size=99 cpu=0 fp=no 
   ip-packet-size=85 ip-header-size=20 dscp=0 identification=30678 fragment-offset=0 ttl=64 

136 time=8.318 num=137 direction=rx src-mac=5C:17:CF:79:A8:29 dst-mac=CC:2D:E0:13:1A:CF interface=bridge 
   src-address=192.168.254.31:41824 dst-address=192.168.254.3:53 (dns) protocol=ip ip-protocol=udp size=99 cpu=0 fp=no 
   ip-packet-size=85 ip-header-size=20 dscp=0 identification=30678 fragment-offset=0 ttl=64 

137 time=8.319 num=138 direction=tx src-mac=CC:2D:E0:13:1A:CF dst-mac=5C:17:CF:79:A8:29 interface=bridge 
   src-address=192.168.254.3:53 (dns) dst-address=192.168.254.31:41824 protocol=ip ip-protocol=udp size=99 cpu=0 fp=no 
   ip-packet-size=85 ip-header-size=20 dscp=0 identification=60728 fragment-offset=0 ttl=64

Here's the Mikrotik's MAC address table entry for my mobile phone at the moment of the problem:
[admin@MikroTik2] /tool sniffer> /interface bridge host print where mac-address=5C:17:CF:79:A8:29             
Flags: X - disabled, I - invalid, D - dynamic, L - local, E - external 
 #       MAC-ADDRESS        VID ON-INTERFACE                           BRIDGE                           AGE                 
 0   D E 5C:17:CF:79:A8:29      ether3                                 bridge                 
Ether3 is correct - that's the physical port on the Mikrotik to which the ISP router is directly connected.


To @jvanhambelgium's questions about the APs:

The ISP router WiFi AP offers the following configuration settings which may be related to roaming or otherwise to the current issue:
Roaming - Enabled (unchangeable)
Roaming Role - Master (unchangeable)
Client Isolation - Disabled (I had found a place to edit this elsewhere)

The Tenda PLC WiFi AP has no such settings.

I do notice that the Tenda PLC WiFi AP doesn't seem to notice that clients have disconnected/ roamed off to a different WiFi AP.
So, if I walk my mobile phone to the end of the house where the Tenda PLC WiFi AP is by far the stronger signal, and the mobile phone roams onto that WiFi AP, and then I look at the Tenda PLC WiFi AP's list of connected clients, of course I see the mobile phone there. The ISP router's local network map correctly shows that my mobile phone is now "Ethernet-connected".
Walking my mobile phone back to the end of the house where it roams back onto the ISP router's WiFi AP, the ISP router's local network map now shows my mobile phone as a WiFi client, but the Tenda PLC WiFi AP list of connected clients still shows my mobile phone.

I'm not sure why that would matter, since we know that the mobile phone is seen by the MikroTik on the MikroTik's Ethernet port which is connected to the ISP's router; we see that the mobile phone's DNS client queries are received and answered by the Mikrotik; what's unclear is where the MikroTik's answers (which /tool sniff shows going out on "bridge" but I don't see on which Ethernet port) go since these DNS answers apparently don't get back to my mobile phone.

I did one more test - I sniffed only on individual etherN interfaces instead of on bridge or with filter-interface="".
Although we see DNS queries arriving from the mobile phone attached to the ISP WiFi AP on ether3 AND bridge, we see DNS replies going out only on bridge and not on ether3, and we do not see any packets at all relating to the mobile phone's IP address on ether1, ether2, ether4 or ether5 in a sniff.
Q: Should we see DNS reply packets in sniff output both tx to bridge and tx to an etherN port, like we do see the DNS query packets coming in?
Q: If we should see the DNS reply packets on both bridge and also on an etherN port, then since we're not seeing that, where are the packets actually going?

many thanks again,
-Jay

Re: ARP for hosts that migrate across (non-MTik) WiFi access points?

Posted: Mon Dec 07, 2020 1:36 pm
by sindy
Two points:
  • from what you wrote now, I assume that the initial topology (which I could not find clearly in the OP) was that the PLC, the ISP router, and the RB951Ui-2HnD were all directly connected to the dumb gigabit switch. In such a topology, from the perspective of the Mikrotik, the phone remains accessible through the same L3 interface (bridge) and the same L2 interface (the Ethernet one connected to the dumb switch) no matter to which AP it is associated, so the records in the /ip arp table and in the /interface bridge host table of the Mikrotik itself do not need to change.
  • I forgot to tell you to set hw=no on the relevant rows of /interface bridge port before sniffing. The need for this setting does not seem logical to me, but without it, the egress traffic from the bridge (i.e. from the Mikrotik itself) via the Ethernet interface is not sniffed. So change this and try again.
Assuming that after the modification above you see the DNS response leave through ether3, and given that the gigabit switch is out of the path and nevertheless the DNS response doesn't reach the phone, the issue must be in the ISP router or in the phone itself.

Re: ARP for hosts that migrate across (non-MTik) WiFi access points?

Posted: Mon Dec 07, 2020 3:09 pm
by libove
Thanks for the pointer to disabling hardware offload for the purpose of sniffing. It actually does make some sense to me, as with hardware offload enabled the sniffer may be functioning at a higher level in the kernel - or, period, may have no choice but to be functioning in the kernel - at a place where hardware-offloaded bridge-to-port transfers cannot be seen.
So, with hardware offload temporarily disabled, I DO see the packet going out the bridge interface and via the appropriate ether3 port.
So, indeed, it's not the Mikrotik that's at issue.

Realizing that we're now out of the space where the Mikrotik forums have even the slightest obligation to help ... 😅
.. any ideas what the ISP router/WiFi AP, or a OnePlus Nord phone (OnePlus' "OxygenOS", based on Android 10) might be doing wrong to provoke this frustrating behavior?

thanks!
-Jay

Re: ARP for hosts that migrate across (non-MTik) WiFi access points?

Posted: Mon Dec 07, 2020 4:15 pm
by sindy
[quote]It actually does make some sense to me, as with hardware offload enabled the sniffer may be functioning at a higher level in the kernel - or, period, may have no choice but to be functioning in the kernel - at a place where hardware-offloaded bridge-to-port transfers cannot be seen.[/quote]
The point is that the "bridge" interface is a virtual one, and the sniffer tap is really in the kernel, at the place where the packets egress or ingress via the CPU port to which the switch chip is connected. With hardware offload, the frames forwarded between two Ethernet ports bypass the CPU, but frames to/from the CPU don't. So the fact that the CPU->ethernet ones are not sniffed when hw=yes is a bug.

[quote]Realizing that we're now out of the space where the Mikrotik forums have even the slightest obligation to help ... 😅[/quote]
Well, there'd be no such obligation even if it was a Mikrotik issue :)

[quote].. any ideas what the ISP router/WiFi AP, or a OnePlus Nord phone (OnePlus' "OxygenOS", based on Android 10) might be doing wrong to provoke this frustrating behavior?[/quote]
The very basic analytic method is to divide the failing path into smaller sections until you identify the shortest possible one. So we've proven that the dumb gigabit switch is not guilty because it wasn't there at all when the issue happened, and that the Mikrotik isn't guilty either as it did send the frame out (actually, this is not 100 % sure because if the switch chip is dropping them, the sniffing doesn't show that, but let's leave this highly unlikely variant aside for now).
So now, the remaining part is to determine whether the problem lies in the ISP router or in the mobile phone. The easier way is to use another client than the phone. A more complicated way is to sniff the traffic in the air, which requires hardware capable of wireless sniffing in the 5 GHz band, with the modulation schemes used etc., not exactly a beginner's task.
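The "divide the path" method can be written down as a few lines of Python. This is just a sketch of the reasoning: the function name `first_failing_section` and the checkpoint labels are invented for illustration, and the True/False values are simply what has been reported in this thread so far.

```python
def first_failing_section(checkpoints):
    """checkpoints: ordered (name, seen) pairs along the path of the DNS reply.

    Returns (last point where the frame was seen, first point where it was not),
    or None if the frame was seen at every checkpoint.
    """
    last_seen = None
    for name, seen in checkpoints:
        if not seen:
            return (last_seen, name)
        last_seen = name
    return None


# Observations reported so far in this thread; the names are only labels.
reply_path = [
    ("MikroTik bridge (tx)", True),
    ("MikroTik ether3 (tx)", True),
    ("WiFi client", False),
]
print(first_failing_section(reply_path))
```

Every additional observation point inserted between the last "seen" and the first "not seen" entry (another client, a sniff in the air) shrinks the suspect section further.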

Re: ARP for hosts that migrate across (non-MTik) WiFi access points?

Posted: Tue Dec 08, 2020 11:55 am
by libove
Thanks again.
So, I booted up and cleaned up and updated an older notebook computer, and tested it doing the same roam-between-WiFi-APs.
It suffers the same symptom.
So it's not the mobile phone, but rather (gee, what a surprise!) the ISP router.
There are quite active forums for this ISP and its (incredibly varied, and almost always problematic in weird ways) customer premises equipment, so I'll post over there and see if someone has something to say. I'll feed back here if I get anything.
Again, thanks to all for the help, and if any other ideas come to mind, they're welcome!
warm regards,
Jay Libove, CISSP, CIPP/US, CIPT, CISM(retired)

Re: ARP for hosts that migrate across (non-MTik) WiFi access points?

Posted: Tue Dec 08, 2020 1:30 pm
by libove
One more bit of information - I did a packet capture on the roaming Windows 10 notebook computer. It confirms the above - the DNS replies (and PING replies, and any other kind) from the Mikrotik, that we see leaving the Mikrotik correctly (right source and dest IP and MAC addresses, right port, etc) do not reach the roamed WiFi client.
(And, in case I didn't mention it above, it's not just DNS and it's not just the Mikrotik: once the roamed WiFi client is on the ISP router WiFi, it cannot receive communication back from _any_ other internal LAN device until either something times out or the client's WiFi software decides that This Just Ain't Working and re-roams. It can, however, keep communicating out to the Internet as long as it already has the destination IP address in its local DNS cache.)