Proxy-ARP replies to all ARP broadcasts for any IP address

flameproof · Tue May 03, 2022 2:38 pm

Hi all, I'm scratching my head over an issue with proxy-arp. See the diagram below [EDIT: I want to move all CPEs to the CGNAT /10, assigning IPs from that block without subdividing it into subnets, which is where proxy-arp would be useful compared to static routing]:

[Diagram: CCR Proxy ARP.png]

The Linux server has a tagged VLAN interface towards the CCR that acts as the bridge between itself and the network-edge CCRs serving as PPPoE concentrators. The server has static routes for the IP ranges used by the client CPEs in each of the three networks (10.70.0.0/16, 10.71.0.0/16 and 10.72.0.0/16), with the VLAN interface (e.g. em2.99 for VLAN 99) as the gateway.

The CCR's bridge has horizon set to 1 on each of the downstream CCR interfaces, to prevent the CCRs from talking to each other. The upstream interface of each network CCR (ETH2) has proxy-arp enabled.

When the server wants to ping e.g. 10.70.0.25, the IP of a CPE under CCR A's control, it broadcasts an ARP request which, according to my understanding of how proxy-arp works, should only be answered by CCR A, as it "knows" it has that IP under its control. However, pcap captures show that all three CCRs (A, B and C) respond to the server's ARP request, and the server uses whichever reply arrives first instead of the correct one.
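For reference, the behaviour expected here can be written down as a rule: a router with proxy-arp answers for a target only if its best route to that target leaves through an interface other than the one the request arrived on. The sketch below models that rule with Python's `ipaddress` module. It is a simplified model, not RouterOS code, and the routing tables and `<pppoe-...>` interface names are hypothetical.

```python
import ipaddress


def proxy_arp_should_reply(routes, target, in_iface):
    """Simplified proxy-ARP rule: answer for `target` only if the best
    (longest-prefix) route to it leaves via an interface other than the
    one the ARP request arrived on."""
    addr = ipaddress.ip_address(target)
    matches = [(ipaddress.ip_network(net), dev) for net, dev in routes
               if addr in ipaddress.ip_network(net)]
    if not matches:
        return False  # no route at all: stay silent
    _, out_iface = max(matches, key=lambda m: m[0].prefixlen)
    return out_iface != in_iface


# Hypothetical tables: each PPPoE client shows up as a /32 route on a
# dynamic pppoe interface (names are illustrative only), no default route.
ccr_a = [("10.70.0.25/32", "<pppoe-cust25>")]
ccr_b = [("10.71.0.9/32", "<pppoe-cust9>")]
ccr_c = [("10.72.0.3/32", "<pppoe-cust3>")]

for name, table in [("A", ccr_a), ("B", ccr_b), ("C", ccr_c)]:
    # Request for 10.70.0.25 arrives on the upstream interface ETH2.
    print(name, proxy_arp_should_reply(table, "10.70.0.25", "ETH2"))
# prints: A True / B False / C False
```

Under this rule only CCR A should answer, which is what the captures contradict.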

Is this normal behavior? My second observation is that if I add an IP address to ETH2 on the network-edge CCRs, e.g. 192.168.0.10/20, the CCR starts flooding the upstream link with ARP requests for all IPs in the /20 range, seemingly at random. It creates an incomplete entry in its ARP table for each request, which stays until it expires.
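To put the second observation in scale: assigning 192.168.0.10/20 makes the whole /20 connected (on-link) on ETH2, so the router treats every host address in it as directly reachable and would resolve any of them with ARP. A quick check with Python's standard `ipaddress` module; this only sizes the potential flood and says nothing about what triggers the requests.

```python
import ipaddress

iface = ipaddress.ip_interface("192.168.0.10/20")  # address from the post
net = iface.network

print(net)                     # 192.168.0.0/20
print(net.num_addresses)       # 4096
print(len(list(net.hosts())))  # 4094 usable hosts (network/broadcast excluded)
print(net.broadcast_address)   # 192.168.15.255
```

So the ARP table could, in principle, accumulate up to 4,094 incomplete entries from this one subnet.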

Help!

flameproof · Tue May 03, 2022 3:40 pm

As a follow-up, I have tried solutions posted in other threads, such as adding the IP of the downstream network to the upstream interface (e.g. 10.70.0.1/16 to ETH2 of CCR A), to no avail.

I have fixed the problem by moving to static routes on the Linux server for each IP range, with a static IP on each network CCR's upstream interface as the gateway. For example, the Linux server (192.168.0.10) has a route to 10.70.0.0/16 via 192.168.0.11. CCR A has 192.168.0.11 set up on ETH2, and it takes care of the routing.
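Maintaining these per-network routes can at least be scripted. The sketch below generates one iproute2 `ip route replace` command per CPE range, which adds the route or updates it in place, so re-running the script is harmless. Only 192.168.0.11 (CCR A) and em2.99 come from the thread; the gateways for CCR B and C and the 192.168.0.0/20 server-side subnet are hypothetical assumptions.

```python
import ipaddress

VLAN_DEV = "em2.99"                                  # VLAN interface from the thread
SERVER_NET = ipaddress.ip_network("192.168.0.0/20")  # assumed server-side subnet

# CPE range -> upstream (ETH2) address of the CCR serving it.
# Only 192.168.0.11 is from the thread; .12 and .13 are hypothetical.
CCR_GATEWAYS = {
    "10.70.0.0/16": "192.168.0.11",
    "10.71.0.0/16": "192.168.0.12",
    "10.72.0.0/16": "192.168.0.13",
}


def route_commands(gateways):
    """Build one 'ip route replace' command per CPE range."""
    cmds = []
    for prefix, gw in gateways.items():
        net = ipaddress.ip_network(prefix)
        gw_addr = ipaddress.ip_address(gw)
        # The next hop must be directly reachable on the VLAN subnet.
        if gw_addr not in SERVER_NET:
            raise ValueError(f"{gw} is not on {SERVER_NET}")
        cmds.append(f"ip route replace {net} via {gw_addr} dev {VLAN_DEV}")
    return cmds


for cmd in route_commands(CCR_GATEWAYS):
    print(cmd)
```

The first line printed is `ip route replace 10.70.0.0/16 via 192.168.0.11 dev em2.99`. Each new concentrator still needs a new entry, which is the maintenance cost the next post complains about.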

It would still be nice to have proxy-ARP working, as it eliminates having to keep updating the Linux server's routing tables.

pe1chl · Tue May 03, 2022 5:31 pm

I don't understand why you would want to use proxy arp for this situation.
I could understand it if the 3 PPPoE servers all served the same subnet and you tried to use this trick to send the traffic to whichever router happened to have the client at that time.
But this does not seem to be the case as you mention 3 networks 10.70.0.0/16 etc.
So please do the routing "correctly" and use an autorouting protocol like BGP or OSPF when you do not want to populate routing tables manually.

flameproof · Tue May 03, 2022 5:48 pm

I should have clarified that the idea is to move to CGNAT and give CPEs an IP from the same /10 block, to make things simpler and not have to handle every single network individually. How would I get proxy-arp working in that scenario?

pe1chl · Tue May 03, 2022 5:59 pm

You would use OSPF or iBGP in that scenario, and route the /32s individually.
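The point of routing /32s is that every CPE can keep an address from the shared block, while the server still reaches the right concentrator through the next hop attached to each host route. A minimal sketch of the resulting lookup, assuming the CGNAT block is the RFC 6598 shared address space 100.64.0.0/10; the learned routes and next hops are hypothetical, and no real OSPF/BGP software is involved.

```python
import ipaddress

CGNAT = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space

# Hypothetical /32 host routes as a routing protocol might install them:
# next hop = the CCR currently terminating that client's PPPoE session.
LEARNED = {
    ipaddress.ip_network("100.64.0.25/32"): "192.168.0.11",  # CCR A
    ipaddress.ip_network("100.64.0.26/32"): "192.168.0.12",  # CCR B
    ipaddress.ip_network("100.64.7.9/32"): "192.168.0.13",   # CCR C
}


def next_hop(dst):
    """Longest-prefix match over the learned routes; None if unreachable."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in LEARNED if addr in net]
    if not matches:
        return None
    return LEARNED[max(matches, key=lambda n: n.prefixlen)]


print(CGNAT.num_addresses)       # 4194304 addresses in the /10
print(next_hop("100.64.0.26"))   # 192.168.0.12
print(next_hop("100.64.0.27"))   # None (no session, no route)
```

Because the routes are per host, a CPE that moves to another concentrator only changes its next hop; the addressing of the /10 is untouched.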

flameproof · Tue May 03, 2022 6:43 pm

OK, thanks for the suggestion. OSPF or iBGP would complicate our network topology; we try to keep things as dumbed down and simple as possible so that our operating costs are kept in check. Operating a complex nationwide network requires different skills (and therefore costs) than a simple one. When you offer really low-cost broadband in emerging markets, the phrase "adding OSPF increases our break-even per customer by 6 months" is something that can actually be true.

pe1chl · Tue May 03, 2022 6:47 pm

Sorry, but with the MikroTik equipment you use, OSPF or BGP is just a simple one-time configuration job and it will work fine for this scenario.
I think it has been discussed in one of the MUM talks.

flameproof · Tue May 03, 2022 7:23 pm

Except it is not (a one-time config). Right now, we'd have to add a routing table entry on the Linux server every time we launch a new network and add a CCR PPPoE concentrator. If we set up BGP on the Linux server (Ubuntu) via e.g. BIRD, peering with the central CCR that routes all traffic (for example), we'd still need to add the routes there so they get propagated. The alternative, an individual BGP session with every CCR (we have 60 and are growing to 300 by the end of the year), doesn't sound practical; we might as well use static routing then.

The good thing about proxy-arp (if it worked as advertised) is that it's a one-off config on the server: you route the entire 10.0.0.0/8 to the VLAN interface and you're done. I'm still interested in why proxy-arp would reply for IP addresses it has no knowledge of, and why I see a flood of ARP requests from the CCR for all the addresses in the subnet.
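What the one-off config amounts to: with a single on-link route (roughly `ip route add 10.0.0.0/8 dev em2.99` on Linux), there is no next hop, so the server ARPs for each destination itself and sends the traffic to whichever neighbour answers. The sketch models that decision; it is a simplification, not Linux kernel code.

```python
import ipaddress

# The single on-link route described in the post: no next hop.
ON_LINK = ipaddress.ip_network("10.0.0.0/8")


def arp_target(dst):
    """Address the server would ARP for, or None if not covered."""
    addr = ipaddress.ip_address(dst)
    # On-link: ARP for the destination itself and trust whoever answers.
    return str(addr) if addr in ON_LINK else None


for dst in ["10.70.0.25", "10.71.3.4", "10.250.1.1"]:
    print(dst, "->", arp_target(dst))
```

Every address in the /8 (16,777,216 of them) resolves this way, so the design is only correct if exactly one CCR answers for each address, which is precisely what the captures show is not happening.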

pe1chl · Tue May 03, 2022 7:35 pm

When you think that using a proxy ARP solution would scale to 300 routers in one big network, may the force be with you!

flameproof · Tue May 03, 2022 9:16 pm

The number of ARP requests that would flow would still be minimal. The only time the monitoring server needs to access a device behind PPPoE is for remote troubleshooting over SSH. This is not for passing customer traffic, it's for smol smol amounts of management traffic compared to our volume. To give you an idea of scale: out of 12 Gbps at peak time, our management traffic for this purpose (currently carried over PPTP tunnels to each network) is no more than 5-6 Mbps. So again, why would proxy-arp, if it worked the way it is described, be a problem?
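The quoted figures work out as follows (taking 12 Gbps as 12,000 Mbps):

```python
PEAK_MBPS = 12_000   # 12 Gbps peak, from the post
MGMT_MBPS = (5, 6)   # quoted range of management traffic

shares = [m / PEAK_MBPS for m in MGMT_MBPS]
for m, share in zip(MGMT_MBPS, shares):
    print(f"{m} Mbps = {share:.3%} of peak")
# 5 Mbps = 0.042% of peak
# 6 Mbps = 0.050% of peak
```

That is, management traffic is at most about 0.05% of peak volume. Note this measures bandwidth; the ARP load depends on the rate of new destinations contacted, not on bytes.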

Sob · Tue May 03, 2022 11:44 pm

It really does seem broken. It takes even the default route into account, and if there is one, it answers all ARP requests regardless of address. Tested with 6.48.6, 6.49.6 and 7.2.3. I found an ancient 6.21.1, and with that it doesn't happen.
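Why a default route produces replies for everything: 0.0.0.0/0 contains every IPv4 address, so any check of the form "some route covers the target" is always true once a default route exists. The sketch models the symptom described above; it is not RouterOS source, and the table for CCR B is hypothetical.

```python
import ipaddress

DEFAULT = ipaddress.ip_network("0.0.0.0/0")


def covered(routes, target, ignore_default=False):
    """True if any route in `routes` contains `target`."""
    addr = ipaddress.ip_address(target)
    return any(addr in net for net in routes
               if not (ignore_default and net == DEFAULT))


# Hypothetical table for CCR B: one PPPoE client plus a default route.
ccr_b = [ipaddress.ip_network("10.71.0.9/32"), DEFAULT]

print(covered(ccr_b, "10.70.0.25"))                       # True
print(covered(ccr_b, "10.70.0.25", ignore_default=True))  # False
```

With the default route counted, CCR B "knows" 10.70.0.25 and answers, which matches what the captures showed for B and C.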

sup5 · Wed May 04, 2022 12:28 am

About ten years ago I converted several proxy-arp based networks to OSPF due to all kinds of weird issues and scaling problems.

Thus I really encourage you to move forward to a dynamically routed infrastructure.

The seeming simplicity of Layer 2 will quickly become a true nightmare.
