Traffic issues

mrciarano · Tue May 08, 2018 8:00 pm

Hi all, I wondered if someone can help me work out what the issue could be on a MikroTik CHR VM I have set up in Proxmox for testing.
(following addresses are example addresses)
I have the following VLANs:
vl2 - 10.2.0.0/16
vl3 - connected to ISP 1.1.1.5/29
vl4 - routed subnet (static on 1.1.1.5/29 from ISP) 2.2.2.2/27

All MTUs are 1500.

The default route (0.0.0.0/0) is installed statically via the ISP gateway on vl3.

So I have created a masquerade rule that allows devices on 10.2.0.0/16 to NAT as the WAN address 1.1.1.5, which works fine, and throughput is as expected.

On vl4, clients can ping and resolve names fine. However, if I try to wget a web page hosted either on vl2 or on the WAN, it cannot connect (it just hangs and eventually retries):
root@test:~# wget mikrotik.com
--2018-05-08 16:55:11-- http://mikrotik.com/
Resolving mikrotik.com (mikrotik.com)... 159.148.147.196, 2a02:610:7501:1000::2
Connecting to mikrotik.com (mikrotik.com)|159.148.147.196|:80... ^C

I know this is not a hypervisor or switch issue as VyOS works fine.
Running a DNS query against OpenDNS does reveal the correct IP of the test VM, which in this example is 2.2.2.5/27.

The MTU of the test VM is also 1500.
root@test:~# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
54: eth0@if55: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether c0:ea:69:74:0c:aa brd ff:ff:ff:ff:ff:ff link-netnsid 0

Any ideas where I'm going wrong or what I may need to test?
Also forgot to mention I have the 1Gb/s trial licence.

MangleRule · Tue May 08, 2018 8:56 pm

Can you run the following command and post the output to the thread? It will make it easier for someone to get an idea of what you have set up.

/export hide-sensitive

mrciarano · Tue May 08, 2018 9:09 pm

# may/08/2018 18:05:08 by RouterOS 6.42.1
# software id =
#
#
#
/interface bridge
add name=lo0
/interface ethernet
set [ find default-name=ether1 ] arp=disabled loop-protect=off
/interface vlan
add interface=ether1 name=vl2 vlan-id=2
add interface=ether1 name=vl3 vlan-id=3
add interface=ether1 name=vl4 vlan-id=4
/interface list
add name=WAN
/ip settings
set rp-filter=loose tcp-syncookies=yes
/interface list member
add interface=vl3 list=WAN
/ip address
add address=10.2.0.8/16 interface=vl2 network=10.2.0.0
add address=1.1.1.5/29 interface=vl3 network=1.1.1.0
add address=2.2.2.2/27 interface=vl4 network=2.2.2.0

/ip firewall nat
add action=masquerade chain=srcnat out-interface-list=WAN src-address=\
10.0.0.0/8
/ip route
add distance=1 gateway=1.1.1.1
/ip traffic-flow
set enabled=yes

mrciarano · Wed May 09, 2018 12:22 pm

So an update with a little more explanation - kind of should've started it like this sorry.

I'm at the moment moving from one routing system to another and need to do some asymmetric routing to support the transition.

An overview of how this works at present:

NSP (static) -----> HSRP address (VyOS router1 and VyOS router2) --------> VLAN interfaces with the public subnets assigned.

The end goal is that in a few months I want to migrate to MikroTik CHR to replace both VyOS routers. I will do this by migrating to BGP handoffs and discontinuing the HSRP/VRRP handoff. This is a kind of pre-test of the solution, hence I need some asymmetric connectivity to work. (I want VRF support, hence the move: policy-based routing and separate handoff routers get messy to handle.)

The interim goal is to dual gateway each VLAN to not only test client facing VLAN rules, but to then make the transition fairly seamless - i.e. just updating gateway addresses on the addresses we assign to customers.

To sum up how the current configuration stands, see the attached screenshot. Logistically, this has worked in the past for other architectures.

planned chr migration.png

The public subnets from the NSP are routed on the VRRP address. I can statically assign small routes to point to the CHRs (dual route with different metrics for *basic* availability, full subnet and all).
I also tried adding a masquerade rule for 2.2.2.0/27 on the CHRs, and the test VMs were able to gain connectivity, albeit NAT'd to the CHRs' /29 address. This makes me wonder if there's some filtering going on somewhere that I'm not aware of. I'm still unsure, though, as I can query an OpenDNS resolver via dig and get the correct public IP of the VM on the test network, yet TCP traffic still seems to be dropped. To me this suggests I've done something that doesn't observe connection tracking rules in RouterOS, which would explain why NAT at the edge of the network works?

To mitigate that I've added a static route for the test VM's IP on the VyOS routers: 2.2.2.24/32 to 1.1.1.4. I can ping and access this from the outside world, SSH in and all that wonderful stuff. The connection seems to drop every so often, and I wonder if this could be a sign of an expiring ARP entry? Just trying to get a sanity check here!

And just to clarify - I have verified that the MTUs are correct via documentation, tracepaths et al. I'm running at MTU 1500 for layer 3 internet IP traffic - does the L2 MTU keep up with this setting automatically in CHR with VLANs enabled? Tracepath still suggests the max usable MTU is 1500 though, so assume it's mostly correct.
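An MTU claim like this can be cross-checked from the test VM with a don't-fragment ping. The following is only a minimal sketch, assuming a Linux client with iputils `ping`. The helper names `icmp_payload_for_mtu` and `probe_mtu` are hypothetical, not something used in this thread.

```shell
# Hypothetical helper: largest ICMP echo payload that fits in one IPv4 packet
# of the given MTU (20-byte IPv4 header without options + 8-byte ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Hypothetical helper: send 3 echo requests with the DF bit set (-M do), so
# nothing on the path is allowed to fragment them.
# Usage: probe_mtu <target> [mtu]   (mtu defaults to 1500)
probe_mtu() {
    ping -M do -c 3 -W 2 -s "$(icmp_payload_for_mtu "${2:-1500}")" "$1"
}
```

For example, `probe_mtu 159.148.147.196 1500` sends 1472-byte payloads to the address wget resolved earlier. If that fails while a smaller value such as `probe_mtu 159.148.147.196 1400` succeeds, something on the path is not passing full-size packets. This only exercises ICMP, so TCP traffic that goes through offload features may still behave differently.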

Example configs as they stand now - I'm not integrating VRRP yet, just have the VMs on CHR1's gateway address.

# may/08/2018 18:05:08 by RouterOS 6.42.1
# software id =
#
#
#
/interface bridge
add name=lo0
/interface ethernet
set [ find default-name=ether1 ] arp=disabled loop-protect=off
/interface vlan
add interface=ether1 name=vl2 vlan-id=2
add interface=ether1 name=vl3 vlan-id=3
add interface=ether1 name=vl4 vlan-id=4
/interface list
add name=WAN
/ip settings
set rp-filter=loose tcp-syncookies=yes
/interface list member
add interface=vl3 list=WAN
/ip address
add address=10.2.0.8/16 interface=vl2 network=10.2.0.0
add address=1.1.1.4/29 interface=vl3 network=1.1.1.0
add address=2.2.2.28/27 interface=vl4 network=2.2.2.0

/ip firewall nat
add action=masquerade chain=srcnat out-interface-list=WAN src-address=\
10.0.0.0/8
/ip route
add distance=1 gateway=1.1.1.1
/ip traffic-flow
set enabled=yes

By rights - as IP forwarding is enabled, traffic *should* just flow without firewall rules in any case for anything on a public IP address VLAN?

mrciarano · Wed May 16, 2018 12:20 pm

Just thought I'd bump this with an observation: it was in fact a textbook MTU error. This is something CHR users should investigate if they use Proxmox.

It turns out that with Open vSwitch, Proxmox doesn't seem to respect MTU values and will start sending 2100-byte frames. This would explain why masquerade was working, as it was solving inbound packet fragmentation issues. It can be solved by turning a couple of options off, though I can't remember off the top of my head what they are.
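One way to look for oversized frames of this kind is to capture only frames longer than the MTU should allow. This is a rough sketch, assuming `tcpdump` is available on the hypervisor host. The helper names are hypothetical, and it does not identify which options need turning off.

```shell
# Hypothetical helper: longest Ethernet frame (excluding FCS) expected for an
# IP MTU: 14-byte Ethernet header, plus 4 bytes if the frame is 802.1Q tagged.
# Usage: max_frame_len <mtu> [tagged: 0 or 1]
max_frame_len() {
    mtu=$1
    tagged=${2:-0}
    echo $((mtu + 14 + tagged * 4))
}

# Hypothetical helper: print (not run) a tcpdump command that shows only
# frames longer than that limit. pcap's "greater N" matches length >= N.
# Usage: oversize_capture_cmd <interface> <mtu> [tagged: 0 or 1]
oversize_capture_cmd() {
    iface=$1
    limit=$(max_frame_len "$2" "${3:-0}")
    echo "tcpdump -ni $iface -e greater $((limit + 1))"
}
```

For example, `oversize_capture_cmd vmbr0 1500 1` prints a capture command for VLAN-tagged traffic at MTU 1500; `vmbr0` is an assumed bridge name and should be replaced with the real interface. Captures taken on a sending host with segmentation offload enabled can show large frames before the NIC splits them, so a capture on the receiving side is a more reliable indication of what actually crossed the wire.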

I spun up the instance with the same settings in our vCenter cluster and it worked straight away, flawlessly.

I read this post, which suggested I reinvestigate MTU issues, and as it turns out, it was textbook.
viewtopic.php?t=122446

At least I wasn't entirely mad!
