I just replaced an existing v2.8.26 router with a faster system running 2.9.8 with a 4-port NIC (both PC based). It seemed to work OK, but I started running into a very strange intermittent problem reaching various IP addresses that were local and had entries in the ARP table. The weird part is that the Mikrotik was the only unit that couldn't talk to them when it happened. It could, however, still get ping replies from the IP addresses just above and below the one in question.
The two units are configured the same: basic static routing, some firewall rules, and NAT rules for several private networks. They also have P2P mangle and queue rules to limit P2P traffic. I was going to implement a web proxy on the new unit, which was the reason for the upgrade.
As a temporary measure I have written a simple script that seems to keep the problem from manifesting. The script just removes one ARP entry from the table every 2 seconds. The entry is immediately repopulated, since it belongs to a high-traffic server.
I can't think of a reason why this would be happening unless the Mikrotik or the switch is doing something strange.
(Fictitious IP addresses were used to protect the innocent)
Internet:(12.12.12.192/29 subnet)
|12.12.12.193 (Gateway)
|
| 12.12.12.194/29
Mikrotik Router
| 12.12.12.129/27
|
|
Ethernet Switch
|
|
rest of network (servers routers etc).
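For anyone who wants to double-check the addressing in the diagram, here is a minimal Python sketch using only the standard `ipaddress` module. The addresses are the fictitious ones from this post; the `locate` helper and the host labels are made up for illustration. It only confirms that the server and the test PC fall inside the /27 on the switch side and that the two router networks do not overlap, which is one configuration mistake worth ruling out.

```python
import ipaddress

# Router interface addresses taken from the diagram (fictitious).
outside = ipaddress.ip_interface("12.12.12.194/29")  # Internet side
inside = ipaddress.ip_interface("12.12.12.129/27")   # switch side

# Hosts mentioned in the post; the labels are illustrative only.
hosts = {
    "gateway": "12.12.12.193",
    "server": "12.12.12.131",
    "test PC": "12.12.12.146",
}

def locate(addr):
    """Return which router-side network an address belongs to, or None."""
    ip = ipaddress.ip_address(addr)
    if ip in outside.network:
        return "outside"
    if ip in inside.network:
        return "inside"
    return None

for name, addr in hosts.items():
    print(name, addr, locate(addr))

# The two directly connected networks should be disjoint.
print("networks overlap:", outside.network.overlaps(inside.network))
```

With these addresses the server and the PC both land on the inside /27, so traffic between them and the router's inside interface should be resolved by ARP on that segment rather than routed.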
All of these tests were run from Winbox on the router (12.12.12.129) and from a PC that is connected through the switch on the other side (12.12.12.146). The unit performs normally for a while, then starts having problems, and then starts working again after a few seconds or minutes.
On the Mikrotik I can start a ping to 12.12.12.131 (a reliable server). It will ping for a while and then stop getting a response, even though the ARP entry is still in the table and the server is still available to everyone else on the network. If I wait, it will usually start working again after a few seconds, or possibly 2-5 minutes.
While it is failing to get a response from 12.12.12.131, I am still able to ping 12.12.12.130 and 12.12.12.132. When I check from inside the network (from the PC at 12.12.12.146) while the Mikrotik router is having the problem, everything is working well except traffic to the two Mikrotik router interfaces. Sometimes only the external interface on the Mikrotik router fails to respond to pings while the internal one keeps working.
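To pin down how long each dropout lasts and whether the router and the PC see them at the same moments, ping results can be collapsed into outage windows. The following is only a sketch: `outage_windows` is a hypothetical helper, and the sample log is made-up data, not output from the router.

```python
def outage_windows(samples):
    """Collapse (timestamp_seconds, replied) samples into (start, end) spans.

    A span starts at the first lost ping and ends at the next reply.
    An outage still open at the end of the log gets end=None.
    """
    windows = []
    start = None
    for ts, replied in samples:
        if not replied and start is None:
            start = ts                      # first lost ping of a new outage
        elif replied and start is not None:
            windows.append((start, ts))     # reply seen, close the outage
            start = None
    if start is not None:
        windows.append((start, None))
    return windows

# Hypothetical log: one ping per second, replies lost from t=3 through t=5.
log = [(0, True), (1, True), (2, True), (3, False),
       (4, False), (5, False), (6, True), (7, True)]
print(outage_windows(log))
```

Running the same summary over logs taken on the router and on the PC at 12.12.12.146 would show whether the failures line up in time.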
This is already a strange problem, but it doesn't stop here. I noticed that once the problem has started, removing an entry from the ARP table on the Mikrotik makes it start working normally again (passing traffic between interfaces for all devices on the network and not dropping pings).
I wrote a script that removes the entry for the server at .131 every two seconds, and the router seems happy with it. It hasn't dropped any pings since (not even to the server at .131).
Does anyone have any experience with anything like this? I think it might be either the Mikrotik router or possibly the switch is doing something very strange. Any takers?
- Dan