Jightning
just joined
Topic Author
Posts: 9
Joined: Sat Jun 25, 2016 3:18 am

Power outage causes specific sites to be blocked

Tue Sep 11, 2018 11:30 pm

I have a strange issue that we have run into twice now. After losing power, our Cloud Core Routers sometimes will not pass traffic to Netflix.com, macu.com, woot.com, and others. But they will pass traffic for websites like Google.com, youtube.com, hulu.com, and amazon.com.

The first time it happened, a reboot of the CCR fixed the glitch. Recently it happened to another one of our CCRs. This time we had to revert to a backup from our FTP system that was made a few days before the glitch. If you don't have automated backups, you need them. They are a lifesaver: https://wiki.mikrotik.com/wiki/Automated_Backups

Anyway, while troubleshooting we tried
  • disabling everything under IP firewall
  • sending the traffic through a different BGP peer (we have a single-multihomed setup there)

This looked exactly like some kind of BGP error. We thought that our IP blocks were just not making it to the entire internet. That would explain why we could reach some sites and not others. Talking to our BGP peer, we found that the smaller packets needed to open the Netflix.com website were flowing both ways, but the larger packets with the website HTML in them were never being delivered to our clients. There was 2-way communication between the user PC and netflix.com, but the session would never complete.

We solved the issue by changing the path of the traffic so that it did not flow through the CCR that was blocking Netflix. This is what clued us in that the issue was the CCR itself. If the CCR that had previously lost power routed the traffic, we could not reach netflix.com. If we sent the traffic through another CCR and on to the internet, we could reach netflix.com.

I know this is vague and does not have enough detail to diagnose anything. I expect to encounter the issue again, so what kind of info should I gather to better demonstrate it and help pin down what is happening?
 
sindy
Forum Guru
Posts: 11122
Joined: Mon Dec 04, 2017 9:19 pm

Re: Power outage causes specific sites to be blocked

Tue Sep 11, 2018 11:49 pm

It sounds like an MTU issue, given that small packets do get through and large ones don't.

So the next time it happens, I'd sniff packets to/from the server address simultaneously at the server-facing and client-facing interfaces of the CCR into a file, and use Wireshark to see whether the large packets arrive at the CCR and whether the CCR forwards them to the client. If they arrive and are too large to fit, check whether the CCR properly sends an ICMP "fragmentation needed" notification about it.

It is possible that one of the paths between the server and the CCR has a problem forwarding ICMP, so the root cause may not be the CCR itself.
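To illustrate why an MTU problem lets small packets through while large ones vanish, here is a minimal Python sketch of a path MTU "black hole". It is only a model: the hop names, the MTU values, and the idea that the ICMP notification is lost at one particular hop are assumptions made up for illustration, not facts established in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    name: str                 # hypothetical hop name
    mtu: int                  # largest IP packet this hop can forward (assumed value)
    sends_icmp: bool = True   # False models a lost or filtered "fragmentation needed"


def send(path: List[Hop], size: int) -> str:
    """Send one IP packet with Don't Fragment set along the path."""
    for hop in path:
        if size > hop.mtu:
            if hop.sends_icmp:
                # The sender learns the smaller MTU and can retry with smaller packets.
                return f"dropped at {hop.name}, sender told to use {hop.mtu}"
            # The sender keeps retransmitting the same oversized packet.
            return f"dropped at {hop.name}, sender never notified"
    return "delivered"


# Hypothetical path: one hop has a reduced MTU and its ICMP errors are lost.
path = [Hop("upstream", 1500), Hop("ccr", 1480, sends_icmp=False), Hop("access", 1500)]

print(send(path, 60))    # small packet such as a TCP handshake segment
print(send(path, 1500))  # full-size packet carrying page content
```

In this model the TCP handshake completes because its packets are small, but the first full-size data packet is dropped silently, which would look like a session that opens and never finishes.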
 
Jightning
just joined
Topic Author
Posts: 9
Joined: Sat Jun 25, 2016 3:18 am

Re: Power outage causes specific sites to be blocked

Wed Sep 12, 2018 12:05 am

sindy wrote: "It sounds like an MTU issue, given that small packets do get through and large ones don't."
I'll check the MTU issue next time it happens. I could load up YouTube and watch videos, but I could not even get to the login screen for Netflix. Speedtest.net was still working and gave me 100 Mbps+.
sindy wrote: "It is possible that one of the paths between the server and the CCR has a problem forwarding ICMP, so the root cause may not be the CCR itself."
Right now we are using the CCR that was causing the issue to pass traffic. The only thing that changed to get it to start working was to restore a config file from before the power outage.
 
tippenring
Member
Posts: 304
Joined: Thu Oct 02, 2014 8:54 pm
Location: St Louis MO

Re: Power outage causes specific sites to be blocked

Wed Sep 12, 2018 4:25 am

Jightning wrote: "Right now we are using the CCR that was causing the issue to pass traffic. The only thing that changed to get it to start working was to restore a config file from before the power outage."
What was different between the two configs? That's an easy thing to look at.
 
Jightning
just joined
Topic Author
Posts: 9
Joined: Sat Jun 25, 2016 3:18 am

Re: Power outage causes specific sites to be blocked

Wed Sep 12, 2018 4:36 am

I'll take a look and post them with the sensitive parts removed. Too bad I can't use MikroTik's automatic removal of sensitive data on saved backups.
 
tippenring
Member
Posts: 304
Joined: Thu Oct 02, 2014 8:54 pm
Location: St Louis MO

Re: Power outage causes specific sites to be blocked

Wed Sep 12, 2018 4:46 am

Jightning wrote: "I'll take a look and post them without sensitive configs. Too bad I can't use the Mikrotik auto remove sensitive on saved backups."
You don't necessarily need to post them. Just load the before and after in Notepad++ and do a compare.
 
Jightning
just joined
Topic Author
Posts: 9
Joined: Sat Jun 25, 2016 3:18 am

Re: Power outage causes specific sites to be blocked

Wed Sep 12, 2018 8:05 pm

So, unfortunately, the files are in raw binary. I did not realize that you cannot read .backup files.

Here is a breakdown of what I remember changing on this router. I made some changes to the config on the morning of Friday the 7th. Netflix was working just fine after the changes. People would have called our office if Netflix had been down all of Friday and Saturday. But they did not start calling until after the router lost power on Sunday the 9th, when a battery backup failed. I had not logged into the router since making the changes Friday.

The changes I made Friday morning, two days before the Netflix block, were to replace the public IP addresses used for transit with private addresses throughout our network. Then I added the /24 that I had reclaimed to a bridge on the CCR, set up an IP pool and DHCP server, and added the /24 to our BGP advertisements. I checked the advertisement with a BGP looking glass and could see it advertised through the correct neighbor AS.

I would think that only a firewall config problem could block specifically Netflix and a few other sites. I tried disabling all the firewall rules, to no effect.

I guess if I have to, I will revert to the config from Sunday morning that was causing the problem and see if I can reproduce the issue. I am betting that it works just fine.

Are there any known issues with MikroTik devices losing power or browning out?
 
sindy
Forum Guru
Posts: 11122
Joined: Mon Dec 04, 2017 9:19 pm

Re: Power outage causes specific sites to be blocked

Wed Sep 12, 2018 8:47 pm

So we have to wait until it happens again.

Now, while it is working, instead of (or in addition to) doing a "backup" (which goes to a file automatically), do an "/export hide-sensitive file=some-name". It will save the configuration in a readable form into a file named "some-name.rsc". Download the file somewhere else.

After the power outage (which you can schedule for a maintenance window), check that the issue is present and, if so, export the configuration the same way. Then compare the text files as suggested above.
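For anyone who prefers a script to a visual compare, the before/after comparison of two text exports could be sketched in Python with the standard difflib module. This is only an illustration: the function name and the sample configuration lines are made up, and comment lines (which begin with "#", such as the timestamp header of an export) are skipped on the assumption that they differ between any two exports anyway.

```python
import difflib
from typing import List


def diff_exports(before: str, after: str) -> List[str]:
    """Return only the added (+) and removed (-) lines between two text exports."""
    # Skip comment lines so that the export timestamp does not show up as a change.
    before_lines = [l for l in before.splitlines() if not l.startswith("#")]
    after_lines = [l for l in after.splitlines() if not l.startswith("#")]
    diff = list(difflib.unified_diff(
        before_lines, after_lines,
        fromfile="before.rsc", tofile="after.rsc",
        lineterm="", n=0,  # n=0: no unchanged context lines
    ))
    # The first two entries are the ---/+++ file headers; hunk markers start with @@.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Made-up sample exports, only to show the shape of the result.
before = """# export taken while everything works
/interface ethernet
set [ find default-name=ether1 ] mtu=1500
/ip address
add address=192.0.2.1/24 interface=ether1"""

after = """# export taken while the problem is present
/interface ethernet
set [ find default-name=ether1 ] mtu=1400
/ip address
add address=192.0.2.1/24 interface=ether1"""

for line in diff_exports(before, after):
    print(line)
```

With real files, the two strings would simply be read from the downloaded .rsc files. An empty result would suggest that the saved configuration is not what changed, which would point back toward the router's runtime state.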

One issue which is not specific to MikroTik devices: if you power the device directly from the backup battery, with no "clever" device between the two, and the battery discharges below the acceptable input voltage range of the device, the device may do weird things, including writing where it should not write in its flash. Whether the CCR is sensitive to this depends on its internal construction. Normally, power stations have a battery voltage level at which they cut off the load, to protect the batteries from deep discharge and the equipment from exactly this kind of misbehavior.
 
Jightning
just joined
Topic Author
Posts: 9
Joined: Sat Jun 25, 2016 3:18 am

Re: Power outage causes specific sites to be blocked

Thu Sep 13, 2018 9:49 am

I just restored the config from a backup made right after the power outage. Netflix.com is reachable.

I also have more info on the power outage. All I know is that I was able to log into our CCR through our fiber internet connection after our primary radio links went down. Our 11 GHz and 60 GHz radios lost power when our $5000 battery backup died >:(.

Possibly the CCR experienced a power fluctuation at the time the battery backup died, and that caused a glitch that made the router start blocking Netflix.com?

I think reverting to an older config fixed the problem because the router rebooted and cleared the error. Maybe a reboot to restore a backup is a deeper reboot than a normal one.

Next time it happens I'll send someone to pull the plug. After I do a config export ;)

I'll post an update when it happens again.
 
tippenring
Member
Posts: 304
Joined: Thu Oct 02, 2014 8:54 pm
Location: St Louis MO

Re: Power outage causes specific sites to be blocked

Thu Sep 13, 2018 6:27 pm

Here's a different possible cause to look at. I believe you've described your network as having two parallel border CCR routers. Is that correct? If so, when the power returns, could one router be the default gateway for your network, but actually be routing the traffic to the other border router (and likely sending ICMP redirects to the originating host) for routing to the internet?

I recently had a problem where, due to a transitional network period, my client had two routers connected to two different ISPs on the same network. The primary circuit went down, so I set the default gateway (Mikrotik connected to the primary circuit) to route traffic to the other router (Adtran). This worked for most hosts. However, a few were unable to open certain websites (msn.com was one), although other sites worked just fine.

Turned out the only hosts having trouble were not accepting the ICMP redirect from the primary router, so the primary router was routing every packet to the Adtran. Once we set the PCs to accept ICMP redirects, the computers had no more problems.

I still have not figured out why this was a problem. If anything, I expected the computers that changed their next hop to a destination (due to the ICMP redirect) in the middle of a TCP session to have trouble with certain sites. That was not the case. I would like to duplicate the setup at some point to figure out the root cause, but that's low on the priority list. I have a sneaky feeling that it is related to load balancers which incorporate TTL as part of the load balancing hash, but that's why I would expect the hosts that accept the ICMP redirect to have problems since the load balancer would see a TTL change once the host began sending packets directly to the secondary router. That is the opposite of what I saw.

In any case, the point is that if you are routing from one of two parallel border routers to the other, perhaps you're seeing similar behavior to what I was.
 
Jightning
just joined
Topic Author
Posts: 9
Joined: Sat Jun 25, 2016 3:18 am

Re: Power outage causes specific sites to be blocked

Fri Sep 14, 2018 1:24 am

I don't think that is the issue, but it is a great idea. Our two CCRs in the area are not parallel. They are actually 150 miles apart. We have a layer 3 switch on the mountaintop separating them, running OSPF. We don't use ICMP redirects at all.

It looks like this

Frontier fiber-----Blanding CCR ------ Abajo Cisco switch ---------- Pleasant view CCR------ Farmers telco fiber

Blanding is the default gateway for two /24 blocks. Abajo is the default gateway for two other /24s. Normally Blanding sends its two blocks out Frontier, and Abajo sends its two out Farmers.

Pleasant View originates a type 1 default route in OSPF, as does Blanding.
The link between the Blanding CCR and Abajo has an inflated cost to make sure the Abajo switch sends its traffic to Pleasant View.

If Farmers goes down, the Abajo switch will use Frontier as an alternative, and the same applies the other way around.
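The exit selection described above can be sketched in Python as follows. With a type 1 external route, the total cost is the advertised external metric plus the internal cost to reach the router that originates it, so an inflated link cost steers the choice. All cost numbers here are made-up assumptions, and the "originating" flag is a simplification standing in for "this border router still advertises a default route".

```python
from typing import Dict, Optional


def best_exit(link_cost: Dict[str, int],
              external_metric: Dict[str, int],
              originating: Dict[str, bool]) -> Optional[str]:
    """Pick the border router with the lowest total cost for the default route.

    Type 1 external: total = internal cost to the border router + external metric.
    """
    totals = {
        router: link_cost[router] + external_metric[router]
        for router in link_cost
        if originating[router]  # ignore routers that withdrew their default route
    }
    if not totals:
        return None
    return min(totals, key=totals.get)


# Hypothetical costs as seen from the Abajo switch; the Blanding link is inflated.
link_cost = {"Blanding": 100, "Pleasant View": 10}
external_metric = {"Blanding": 1, "Pleasant View": 1}

# Normal operation: both border routers originate a default route.
print(best_exit(link_cost, external_metric, {"Blanding": True, "Pleasant View": True}))

# Farmers is down and Pleasant View no longer originates a default route.
print(best_exit(link_cost, external_metric, {"Blanding": True, "Pleasant View": False}))
```

The first call selects Pleasant View and the second falls back to Blanding, which matches the failover behavior described in the post.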

I don't know how it could be relevant, but when the power went out, our microwave link from Blanding to Abajo lost power. The microwave is used for management only. Traffic continued to flow out Frontier.
 
tippenring
Member
Posts: 304
Joined: Thu Oct 02, 2014 8:54 pm
Location: St Louis MO

Re: Power outage causes specific sites to be blocked

Fri Sep 14, 2018 9:35 pm

It sounds like your Cisco L3 switch (router really) is your core switch in this design. All traffic must pass through there.

I can't see how the size of the packets would make a difference other than as previously suggested. MTU would be a first thought. A router reboot could correct the issue if there were any number of internal faults in the system. However, a config restore fixing it doesn't make a lot of sense to me in most cases.

At this point, I don't think you have enough information to determine the actual cause (at least not enough for me). MikroTik has a great packet sniffer feature. I'd suggest setting up a system with Wireshark and being ready to turn on the streaming packet capture. Have the problem CCR stream the packets to your PC with Wireshark so you can analyze the traffic.

See https://wiki.mikrotik.com/wiki/Ethereal/Wireshark.
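Once captures from the server-facing and client-facing sides exist, the comparison suggested in this thread could be sketched in Python like this. It assumes the packet summaries have already been exported from Wireshark into simple records. The record layout, the sample values, and the use of the IP identification field to match a packet on both sides are assumptions for illustration only.

```python
from typing import Dict, List, NamedTuple


class Pkt(NamedTuple):
    ip_id: int    # IP identification field, assumed sufficient to match packets
    length: int   # IP packet length in bytes


def summarize(server_side: List[Pkt], client_side: List[Pkt]) -> Dict[str, object]:
    """Compare what arrived from the server with what was forwarded to the client."""
    forwarded = set(client_side)
    passed = [p for p in server_side if p in forwarded]
    dropped = [p for p in server_side if p not in forwarded]
    return {
        "largest_forwarded": max((p.length for p in passed), default=None),
        "smallest_dropped": min((p.length for p in dropped), default=None),
        "dropped": dropped,
    }


# Made-up records: what the router received from the server side,
# and what it sent on toward the client.
server_side = [Pkt(1, 60), Pkt(2, 52), Pkt(3, 1500), Pkt(4, 1500), Pkt(5, 583)]
client_side = [Pkt(1, 60), Pkt(2, 52), Pkt(5, 583)]

report = summarize(server_side, client_side)
print(report["largest_forwarded"], report["smallest_dropped"])
```

If every dropped packet is larger than every forwarded one, that would support the MTU theory. If large packets never show up on the server-facing side at all, the loss is happening upstream of the router.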
