rviteri
Frequent Visitor
Topic Author
Posts: 85
Joined: Fri Nov 18, 2011 5:53 pm

Help: Script to Web Cache a website

Wed Mar 28, 2012 2:30 am

Hi all,

I am looking for a way to have the router connect to a website and automatically cache all of its content, four or five subfolder levels deep, as if the router itself were browsing the site.

For example:

http://www.exampleweb.com loads index.html and within index.html there are links to product1.html, product2.html and so on. The idea is that the router could follow the links and download them.

The website I am talking about posts new information at midnight, 9 am, 12 pm, 3 pm, 6 pm and so on every day, and I have a bunch of users who log into it at pretty much the same time. So I want a copy of the website to be ready in the cache beforehand, automatically.

I've heard there is a way to download all content from a site in Linux using wget -m, but I have never tried it.

Thanks!
 
jvr
just joined
Posts: 10
Joined: Tue May 17, 2011 7:12 pm

Re: Help: Script to Web Cache a website

Fri Mar 30, 2012 8:15 am

Take a look at http://wiki.mikrotik.com/wiki/How_to_ma ... _web_proxy. All of the commands for the web proxy can also be found under IP->Webproxy in winbox. I don't know what kind of router you are using, so space may be an issue depending on how large the site you are trying to cache is. Also, if the site changes often, you'll be writing to the NAND storage a lot more than you usually would, which may lead to storage failures earlier than you would otherwise see. Obviously, if you are running off a hard drive, that won't be an issue.

I don't know of any way to pre-cache from the router itself. The simplest way would be to run a scheduled job on a computer behind it using wget (see http://www.dheinemann.com/2011/archiving-with-wget/ for a simple example of mirroring a remote site). I advise caution: make sure your script is gentle (e.g. maybe don't use the mirror option and script around it instead, or at least use the wait options sensibly), or the site operators might get angry :D
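
As a rough illustration of the "gentle" scheduled wget job described above, here is a minimal sketch. It assumes the router's web proxy is reachable at 192.168.88.1:8080; that address, the crawl depth and the delay are assumptions rather than details from this thread, and with a transparent proxy the two -e options may not be needed. It uses wget's --delete-after option, which the wget manual describes as useful for pre-fetching pages through a proxy: the files are discarded on the client, so only the proxy keeps a copy.

```shell
#!/bin/sh
# Warm the router's web proxy cache by crawling the site through it.
# ASSUMPTIONS (not from the thread): the proxy address and port,
# the crawl depth, and the delay between requests.
SITE_URL="${SITE_URL:-http://www.exampleweb.com/}"
PROXY_URL="${PROXY_URL:-http://192.168.88.1:8080/}"
DEPTH="${DEPTH:-5}"
WAIT_SECONDS="${WAIT_SECONDS:-2}"

prefetch() {
  # With DRY_RUN=1 (the default) the command is only printed, not executed.
  if [ "${DRY_RUN:-1}" = "1" ]; then runner="echo"; else runner=""; fi
  # --recursive/--level : follow links, but only DEPTH levels deep
  # --no-parent         : never climb above the starting directory
  # --page-requisites   : also fetch images and stylesheets used by pages
  # --wait/--random-wait: pause between requests to spare the remote server
  # --delete-after      : discard local copies; the proxy keeps its own
  $runner wget \
    --recursive --level="$DEPTH" --no-parent \
    --page-requisites \
    --wait="$WAIT_SECONDS" --random-wait \
    --no-directories --delete-after --quiet \
    -e use_proxy=yes -e "http_proxy=$PROXY_URL" \
    "$SITE_URL"
}

prefetch
```

Run it once as-is to see the command it would issue, then set DRY_RUN=0 to actually crawl. To have it run shortly after each update, a cron entry along the lines of `5 0,9,12,15,18 * * * DRY_RUN=0 /path/to/prefetch.sh` could be used (the path is hypothetical, and the hours should be adjusted to the site's real schedule). Whether the proxy actually stores each page still depends on the cache headers the site sends and on the proxy's own cache settings.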