
Help: Script to Web Cache a website

Posted: Wed Mar 28, 2012 2:30 am
by rviteri
Hi all,

I am looking for a way for the router to automatically connect to a website and cache all of its content down to four or five subfolder levels (a tree), as if the router itself were browsing and caching the content.

For example:

http://www.exampleweb.com loads index.html and within index.html there are links to product1.html, product2.html and so on. The idea is that the router could follow the links and download them.

The website I am talking about posts new information every day at midnight, 9 am, 12 pm, 3 pm, 6 pm and so on, and I have a bunch of users who all log into it at pretty much the same time. So I want a copy of the website ready in the cache automatically before they arrive.

I've heard there is a way to download all of a site's content on Linux using wget -m, but I have never tried it.
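From what I've read, a first attempt might look something like this (untested on my side; it assumes GNU wget on a Linux box behind the router, and /srv/cache is just a placeholder path). The script only prints the command so you can sanity-check it before letting it loose:

```shell
#!/bin/sh
# Dry-run sketch: build the wget mirror command and print it.
# --level=4 limits recursion to about four folder levels deep,
# --wait=2 pauses between requests so the site isn't hammered.
URL="http://www.exampleweb.com/"
set -- wget --recursive --level=4 --no-parent \
    --wait=2 --convert-links --adjust-extension \
    --directory-prefix=/srv/cache "$URL"
echo "$@"
# "$@"   # uncomment to actually run the download
```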

Thanks!

Re: Help: Script to Web Cache a website

Posted: Fri Mar 30, 2012 8:15 am
by jvr
Take a look at http://wiki.mikrotik.com/wiki/How_to_ma ... _web_proxy. All of the web-proxy commands can also be found under IP->Web Proxy in Winbox. I don't know what kind of router you are using, so space may be an issue depending on how large the site you are trying to cache is. Also, if the site changes often, you'll be wearing out the NAND storage much faster than usual, which may lead to storage failures earlier than you would otherwise see. Obviously, if you are running off a hard drive, that won't be an issue.
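For reference, enabling the proxy and pushing client web traffic through it looks roughly like this on the RouterOS command line. This is only a sketch of RouterOS 5.x-era syntax: 8080 is the default proxy port, and cache-on-disk only helps if the router actually has suitable storage (see the NAND caveat above).

```
# Enable the web proxy and let it cache to disk:
/ip proxy set enabled=yes port=8080 cache-on-disk=yes

# Transparently redirect LAN HTTP traffic into the proxy:
/ip firewall nat add chain=dstnat protocol=tcp dst-port=80 \
    action=redirect to-ports=8080
```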

I don't know of any way to pre-cache from the router itself. The simplest approach would be to run a scheduled job off a computer behind it using wget (see http://www.dheinemann.com/2011/archiving-with-wget/ for a simple example of mirroring a remote site). I advise caution: make sure your script is gentle (e.g. maybe skip the mirror option and script around it, or at least use the wait options sensibly), or the site operators might get angry :D
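To tie that job to the posting schedule you described, a crontab entry on that computer could fire a few minutes after each update. This is just a sketch assuming GNU wget and cron; the times, depth, and /srv/cache path are placeholders:

```
# Refresh the mirror 5 minutes after each posting time (00:05, 09:05, 12:05, ...).
# -N (timestamping) re-fetches only pages that changed; -w 2 waits 2s between requests.
5 0,9,12,15,18 * * * wget -r -l 4 -N -np -w 2 -P /srv/cache http://www.exampleweb.com/ >/dev/null 2>&1
```

Note that a crontab entry has to stay on one line, which is why the short wget flags (-r -l -np -w -P) are handy here.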