11 March 2007

Mirroring Web Sites


Mirroring a web site, or a subset of one, is quick and easy. All you need is the handy utility wget. There are several good reasons for wanting to do this:

  • Snap-shotting a compromised web site for off-line analysis

  • Snap-shotting a healthy web site for off-line analysis

  • Setting up a decoy or dummy website

  • Legitimate mirroring to help someone else out with bandwidth

wget has quite a few options. Read the man page if you care. If not, here is a sample command to snatch the web site domain.tld.

wget --mirror --wait=2 --random-wait --force-directories --recursive --convert-links --page-requisites --domains=domain.tld http://domain.tld/
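If you end up mirroring more than one site, the command above can be wrapped in a small shell function. This is only a sketch: the name mirror_site and the DRY_RUN variable are made up for illustration and are not part of wget, and it assumes wget is on your PATH. The wget options are exactly the ones in the sample command.

```shell
#!/bin/sh
# mirror_site is a hypothetical helper name, not part of wget.
# Usage: mirror_site domain.tld
mirror_site() {
    domain="$1"
    if [ -z "$domain" ]; then
        echo "usage: mirror_site domain.tld" >&2
        return 1
    fi

    # DRY_RUN is an illustrative variable: when set to 1, the command
    # is printed instead of being run.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        runner="echo"
    else
        runner=""
    fi

    # Same options as the sample command, with the domain filled in.
    $runner wget --mirror --wait=2 --random-wait --force-directories \
        --recursive --convert-links --page-requisites \
        --domains="$domain" "http://$domain/"
}
```

Setting DRY_RUN=1 before calling mirror_site domain.tld prints the full wget command so you can check it before hitting the real site.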

Most of the options above are the long form so that you can understand what they do without me having to explain each one. One pair worth noting is --wait=2 --random-wait.

Plenty of web hosts and administrators run statistical analysis on their logs. You don't want to set off any alarms, even if your intentions are pure. If some overzealous administrator sees some idiot beating the hell out of their website, they may decide to teach the wannabe DoSing bastard a lesson and phone the feds. The two options above are an attempt to keep your full footprint in the logs from being noticed.

--wait=2 sets a pause of 2 seconds between fetches, and --random-wait varies that pause randomly on each request: from 0 to 2 times the wait value in the wget releases current as of this writing (newer releases document a narrower range of 0.5 to 1.5 times). Logs will still show a lot of hits within a short time frame, but hopefully you will stay under some flagging thresholds. These two options will also help you dodge DoS filters.
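As a rough worked example of what the pause costs you: assuming the random skew is centred on the --wait value, the time spent waiting averages out to about one wait interval per request, so the total is roughly requests times wait. The helper name estimate_pause and the figure of 500 requests are hypothetical, picked only to show the arithmetic, and download time itself is ignored.

```shell
#!/bin/sh
# estimate_pause is a hypothetical helper, not part of wget.
# Usage: estimate_pause REQUESTS WAIT_SECONDS
# Prints the approximate total time spent pausing, assuming the
# random skew averages out to WAIT_SECONDS per request.
estimate_pause() {
    requests="$1"
    wait_s="$2"
    total=$(( requests * wait_s ))
    echo "$(( total / 60 ))m $(( total % 60 ))s"
}

# 500 requests at --wait=2 is 1000 seconds of pausing.
estimate_pause 500 2   # prints "16m 40s"
```

So a modest site costs you a quarter of an hour of idle time, which is a fair trade for not looking like an attack.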
