How can I download a website using wget on my Linux desktop or server?

wget is a great tool for downloading resources from the Internet.

For basic usage, the syntax is wget http://mysite.com.

wget http://mysite.com

wget also allows you to recursively download sites, meaning wget also fetches all pages, images, and other files linked from the homepage. To download recursively, use wget -r.

wget -r http://mysite.com

Many sites do not want you to download their entire site, so they check which browser you are using. The -U option handles that: wget -U browser-name sets the User-Agent header so the web server sees that browser name. In the example below, -p additionally tells wget to download the images, stylesheets, and other files each page needs to display properly.

wget -r -p -U Mozilla http://www.mysite.com

Some other important command line options are --limit-rate= and --wait=. You can add --wait=20 to pause 20 seconds between retrievals; this helps ensure that you are not manually added to a blacklist. --limit-rate is in bytes per second by default; append k to set the limit in kilobytes per second.

wget --wait=20 --limit-rate=20k -r -p -U Mozilla http://www.mysite.com

A website owner will probably get upset if you attempt to download their entire site with a simple wget http://mysite.com command, but they will likely not even notice you if you limit the transfer rate and pause between fetching files.

--no-parent is also a very handy option: it guarantees wget will never ascend to the parent directory, so nothing outside the folder you specify gets downloaded. Use it to make sure wget does not fetch more than it needs when you just want the files in one folder, as in the example below.
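For instance, a minimal sketch, assuming the files you want live under a hypothetical http://mysite.com/archives/ folder; combined with -r, the crawl stays inside that directory:

wget -r --no-parent http://mysite.com/archives/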
