How to Wget - The basics of this great Downloader
Wget is a powerful, console-based network downloader. There are plenty of good GUI downloaders, but they are of no use when you are working on a console. And once you see what wget is capable of, you may well start using it for your day-to-day needs, even in graphical environments.
So, let's begin with the basic syntax.
[shredder12]$ wget http://linuxers.org
For a web page URL, the above command downloads that page (index.html in this case). If the URL points to a media file (image, audio, video) or any other download link, wget downloads that file instead. Everything is saved in the current directory.
Wget how to: Download multiple files
To download multiple files or pages at once, just list the URLs one after another on the command line, like this
[shredder12]$ wget
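The general shape can be sketched with placeholder URLs. The example.com addresses below are assumptions for illustration only, not real downloads.

```shell
# Placeholder URLs (hypothetical), set as the positional parameters.
set -- http://example.com/a.tgz http://example.com/b.pdf

# Print the command that would run; drop the leading 'echo' to really download.
echo wget "$@"
# prints: wget http://example.com/a.tgz http://example.com/b.pdf
```

Each URL is a separate argument, so wget fetches them one after the other and saves each under its own name.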
If you have a long list of downloads, the best method is to store the URLs in a file, one per line, and give that file to wget as input. Use the --input-file or -i option to do so.
[shredder12]$ wget -i list.txt
The file list.txt should contain the list of URLs, one per line.
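As an illustration, a list file could be put together as below. The three example.com URLs are placeholders I made up, so substitute your own.

```shell
# Create a sample list.txt; the URLs are hypothetical placeholders.
cat > list.txt <<'EOF'
http://example.com/file1.tgz
http://example.com/docs/manual.pdf
http://example.com/images/photo.jpeg
EOF

# Show what wget will read: one URL per line.
cat list.txt

# Then fetch everything in the list with:
#   wget -i list.txt
```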
Wget how to: Download files of a particular extension only (e.g all .pdf or all .jpeg files)
If you have used bash, you will know that *.pdf refers to all the PDF files. The same convention can be used with wget too.
[shredder12]$ wget ftp://foo.bar/downloads/*.pdf
This should download all the PDF files available in that folder. Please note that this only works for FTP.
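One caveat worth sketching: depending on your shell, an unquoted * may be expanded or rejected before wget ever sees it. Quoting the URL keeps the pattern intact. The snippet below only prints the pattern, and the variable name url is my own choice.

```shell
# Single quotes stop the local shell from touching the asterisk,
# so wget receives the pattern itself and matches it on the FTP server.
url='ftp://foo.bar/downloads/*.pdf'
printf '%s\n' "$url"

# To run the download:
#   wget "$url"
```

For HTTP, where this kind of globbing is not available, a recursive download restricted by an accept list, such as wget -r -l 1 -nd -A pdf followed by the page URL, is a common substitute. It only finds files that the page actually links to.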
Wget how to: Set the name of output file
Use --output-document=new_filename or -O new_filename to set the name of the output file.
[shredder12]$ wget -O software.tgz
This will download file.tgz and save it as software.tgz. The -O option is not just for renaming, though. If you download multiple files with this option, they are concatenated into a single file instead of being saved separately.
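The concatenating behaviour can be wrapped in a small helper. This is only a sketch: fetch_combined is a hypothetical name of mine, not something wget provides.

```shell
# fetch_combined OUTFILE URL...
# Hypothetical helper: with a single -O and several URLs, wget writes
# all of the downloads, in order, into OUTFILE.
fetch_combined() {
    out=$1
    shift
    wget -O "$out" "$@"
}
```

A call such as fetch_combined notes.txt http://example.com/part1.txt http://example.com/part2.txt (placeholder URLs) would leave both parts in notes.txt, one after the other.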
Wget how to: set the connection timeout
I mainly use wget in scripts that download multiple files, and I definitely don't want a single download to become the bottleneck. In such cases it's better to specify a timeout, which tells wget to give up on a network operation (DNS lookup, connecting, or waiting for data) that takes longer than that period. This is easily done using the -T or --timeout flag. It is mostly used together with the --tries flag described below.
[shredder12]$ wget -T 5
Wget will time out if the server doesn't respond within 5 seconds. You can even use decimal values, say 2.5.
Wget how to: set the number of re-tries wget should do before quitting the download
Similar to the timeout flag, this keeps a single download from becoming the bottleneck. If a download fails a given number of times, wget should move on to the next one. This is done using the --tries or -t option.
[shredder12]$ wget --tries=5
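If the download keeps failing, wget aborts it after 5 tries. The default is 20. Use 0 for infinite retries. Note that fatal errors such as "connection refused" or "not found" (404) are not retried at all.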
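Here is a minimal sketch of how the two flags might be combined in a script. The function name fetch_all, the 5-second timeout and the 3 tries are all my own arbitrary choices, so tune them to your needs.

```shell
# fetch_all LISTFILE
# Hypothetical helper: try every URL in LISTFILE (one per line) and keep
# going when one of them fails, so a bad URL never blocks the rest.
fetch_all() {
    failed=0
    while IFS= read -r url; do
        # Skip blank lines.
        [ -n "$url" ] || continue
        # -T 5       give up on a stalled network operation after 5 seconds
        # --tries=3  attempt each URL at most 3 times before moving on
        if ! wget -T 5 --tries=3 "$url"; then
            echo "giving up on $url" >&2
            failed=$((failed + 1))
        fi
    done < "$1"
    echo "$failed download(s) failed"
    # Report success only if every URL was fetched.
    [ "$failed" -eq 0 ]
}
```

Calling fetch_all list.txt then works through the whole list, and the exit status tells the calling script whether anything was skipped.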
Wget how to: Run wget in background and log all the output in a file
If you want wget to go to the background after startup, use the --background or -b flag. Since the output is no longer printed on the console, the messages are logged instead. You can either specify a log file or let the messages go to a file named wget-log, which is the default behaviour.
[shredder12]$ wget -b http://linuxers.org
Continuing in background, pid 15278.
Output will be written to `wget-log'.
Use the --output-file or -o option to specify the log file.
[shredder12]$ wget -b -o logfile
Wget how to: Control Wget's output
Use the -v or --verbose option to print all the available data; this is the default behaviour. If you want to turn off the output completely, use the -q or --quiet option. To turn off verbose mode but still get error messages and basic information, use the -nv or --no-verbose option.
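With -q there is nothing to read on the screen, so a script has to rely on wget's exit status instead (0 means success). A small sketch, where fetch_quietly is a hypothetical helper name:

```shell
# fetch_quietly URL
# Hypothetical helper: download without any wget output and report the
# outcome based on the exit status alone.
fetch_quietly() {
    if wget -q "$1"; then
        echo "ok: $1"
    else
        echo "failed: $1" >&2
        return 1
    fi
}
```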