Madhav Kobal's Blog

This blog will be dedicated to Linux, Open Source and Technology news, affairs, how-tos and virtually EVERYTHING in these domains.


Wget Commands You Didn't Hear About Before

Posted by madhavkobal on 22/10/2009

Wget is a free utility for non-interactive download of files from the Web. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. Every Linux user works in the terminal sooner or later, and that requires some knowledge of the command line, so today I have put together a list of wget commands that you perhaps haven't heard of before.
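For retrieval through an HTTP proxy, wget honours the standard proxy environment variables; a minimal sketch (the proxy address is only a placeholder):

export http_proxy=http://proxy.example.com:8080/
export https_proxy=http://proxy.example.com:8080/
wget http://www.example.com/file.iso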

1- Basic wget commands; you have surely seen and used some of these before:

cd picdir/ && wget -nd -pHEKk http://www.unixmen.com/me.jpg

Store the currently browsed photo in the picdir/ directory

wget -c http://www.unixmen/software.de

Download a file with the ability to stop the download and resume later
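If the transfer is interrupted, re-running the same command with -c picks up where it left off; to leave it running unattended you can also background it (a sketch with a placeholder URL):

wget -b -c -t 0 http://www.example.com/software.iso

Here -b sends wget to the background (output goes to wget-log) and -t 0 retries indefinitely.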

wget -r -nd -np -l1 -A '*.jpg' http://www.unixmen.com/dir/

Download all the .jpg files from a site directory into the current directory

wget -r http://www.example.com
Download an entire website

echo ‘wget -c http://www.example.com/files.iso’ | at 09:00
Start a download at any given time
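The same idea works from cron when the download should recur; a hypothetical crontab entry that resumes the same file every day at 09:00:

0 9 * * * wget -c http://www.example.com/files.iso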

wget ftp://remote/filex.iso

FTP usage is just as simple; wget takes care of the (anonymous) login and password
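If the server does not allow anonymous access, the credentials can be passed explicitly (the user name and password here are placeholders):

wget --ftp-user=myuser --ftp-password=mypassword ftp://remote/filex.iso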

wget --limit-rate=30k <URL>

Limit the download speed to 30 KB/s
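The suffix sets the unit (k for kilobytes, m for megabytes per second); for example, to cap a whole recursive download at 200 KB/s (the URL is a placeholder):

wget --limit-rate=200k -r http://www.example.com/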

wget -nv --spider --force-html -i bookmarks.html

Check the links in a file

wget --mirror http://www.example.com/

Update a local copy of a website
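--mirror is shorthand for -r -N -l inf --no-remove-listing; for a copy that is also browsable offline it is often combined with link conversion and page requisites (a sketch):

wget --mirror -k -p http://www.example.com/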

2- This wget command saves an HTML page and converts it to a PDF:

wget -qO - "$URL" | htmldoc --webpage -f "$URL".pdf - ; xpdf "$URL".pdf &
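This assumes htmldoc and xpdf are installed and that $URL has been set beforehand; since using the full URL as the PDF name gives an awkward filename, a usage sketch with a plain local name (the page address is a placeholder) could look like:

URL="http://www.example.com/article.html"
wget -qO - "$URL" | htmldoc --webpage -f article.pdf -
xpdf article.pdf &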


3- Wget command to get the photos from a Picasa web album:

wget 'link of a Picasa WebAlbum' -O - | perl -e 'while(<>){while(s/"media":{"content":\[{"url":"(.+?\.JPG)//){print "$1\n"}}' | wget -w1 -i -


4- Check whether you can connect to Twitter:

wget http://twitter.com/help/test.json -q -O -
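Since wget exits with a non-zero status when the request fails, the same test can drive a quick connectivity check in a script (a sketch):

if wget -q -O /dev/null http://twitter.com/help/test.json; then echo "Twitter is reachable"; else echo "no connection"; fi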


5- Wget command to get all the ZIP and PDF files from a website:

wget --reject html,htm --accept pdf,zip -rl1 url

If the website uses HTTPS, then:

wget --reject html,htm --accept pdf,zip -rl1 --no-check-certificate https-url
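If you would rather keep the files out of the current directory, the -P option stores everything under a prefix directory (the directory name here is just an example):

wget -P downloads --reject html,htm --accept pdf,zip -rl1 url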


6- Wget command to check whether a remote file exists:

wget --spider -v http://www.server.com/path/file.ext
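Because --spider makes wget exit with a non-zero status when the file is missing, the check also works inside a script (a small sketch using the same placeholder URL):

wget --spider -q http://www.server.com/path/file.ext && echo "file exists" || echo "file not found"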


7- Wget command to download files from a Rapidshare premium account:

wget -c -t 1 --load-cookies ~/.cookies/rapidshare <URL>
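The cookie file itself would normally come from an earlier authenticated session; a purely hypothetical sketch of how such a file could be created (the login URL and form fields are placeholders, not the real Rapidshare ones):

wget --save-cookies ~/.cookies/rapidshare --keep-session-cookies --post-data 'login=USER&password=PASS' -O /dev/null https://www.example.com/login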


8- Wget command to extract a remote tarball without saving it locally:

wget -qO - "http://www.tarball.com/tarball.gz" | tar zxvf -


9- Block known dirty hosts from reaching your machine:

wget -qO - http://infiltrated.net/blacklisted|awk '!/#|[a-z]/&&/./{print "iptables -A INPUT -s "$1" -j DROP"}'

Blacklisted is a compiled list of all known dirty hosts (botnets, spammers, brute-forcers, etc.), updated hourly. This command fetches the list and builds the iptables rules for you; if you want the hosts blocked automatically, append | sh to the end of the command line. Blocking everything and allowing in only specific hosts is the more practical approach, but many people don't or can't do that, which is where this one-liner comes in handy. For those using ipfw, a quick fix would be {print "add deny ip from "$1" to any"}. The original posting showed the top two entries as sample output. Be advised that the blacklisted file itself filters out RFC 1918 addresses (10.x.x.x, 172.16-31.x.x, 192.168.x.x); even so, it is advisable to check and parse the list before you apply the rules.
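As noted above, appending | sh makes the generated rules take effect immediately; run it as root and review the list first:

wget -qO - http://infiltrated.net/blacklisted | awk '!/#|[a-z]/&&/./{print "iptables -A INPUT -s "$1" -j DROP"}' | sh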


10- Wget command to download an entire website:

wget --random-wait -r -p -e robots=off -U mozilla http://www.example.com
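Here -e robots=off ignores robots.txt and -U mozilla sends a browser-like user agent. A variation (not from the original post) that also converts links for offline browsing and limits the recursion depth to two levels:

wget --random-wait -r -l 2 -p -k -e robots=off -U mozilla http://www.example.com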

Original author: M. Zinoune (Zinovsky)
