EvilZone
Programming and Scripting => Scripting Languages => : proxx June 10, 2013, 08:35:15 PM
-
Hello,
This is not really bash but whatever, I don't know where else to put it.
Wget is a very useful tool, and for those unaware of its power I would like to show how to download a website, partially or completely.
One particularly useful option is the one that ignores robots.txt:
wget -H -r --level=5 -e robots=off http://somewebsite.org/
-H / --span-hosts: go to foreign hosts when recursing.
-r / --recursive with --level=5: recursive download, with a recursion depth of 5 in this case.
-e robots=off: execute the command robots=off, i.e. ignore robots.txt.
This will download the entire website, including images, scripts and other files.
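If you want the local copy to actually be browsable offline, something like this should do it (the extra flags are standard wget options from the man page, not part of the command above):
wget -r --level=5 -e robots=off -k -p --no-parent http://somewebsite.org/
-k / --convert-links rewrites the links in the saved pages so they point at your local copies, -p / --page-requisites grabs the images and CSS each page needs to render, and --no-parent keeps wget from climbing above the starting directory.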
I must have used it a million times. Screw GUI tools :P
Have a nice day.
-
Yeah, I'm pretty sure every *nix user is aware of this. Even some more advanced Windows users are.
-
If you use a link depth of 5, does wget download stuff outside the root website too?
Example: let's say there is a link to some blog post. Does wget check that the link goes out of the website to another one and skip it, or does wget download everything on the blog too?
-
> If you use a link depth of 5, does wget download stuff outside the root website too?
> Example: let's say there is a link to some blog post. Does wget check that the link goes out of the website to another one and skip it, or does wget download everything on the blog too?
My mistake, it's recursion depth, not link depth.
The depth isn't what decides whether wget leaves the site, though: without -H it stays on the starting host and skips links to other hosts, and with -H it follows links to foreign hosts up to that depth.
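If you do want -H but only towards certain hosts, the man page has a -D / --domains switch that whitelists the domains recursion may span to (the blog subdomain here is just a made-up example):
wget -H -D somewebsite.org,blog.somewebsite.org -r --level=5 -e robots=off http://somewebsite.org/
With that, recursion can cross over to blog.somewebsite.org, but any link to a third host is skipped.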
> Yeah, I'm pretty sure every *nix user is aware of this. Even some more advanced Windows users are.
Me too, but for those new to *nix it might be a nice example of its power.
-
wget has a --spider switch too, so it doesn't download anything and you can go through the output and see if you find something interesting. But I think you already knew that. I just have to get my post count up :-))))
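For example, to crawl a site two levels deep without saving anything and then grep the log for URLs (same placeholder site as above):
wget --spider -r --level=2 -e robots=off -o spider.log http://somewebsite.org/
grep 'http://' spider.log
-o writes wget's normal output to spider.log instead of the terminal, so you can dig through it afterwards.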