I came across this article on hackthissite:
https://www.hackthissite.org/forums/viewtopic.php?f=104&t=10334&sid=d41da93c989495cdde82020036725157
In the article, some guy claims to have written a Python script that scans the "deep web". All it does is continually generate random IPs and attempt connections on port 80. I figured I could use this as a base for a script that actually digs up "hidden" websites.
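For reference, the approach described in that article (random IPs, then a connection attempt on port 80) can be sketched roughly like this. It is a minimal reconstruction, not the article's actual code; the names `random_ip`, `port_open` and `scan_random` are hypothetical, and the one-second timeout is an assumption.

```python
import ipaddress
import random
import socket


def random_ip(rng=random):
    """Return a uniformly random IPv4 address as a dotted-quad string."""
    return str(ipaddress.IPv4Address(rng.getrandbits(32)))


def port_open(ip, port=80, timeout=1.0):
    """Return True if a TCP connection to ip:port succeeds within `timeout` seconds."""
    try:
        # create_connection applies the timeout to this socket only,
        # so no global setdefaulttimeout() call is needed.
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def scan_random(tries, port=80, timeout=1.0, rng=random):
    """Probe `tries` random addresses and return those that accepted a connection."""
    candidates = (random_ip(rng) for _ in range(tries))
    return [ip for ip in candidates if port_open(ip, port, timeout)]
```

Calling `scan_random(100)` probes 100 random addresses; with a one-second timeout that can take up to roughly 100 seconds.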
[gist]anonymous/993898b8121a597f6ca0[/gist]
My script generates random IPs and does a reverse DNS lookup on each one. If the lookup FAILS, it attempts a connection on port 80. If THAT succeeds, there is most likely a website on this IP whose address doesn't resolve to a domain name (it's sort of "hidden").
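The lookup-then-probe check described above might look like the following. It's a sketch assuming the standard `socket.gethostbyaddr()` for the reverse lookup; the names `has_reverse_dns`, `is_unnamed_server` and the `resolver`/`probe` parameters are hypothetical, added so the logic can be exercised without touching the network.

```python
import socket


def port_open(ip, port=80, timeout=1.0):
    """Return True if a TCP connection to ip:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def has_reverse_dns(ip, resolver=socket.gethostbyaddr):
    """Return True if a reverse (PTR) lookup for ip yields a hostname."""
    try:
        resolver(ip)
    except (socket.herror, socket.gaierror):  # no PTR record / bad address
        return False
    return True


def is_unnamed_server(ip, port=80, timeout=1.0,
                      resolver=socket.gethostbyaddr, probe=port_open):
    """True if ip has no reverse DNS name but still accepts connections on port."""
    if has_reverse_dns(ip, resolver):
        return False  # it has a name, so it isn't "hidden"
    return probe(ip, port, timeout)
```

Keep in mind that a missing PTR record only means nobody configured reverse DNS for that address; a forward domain can still point at it, so "hidden" is a heuristic here.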
Now, I didn't expect the script to grow this much when I started out, and the result is VERY messy code. But it works!
It includes options for scanning a single IP instead of constantly spamming random ones, logging the findings to a file, searching for keywords on the sites that turn up, and different levels of verbosity.
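Options like these map naturally onto `argparse`, which might help with the messiness. The flag names below (`--ip`, `--log`, `--keyword`, `-v`) and the helpers `find_keywords` and `log_finding` are hypothetical, not taken from the gist; this is only a sketch of one way to organize the option handling.

```python
import argparse


def build_parser():
    """Command-line options for the scanner (flag names are illustrative)."""
    parser = argparse.ArgumentParser(
        description="Look for web servers on IPs without reverse DNS.")
    parser.add_argument("--ip", help="scan this single address instead of random ones")
    parser.add_argument("--log", metavar="FILE", help="append findings to FILE")
    parser.add_argument("--keyword", action="append", default=[],
                        help="keyword to search for on found pages (repeatable)")
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="increase verbosity (-v, -vv, ...)")
    return parser


def find_keywords(page_text, keywords):
    """Return the keywords that occur in page_text, ignoring case."""
    lowered = page_text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def log_finding(path, ip, matches):
    """Append one tab-separated line per finding to the log file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{ip}\t{','.join(matches)}\n")
```

With this, `-vv` yields `args.verbose == 2`, and each `--keyword` adds one entry to `args.keyword`.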
Finally, here are some things that I need help with:
I use the socket.setdefaulttimeout() function to get a one-second connection timeout (for performance reasons). BUT it turns out that socket.gethostbyaddr() ignores that timeout; it only affects operations on socket objects, such as socket.connect(). HOW can I set a timeout for gethostbyaddr()?
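As far as I know, `gethostbyaddr()` goes through the operating system's resolver rather than a Python socket object, which is why the default socket timeout has no effect on it. One common workaround is to run the lookup in a worker thread and stop waiting after a deadline. The sketch below assumes `concurrent.futures`; `reverse_lookup`, its `resolver` parameter and the pool size of 8 are hypothetical choices.

```python
import concurrent.futures
import socket

# Shared pool; a lookup that times out keeps occupying a worker until the
# OS resolver gives up on its own, so the pool size caps that backlog.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def reverse_lookup(ip, timeout=1.0, resolver=socket.gethostbyaddr):
    """Return the hostname for ip, or None if the lookup fails or exceeds timeout."""
    future = _POOL.submit(resolver, ip)
    try:
        hostname, _aliases, _addresses = future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return None  # stopped waiting; the worker thread is not cancelled
    except OSError:  # socket.herror / socket.gaierror: no PTR record, etc.
        return None
    return hostname
```

The timed-out lookup is not actually cancelled, and if every worker is busy, time spent queued also counts against `timeout`. A DNS library that supports per-query timeouts is another option.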
Also, I am not satisfied with randomly generating IPs to scan. My goal is to fill a list with EVERY IP address from 1.1.1.2 to 254.255.255.254, then shuffle the list to randomize the scanning order. But I haven't found a smart way to fill the list yet.
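Materializing that list is probably not practical: there are about 4.26 billion addresses in that range, so a Python list would need more than 30 GB just for its pointers, before counting the integer objects. One way around this is to iterate over a pseudorandom permutation instead of a stored list. The sketch below swaps in a small Feistel network over 32-bit integers, which is a bijection by construction, so each index from 0 to 2**32 - 1 maps to a distinct address; outputs outside 1.1.1.2 to 254.255.255.254 are skipped. The round function, the four rounds and all names are assumptions, and the resulting order is scrambled but not cryptographically strong.

```python
import ipaddress
import random


def _mix(half, key, mask):
    # Round function: any deterministic function works here, because the
    # Feistel structure stays invertible no matter what this returns.
    return (((half ^ key) * 0x9E3779B1) >> 5) & mask


def feistel_permute(x, keys, half_bits=16):
    """Map x in [0, 2**(2*half_bits)) to a unique value in the same range."""
    mask = (1 << half_bits) - 1
    left, right = x >> half_bits, x & mask
    for key in keys:
        left, right = right, left ^ _mix(right, key, mask)
    return (left << half_bits) | right


def shuffled_addresses(first="1.1.1.2", last="254.255.255.254", rounds=4, rng=random):
    """Yield every address from first to last exactly once, in scrambled order."""
    lo = int(ipaddress.IPv4Address(first))
    hi = int(ipaddress.IPv4Address(last))
    keys = [rng.getrandbits(32) for _ in range(rounds)]
    for i in range(1 << 32):
        n = feistel_permute(i, keys)
        if lo <= n <= hi:  # skip the small fraction of outputs outside the span
            yield str(ipaddress.IPv4Address(n))
```

Because the order depends only on the random keys, saving the keys and the loop counter would let you resume an interrupted scan. You could also skip non-public addresses by checking `ipaddress.IPv4Address(n).is_global`.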
Any and all improvements and suggestions are welcome!