EvilZone

Hacking and Security => Anonymity and Privacy => : Poltergeist August 13, 2013, 05:29:36 PM

: scraping websites anonymously,without spooking the site owners?
: Poltergeist August 13, 2013, 05:29:36 PM
How can I scrape websites anonymously, without spooking the site owners?

I don't want them shutting me out when they see a spike in requests from the same IP, and redesigning things in defence.

How does the site log my IP? This is where I get confused: is the site logging my ISP's IP or the IP of the machine I run my code on?

How does my IP end up in their server logs?
Where is it coming from?
How can I manipulate it on each request?

Any help would be great...
Cheers
: Re: scraping websites anonymously,without spooking the site owners?
: proxx August 13, 2013, 05:34:24 PM
I have one word for you: Tor.
Use proxychains and wget and scrape away.
If it gets blocked, restart the Tor service and you can continue downloading with a new IP.
Easy as that.
It would be smart to spoof the user agent, as wget's default one looks funny and is blocked by some admins.
Just read the man page.
:D
*Solved*
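
For the cron-job case, the proxychains + wget combination above can be driven from a small script. This is only a sketch under assumptions: it assumes the binary is installed as `proxychains` (some distributions ship it as `proxychains4`), that proxychains is already configured to use Tor's SOCKS port, and the helper names `build_scrape_command` and `run_scrape` are made up for illustration. The only wget option used is `--user-agent`.

```python
import subprocess


def build_scrape_command(url, user_agent):
    """Return the argv list for running wget through proxychains.

    Assumes a `proxychains` binary on PATH that is configured for Tor.
    """
    return [
        "proxychains",
        "wget",
        f"--user-agent={user_agent}",  # replace wget's default user agent
        url,
    ]


def run_scrape(url, user_agent):
    """Run the command and return wget's exit code (0 means success)."""
    result = subprocess.run(build_scrape_command(url, user_agent), check=False)
    return result.returncode
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the user agent contains spaces or parentheses.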
: Re: scraping websites anonymously,without spooking the site owners?
: Kulverstukas August 13, 2013, 06:09:03 PM
So many questions... I would use HTTrack while fapping to the progress bar because it's that easy to use (with 1 hand).
I can't remember but I think it has some kind of advanced features for limiting threads/connections/time/whatever.
: Re: scraping websites anonymously,without spooking the site owners?
: vezzy August 13, 2013, 07:30:21 PM
In addition to what proxx said, if you really want to get serious, try strategically timing the requests to make it seem like a regular user, rather than overwhelming the server.
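
One way to picture the timing idea is a fixed base delay plus a random extra between requests. The sketch below is an assumption-laden illustration, not a recipe: the names `jittered_delays` and `fetch_politely`, the 5 second base and the 3 second jitter are all invented for the example, and `fetch` is whatever download function you already have.

```python
import random
import time


def jittered_delays(count, base=5.0, jitter=3.0, seed=None):
    """Return `count` delays, each between base and base + jitter seconds."""
    rng = random.Random(seed)
    return [base + rng.uniform(0, jitter) for _ in range(count)]


def fetch_politely(urls, fetch, base=5.0, jitter=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a randomized time after each one.

    `sleep` is injectable so the pacing can be tested without waiting.
    """
    results = []
    for url, delay in zip(urls, jittered_delays(len(urls), base, jitter)):
        results.append(fetch(url))
        sleep(delay)
    return results
```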
: Re: scraping websites anonymously,without spooking the site owners?
: Poltergeist August 14, 2013, 02:16:49 PM
How do I connect to the Tor network from my code?
I don't want to have to install a special browser as part of the process;
ultimately I want to run cron jobs from my remote server to do the work
while masking my IP. Is there anything that can do this?
: Re: scraping websites anonymously,without spooking the site owners?
: proxx August 14, 2013, 02:56:05 PM
Please read my previous post.
Use proxychains, which can pipe all of a program's traffic into a tunnel, in this case Tor's SOCKS proxy.
You might leak DNS, but since those lookups don't go to the site owner, that should be no big deal.
As I said, read the man page.
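
If you would rather talk to Tor's SOCKS port directly from code instead of wrapping a program in proxychains, the SOCKS5 handshake (RFC 1928) is short enough to sketch with the standard library. Assumptions here: Tor is listening on its default 127.0.0.1:9050, no proxy authentication is needed, and the function names are invented for this example. Sending the hostname rather than an already-resolved IP lets the proxy do the DNS lookup, which is how the DNS leak mentioned above is avoided.

```python
import socket
import struct

TOR_SOCKS_ADDR = ("127.0.0.1", 9050)  # Tor's default SOCKS port; yours may differ


def socks5_greeting():
    """Version 5, one auth method offered: 0x00 (no authentication)."""
    return b"\x05\x01\x00"


def socks5_connect_request(host, port):
    """CONNECT request using address type 0x03 (domain name, resolved by the proxy)."""
    name = host.encode("idna")
    if not 0 < len(name) < 256:
        raise ValueError("hostname must be 1-255 bytes")
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack(">H", port)


def _recv_exact(sock, n):
    """Read exactly n bytes or raise if the proxy closes the connection."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        buf += chunk
    return buf


def connect_via_socks5(host, port, proxy=TOR_SOCKS_ADDR, timeout=30):
    """Return a socket tunnelled through the SOCKS5 proxy to host:port."""
    sock = socket.create_connection(proxy, timeout=timeout)
    try:
        sock.sendall(socks5_greeting())
        if _recv_exact(sock, 2) != b"\x05\x00":
            raise ConnectionError("proxy did not accept no-auth SOCKS5")
        sock.sendall(socks5_connect_request(host, port))
        ver, rep, _rsv, atyp = _recv_exact(sock, 4)
        if ver != 5 or rep != 0:
            raise ConnectionError(f"SOCKS5 connect failed, reply code {rep}")
        # Discard the bound address and port that follow the reply header.
        if atyp == 1:
            _recv_exact(sock, 4 + 2)
        elif atyp == 4:
            _recv_exact(sock, 16 + 2)
        elif atyp == 3:
            _recv_exact(sock, _recv_exact(sock, 1)[0] + 2)
        else:
            raise ConnectionError("unknown address type in proxy reply")
        return sock
    except Exception:
        sock.close()
        raise
```

The returned socket can then be used like a normal TCP connection to the target, for example by writing a plain HTTP request to it.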
: Re: scraping websites anonymously,without spooking the site owners?
: m0l0ko September 02, 2013, 01:48:26 PM
I just put a time delay in between each page request so as not to cause concern for the admin. If I wanted to avoid IP blocks, I'd just make the script restart my router every so many requests. You could randomize the time delays to make it look less like a bot. You could also feed a list of proxies into your script.
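
The "list of proxies" idea can be sketched as a schedule that moves to the next proxy every so many requests. Everything named here is hypothetical: `proxy_schedule` is an invented helper, and the addresses in the usage note are documentation-range placeholders, not real proxies.

```python
import itertools


def proxy_schedule(proxies, requests_per_proxy, total_requests):
    """Return which proxy to use for each request, rotating every N requests."""
    if not proxies:
        raise ValueError("need at least one proxy")
    if requests_per_proxy < 1:
        raise ValueError("requests_per_proxy must be at least 1")
    rotation = itertools.cycle(proxies)
    current = next(rotation)
    schedule = []
    for i in range(total_requests):
        # Switch to the next proxy once the current one has served its quota.
        if i and i % requests_per_proxy == 0:
            current = next(rotation)
        schedule.append(current)
    return schedule
```

For example, `proxy_schedule(["http://203.0.113.10:8080", "http://203.0.113.11:8080"], 50, 200)` would alternate between the two placeholder proxies in blocks of 50 requests.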
: Re: scraping websites anonymously,without spooking the site owners?
: kenjoe41 September 02, 2013, 11:26:11 PM
If I were coding a script to do this, libcurl would be all I need, because it provides a lot of protocol connectivity and proxy tunneling, which would let me mask my IP.
: Re: scraping websites anonymously,without spooking the site owners?
: vezzy September 02, 2013, 11:35:24 PM
Without denying the power of cURL, tunneling your bot through an HTTP or SOCKS proxy is pretty trivial using the standard libraries of most high-level languages, especially Python and Ruby which have special proxy handlers.
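
In Python, for instance, the standard library's `urllib.request.ProxyHandler` covers the HTTP proxy case. This is a minimal sketch: `build_proxied_opener` is an invented helper name, the default user agent string is just a placeholder, and the proxy URL in the usage note is an assumption. Note that `ProxyHandler` speaks to HTTP proxies; it does not handle SOCKS by itself.

```python
import urllib.request


def build_proxied_opener(proxy_url, user_agent="Mozilla/5.0 (X11; Linux x86_64)"):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    # Replace the default "Python-urllib/x.y" user agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener
```

Usage would look like `build_proxied_opener("http://127.0.0.1:8118").open("http://example.com/").read()`, assuming an HTTP proxy is actually listening on that address.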