Topic: scraping websites anonymously, without spooking the site owners? (Read 3505 times)


Poltergeist

  • NULL
  • Posts: 4
  • Cookies: 0
How can I scrape websites anonymously, without spooking the site owners?

I don't want them to shut me out when they see a spike in requests from the same IP, or to redesign things defensively.

How does the site log my IP? This is where I get confused: is the site logging my ISP-assigned IP or the IP of the machine I run my code on?

How does my IP end up in their server logs?
Where is it coming from?
How can I change it when making a request?

Any help would be great...
Cheers
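A site never sees your machine's local settings; what it logs is the source address of the TCP connection that reaches it. For a script run at home behind a router, that is normally the public IP your ISP assigned to the router; for a cron job on a remote server, it is that server's public IP. The minimal sketch below, assuming only Python 3's standard library, runs a throwaway local HTTP server whose access-log lines are built from `client_address` in roughly the same layout as `http.server`'s default log. The names `IPLoggingHandler`, `access_log` and `fetch_own_ip_from_local_server` are hypothetical, chosen for illustration.

```python
import http.server
import threading
import urllib.request

# Hypothetical in-memory stand-in for a server's access log file.
access_log = []


class IPLoggingHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # client_address is the (ip, port) pair of the TCP peer: the address
        # the server's operating system saw for this connection.
        ip = self.client_address[0]
        body = f"Your IP as seen by this server: {ip}\n".encode()
        self.send_response(200)  # send_response() also triggers log_message()
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Same layout as http.server's default access log, but stored in a
        # list instead of being written to stderr.
        access_log.append("%s - - [%s] %s" % (
            self.address_string(), self.log_date_time_string(), format % args))


def fetch_own_ip_from_local_server():
    # Port 0 asks the OS for any free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), IPLoggingHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # An empty ProxyHandler ignores any proxy set in the environment,
        # so the connection goes straight to the local server.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/", timeout=5) as resp:
            return resp.read().decode()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_own_ip_from_local_server().strip())
    print(access_log[0])
```

Because the client here connects over loopback, the logged address is `127.0.0.1`. When a request goes through a proxy or Tor instead, the connection the site receives comes from the proxy or Tor exit node, so that is the address that ends up in its logs.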

proxx

  • Avatarception
  • Global Moderator
  • Titan
  • Posts: 2803
  • Cookies: 256
  • ФФФ
Re: scraping websites anonymously, without spooking the site owners?
« Reply #1 on: August 13, 2013, 05:34:24 pm »
I have one word for you: Tor.
Use proxychains and wget and scrape away.
If it gets blocked, restart the Tor service and continue downloading; you will usually come out through a different exit node, and therefore a new IP.
Easy as that.
It would be smart to spoof the user agent, since wget's default one looks odd and some admins block it.
Just read the man page.
:D
*Solved*
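The user-agent tip applies to scripts as much as to wget: Python's `urllib`, for example, identifies itself as `Python-urllib/<version>` unless told otherwise. A minimal sketch, assuming Python 3; the browser-style string in `BROWSER_UA`, the helper name `build_request` and the URL `https://example.com/` are placeholders chosen for illustration. It only builds the request, so nothing is sent until you pass it to `urlopen`.

```python
import urllib.request

# Illustrative browser-like string; any value can be sent here.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"


def build_request(url, user_agent=BROWSER_UA):
    # A header set explicitly on the Request takes precedence over the
    # opener's default "Python-urllib/<version>" User-Agent.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request("https://example.com/")
    # urllib normalizes header names with str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
    # To actually fetch the page: urllib.request.urlopen(req).read()
```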
« Last Edit: August 13, 2013, 05:35:52 pm by proxx »
Wtf where you thinking with that signature? - Phage.
This was another little experiment *evillaughter - Proxx.
Evilception... - Phage

Kulverstukas

  • Administrator
  • Zeus
  • Posts: 6627
  • Cookies: 542
  • Fascist dictator
Re: scraping websites anonymously, without spooking the site owners?
« Reply #2 on: August 13, 2013, 06:09:03 pm »
So many questions... I would use HTTrack while fapping to the progress bar because it's that easy to use (with 1 hand).
I can't remember, but I think it has some advanced features for limiting threads/connections/time/whatever.

vezzy

  • Royal Highness
  • Posts: 771
  • Cookies: 172
Re: scraping websites anonymously, without spooking the site owners?
« Reply #3 on: August 13, 2013, 07:30:21 pm »
In addition to what proxx said, if you really want to get serious, try strategically timing the requests to make it seem like a regular user, rather than overwhelming the server.
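One way to avoid overwhelming the server is to enforce a minimum gap between consecutive requests. A minimal sketch, assuming a single-threaded Python 3 scraper; the `Throttle` class and its 1-second default interval are illustrative choices, not a recommendation for any particular site.

```python
import time


class Throttle:
    """Makes successive wait() calls return at least `interval` seconds apart."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self._last = None  # monotonic time of the previous request, if any

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
                now = time.monotonic()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(interval=0.2)
    start = time.monotonic()
    for page in range(3):
        throttle.wait()  # call this right before each request
        print(f"request {page} sent at {time.monotonic() - start:.2f}s")
```

Using `time.monotonic()` rather than `time.time()` keeps the spacing correct even if the system clock is adjusted while the scraper runs.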
Quote from: Dippy hippy
Just brushing though. I will be semi active mainly came to find a HQ botnet, like THOR or just any p2p botnet

Poltergeist

  • NULL
  • Posts: 4
  • Cookies: 0
Re: scraping websites anonymously, without spooking the site owners?
« Reply #4 on: August 14, 2013, 02:16:49 pm »
How do I connect to the Tor network from my code?
I don't want to have to install a special browser as part of the process;
ultimately I want to run cron jobs on my remote server to do the work
while masking my IP. Is there anything that can do this?

proxx

  • Avatarception
  • Global Moderator
  • Titan
  • Posts: 2803
  • Cookies: 256
  • ФФФ
Re: scraping websites anonymously, without spooking the site owners?
« Reply #5 on: August 14, 2013, 02:56:05 pm »
Quote from: Poltergeist
How do I connect to the Tor network from my code?
I don't want to have to install a special browser as part of the process;
ultimately I want to run cron jobs on my remote server to do the work
while masking my IP. Is there anything that can do this?

Please read my previous post.
Use proxychains, which can pipe all traffic into a tunnel, in this case Tor's SOCKS proxy.
You might leak DNS lookups, but since those go to your DNS resolver rather than the site owner, that should be no big deal.
As I said, read the man page.
« Last Edit: August 14, 2013, 02:56:26 pm by proxx »

m0l0ko

  • Peasant
  • Posts: 129
  • Cookies: -4
Re: scraping websites anonymously, without spooking the site owners?
« Reply #6 on: September 02, 2013, 01:48:26 pm »
I just put a time delay between page requests so as not to cause concern for the admin. If I wanted to avoid IP blocks, I'd just make the script restart my router every so many requests (if your ISP assigns a dynamic IP, reconnecting usually gets you a new one). You could randomize the delays to make the traffic look less like a bot's. You could also feed a list of proxies into your script.
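Randomized delays amount to drawing each pause from a range instead of using a fixed value. A minimal sketch, assuming Python 3; the function name `random_delay` and the 2 to 5 second default range are illustrative values, not figures from the thread. Passing a seeded `random.Random` as `rng` makes the delays reproducible for testing.

```python
import random
import time


def random_delay(min_seconds=2.0, max_seconds=5.0, rng=random):
    """Sleep for a duration drawn uniformly from [min_seconds, max_seconds]; return it."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("need 0 <= min_seconds <= max_seconds")
    delay = rng.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


if __name__ == "__main__":
    for page in range(3):
        # ... fetch one page here ...
        waited = random_delay(0.1, 0.3)
        print(f"waited {waited:.2f}s before the next request")
```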
« Last Edit: September 02, 2013, 01:48:54 pm by m0l0ko »

kenjoe41

  • Symphorophiliac Programmer
  • Administrator
  • Baron
  • Posts: 990
  • Cookies: 224
Re: scraping websites anonymously, without spooking the site owners?
« Reply #7 on: September 02, 2013, 11:26:11 pm »
If I were coding a script to do this, libcurl would be all I need, because it supports a lot of protocols and proxy tunneling, which would let me mask my IP.
If you can't explain it to a 6 year old, you don't understand it yourself.

vezzy

  • Royal Highness
  • Posts: 771
  • Cookies: 172
Re: scraping websites anonymously, without spooking the site owners?
« Reply #8 on: September 02, 2013, 11:35:24 pm »
Quote from: kenjoe41
If I were coding a script to do this, libcurl would be all I need, because it supports a lot of protocols and proxy tunneling, which would let me mask my IP.

Without denying the power of cURL, tunneling your bot through an HTTP proxy is pretty trivial using the standard libraries of most high-level languages, especially Python and Ruby, which have built-in proxy support. In Python and Ruby, SOCKS proxies such as Tor's usually need a third-party library or a wrapper like proxychains.
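For the HTTP-proxy case, Python's standard `urllib.request` needs only a `ProxyHandler`, which handles HTTP proxies. A minimal sketch, assuming an HTTP proxy at the hypothetical address `http://127.0.0.1:8080`; the helper name `build_proxied_opener` is illustrative.

```python
import urllib.request

# Hypothetical proxy address; replace it with a proxy you actually run or use.
PROXY_URL = "http://127.0.0.1:8080"


def build_proxied_opener(proxy_url=PROXY_URL):
    # Send both http:// and https:// URLs via the same HTTP proxy
    # (urllib tunnels HTTPS through it with CONNECT).
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(proxy_handler)


if __name__ == "__main__":
    opener = build_proxied_opener()
    # opener.open("http://example.com/") would now go through the proxy;
    # urllib.request.install_opener(opener) makes plain urlopen() use it too.
    print([type(h).__name__ for h in opener.handlers
           if isinstance(h, urllib.request.ProxyHandler)])
```

Passing a `ProxyHandler` explicitly to `build_opener` replaces the default one, which would otherwise read proxy settings from environment variables such as `http_proxy`.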