Author Topic: Download all ebooks from it-ebooks.info  (Read 5178 times)


Offline xtream1101

  • /dev/null
  • *
  • Posts: 10
  • Cookies: 3
    • View Profile
    • Github
Download all ebooks from it-ebooks.info
« on: October 24, 2014, 03:27:03 pm »
I created this script to download all of the ebooks from it-ebooks.info. This may not be useful to many, since it downloads every book on the site, but I just like having offline copies of everything.

Here is the GitHub repo with the most up to date code: https://github.com/xtream1101/it-ebooks-dl

Edit: took out code block, most recent script is linked above.
« Last Edit: October 26, 2014, 02:58:33 pm by xtream1101 »

Offline TheWormKill

  • EZ's Scripting Whore
  • Global Moderator
  • Knight
  • *
  • Posts: 257
  • Cookies: 66
  • The Grim Reaper of Worms
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #1 on: October 24, 2014, 03:36:30 pm »
The time I did something similar, I used requests in combination with HTMLParser. It works better: you don't need to do the low-level work, and you can easily create sessions etc. It makes the code a lot cleaner.
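A minimal stdlib sketch of the approach described above: an HTMLParser subclass pulls the links out of a page, and requests (shown only as a comment so the sketch stands alone) would handle the HTTP side. The class and function names here are my own illustration, not code from the thread.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag seen in the page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html_text):
    """Feed raw HTML to the parser and return all hrefs in document order."""
    parser = LinkExtractor()
    parser.feed(html_text)
    return parser.links

# With requests, fetching pages inside one session (shared cookies, reused
# connection) is a line each:
#   import requests
#   session = requests.Session()
#   links = extract_links(session.get("http://it-ebooks.info/book/1/").text)
```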
Stuff I did: How to think like a superuser, Iridium

He should make that "Haskell"
Quote
<m0rph-is-gay> fuck you thewormkill you python coding mother fucker

Offline xtream1101

  • /dev/null
  • *
  • Posts: 10
  • Cookies: 3
    • View Profile
    • Github
Re: Download all ebooks from it-ebooks.info
« Reply #2 on: October 24, 2014, 03:41:46 pm »
I did read about requests when making this script, but I have been trying to keep my scripts as dependency-free as possible.

Offline s3my0n

  • Knight
  • **
  • Posts: 276
  • Cookies: 58
    • View Profile
    • ::1
Re: Download all ebooks from it-ebooks.info
« Reply #3 on: October 24, 2014, 04:47:31 pm »
You should automatically move them to directories based on the topic.
Easter egg in all *nix systems: E(){ E|E& };E

Offline xtream1101

  • /dev/null
  • *
  • Posts: 10
  • Cookies: 3
    • View Profile
    • Github
Re: Download all ebooks from it-ebooks.info
« Reply #4 on: October 24, 2014, 04:53:47 pm »
You should automatically move them to directories based on the topic.

That will have to wait, as I cannot get the topic of the book from the page I am parsing (http://it-ebooks.info/book/1/). I will find a way to look up the book and get that info when I have time. For now I will make them go into their publisher's directory.
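The publisher-directory idea can be sketched as follows: given a publisher name scraped from the book page, build a safe sub-directory for the download. The function name and the sanitising rule are my own assumptions, not the script's actual code.

```python
import re
from pathlib import Path

def publisher_path(base_dir, publisher, filename):
    """Return base_dir/<sanitised publisher>/filename (no directories created)."""
    # Keep word characters, spaces, and dashes; replace anything else with "_".
    safe = re.sub(r"[^\w\s-]", "_", publisher).strip()
    return Path(base_dir) / safe / filename

# Example:
#   publisher_path("books", "O'Reilly Media", "sed_awk.pdf")
#   -> books/O_Reilly Media/sed_awk.pdf
```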


Offline Nortcele

  • Knight
  • **
  • Posts: 211
  • Cookies: -42
  • █+█=██
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #5 on: October 24, 2014, 05:17:01 pm »
You should add in some organization
~JaySec
~LulzBlog

TAKE A COOKIE!




0100000101010011010000110100100101001001

Offline TheWormKill

  • EZ's Scripting Whore
  • Global Moderator
  • Knight
  • *
  • Posts: 257
  • Cookies: 66
  • The Grim Reaper of Worms
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #6 on: October 24, 2014, 05:34:27 pm »
@OP: Considering that most users will be on *NIX, it's not a problem to install one Python lib, but reading less clean, more complicated code is. Just my own opinion, though.
Stuff I did: How to think like a superuser, Iridium

He should make that "Haskell"
Quote
<m0rph-is-gay> fuck you thewormkill you python coding mother fucker

Offline xtream1101

  • /dev/null
  • *
  • Posts: 10
  • Cookies: 3
    • View Profile
    • Github
Re: Download all ebooks from it-ebooks.info
« Reply #7 on: October 24, 2014, 06:12:57 pm »
Just so I can understand better, what part of this code is "complicated"?
Also, I don't see how using requests would make the code any shorter; it would just be used in place of where I use urllib, wouldn't it?

Offline TheWormKill

  • EZ's Scripting Whore
  • Global Moderator
  • Knight
  • *
  • Posts: 257
  • Cookies: 66
  • The Grim Reaper of Worms
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #8 on: October 24, 2014, 07:13:03 pm »
Yes, and urllib is overly complicated compared to requests. Don't get me wrong: code written with requests is easier to read, since the API is more intuitive. For instance, you wouldn't need to deal with headers just to download files, which would make the code shorter. Anyway, why the f*ck are we arguing about it? If you don't want to use it, don't; it was merely a suggestion.

PS: This sounds like an ad for requests, doesn't it?
« Last Edit: October 24, 2014, 07:14:49 pm by TheWormKill »
Stuff I did: How to think like a superuser, Iridium

He should make that "Haskell"
Quote
<m0rph-is-gay> fuck you thewormkill you python coding mother fucker

Offline Fur

  • Knight
  • **
  • Posts: 216
  • Cookies: 34
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #9 on: October 24, 2014, 07:38:42 pm »
it-ebooks has an API; I would have used that instead. After all, JSON is easier to parse, the API returns only the data we need, and the entire point of the API is to help other programs get content from it-ebooks.
It would just be a matter of calling http://it-ebooks-api.info/v1/book/$i for all $i from 0 to the id of the last uploaded book and parsing the result.
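That loop can be sketched in Python. The endpoint comes from the post above; the fetch itself is left as a comment so only the URL construction and JSON handling run here, and the scraper call is hypothetical.

```python
import json

API_BOOK_URL = "http://it-ebooks-api.info/v1/book/{}"

def book_api_url(book_id):
    """Build the per-book API URL for a numeric id."""
    return API_BOOK_URL.format(book_id)

def parse_book(response_text):
    """The API answers JSON; a missing book looks like {"Error": ...}."""
    data = json.loads(response_text)
    return None if "Error" in data else data

# The loop itself (network call left as a comment):
#   from urllib.request import urlopen
#   for i in range(1, last_book_id + 1):
#       info = parse_book(urlopen(book_api_url(i)).read().decode())
#       if info:
#           ...  # download using the fields in info
```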

Offline xtream1101

  • /dev/null
  • *
  • Posts: 10
  • Cookies: 3
    • View Profile
    • Github
Re: Download all ebooks from it-ebooks.info
« Reply #10 on: October 24, 2014, 07:43:31 pm »
it-ebooks has an API; I would have used that instead. After all, JSON is easier to parse, the API returns only the data we need, and the entire point of the API is to help other programs get content from it-ebooks.
It would just be a matter of calling http://it-ebooks-api.info/v1/book/$i for all $i from 0 to the id of the last uploaded book and parsing the result.

I was going to do that, but something seems to be broken with it.
When I call http://it-ebooks-api.info/v1/book/1 it should return the book that is at http://it-ebooks.info/book/1/,
but instead I get {"Error":"Book not found!"}

Offline Fur

  • Knight
  • **
  • Posts: 216
  • Cookies: 34
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #11 on: October 24, 2014, 07:55:23 pm »
I was going to do that, but something seems to be broken with it.
When I call http://it-ebooks-api.info/v1/book/1 it should return the book that is at http://it-ebooks.info/book/1/,
but instead I get {"Error":"Book not found!"}
Yes, I see what you mean. I called http://it-ebooks-api.info/v1/search/A%20Peek%20at%20Computer%20Electronics and here's part of the response:
Code: [Select]
{"ID":425866405,"Title":"A Peek at Computer Electronics","SubTitle":"Things you Should Know","Description":"Are you a programmer or computer enthusiast? Do you feel comfortable with methods, functions, and variables? Do you wish you knew more about how the computer made it all work? Now you can. From basic electronics to advanced computer hardware, you'll ...","Image":"http://s.it-ebooks-api.info/1/a_peek_at_computer_electronics.jpg","isbn":"9780977616688"}
The id is not 1, and all of them seem to be random, or at least not in order.

So yes, it would actually seem that using the web interface would be the best way to download them all. I should have verified that the API worked as expected before posting.

Edit:
Decided to make my own. Suboptimal but it works.
Code: (Ruby) [Select]
# it-ebooks-download.rb
# Download all ebooks from it-ebooks.info to the current directory.

# TODO:
  # Add option to download to different directory.

require 'mechanize'

BOOK_BASE_URL = "http://it-ebooks.info/book/%s/"
MAX_BOOK_ID = Float::INFINITY

agent = Mechanize.new
agent.user_agent_alias = "Windows IE 7"
agent.pluggable_parser.default = Mechanize::Download

(1..MAX_BOOK_ID).each do |i|
  begin
    page = agent.get(BOOK_BASE_URL % [i])
  rescue Mechanize::ResponseCodeError => e
    # Only a 404 means we have run past the last book id; re-raise anything else.
    raise unless e.response_code == "404"
    abort "HTTP 404 given. All books must have been downloaded."
  end

  # FIXME: I think I remember seeing some books hosted on it-ebooks itself.
  # Change this to a unique selector just in case.
  download_link = page.link_with(:href => /filepi.com/)
  next if download_link.nil?  # page had no filepi.com mirror link

  agent.get(download_link.uri).save
  puts download_link.text + " downloaded!"
end
« Last Edit: October 25, 2014, 10:46:10 pm by Fur »

Spacecow

  • Guest
Re: Download all ebooks from it-ebooks.info
« Reply #12 on: October 26, 2014, 07:42:48 am »
Actually I have been procrastinating on writing a script that does just what this one does so +1 for doing work so I don't have to :D

Offline Nortcele

  • Knight
  • **
  • Posts: 211
  • Cookies: -42
  • █+█=██
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #13 on: October 28, 2014, 03:34:11 pm »
Actually I have been procrastinating on writing a script that does just what this one does so +1 for doing work so I don't have to :D
Lol
~JaySec
~LulzBlog

TAKE A COOKIE!




0100000101010011010000110100100101001001

Offline xtream1101

  • /dev/null
  • *
  • Posts: 10
  • Cookies: 3
    • View Profile
    • Github
Re: Download all ebooks from it-ebooks.info
« Reply #14 on: October 29, 2014, 05:04:30 pm »
I recoded just about the whole script to work better.
Now when the site is parsed, it saves all of the book/page data to a .json file. On later runs it only parses what the .json file is missing, i.e. it only fetches anything new.

The download section uses the .json file for all of its needs.

I changed the way it works so we are not sending unneeded traffic.
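The caching scheme described above can be sketched like this: persist the scraped book data to a .json file and, on the next run, fetch only the ids that are not yet in it. The file name, cache structure, and scraper call are my own assumptions, not the actual script.

```python
import json
import os

CACHE_FILE = "books.json"

def load_cache(path=CACHE_FILE):
    """Read the saved book data, or start fresh if no cache exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def save_cache(cache, path=CACHE_FILE):
    """Write the book data back out for the next run."""
    with open(path, "w") as f:
        json.dump(cache, f, indent=2)

def missing_ids(cache, last_id):
    """Book ids from 1..last_id that are not yet in the cache."""
    return [i for i in range(1, last_id + 1) if str(i) not in cache]

# Typical run:
#   cache = load_cache()
#   for i in missing_ids(cache, last_id):
#       cache[str(i)] = scrape_book_page(i)   # hypothetical scraper
#   save_cache(cache)
```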