Author Topic: Download all ebooks from it-ebooks.info  (Read 5175 times)


Offline xtream1101

  • /dev/null
  • *
  • Posts: 10
  • Cookies: 3
    • View Profile
    • Github
Download all ebooks from it-ebooks.info
« on: October 24, 2014, 03:27:03 pm »
I created this script to download all of the ebooks from it-ebooks.info. This may not be useful to many, as it downloads them all; I just like having offline copies of everything.

Here is the GitHub repo with the most up to date code: https://github.com/xtream1101/it-ebooks-dl

Edit: took out code block, most recent script is linked above.
« Last Edit: October 26, 2014, 02:58:33 pm by xtream1101 »

Offline TheWormKill

  • EZ's Scripting Whore
  • Global Moderator
  • Knight
  • *
  • Posts: 257
  • Cookies: 66
  • The Grim Reaper of Worms
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #1 on: October 24, 2014, 03:36:30 pm »
When I did something similar, I used requests in combination with HTMLParser. It works better: you don't need to do the low-level work, and you can easily create sessions etc. It makes the code a lot cleaner.
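For reference, a rough sketch of that combo (Python 3, untested; the book page URL and link filtering are only placeholders, the real selectors for it-ebooks.info would still need to be worked out):
Code: (Python) [Select]
# Sketch: fetch a page through a shared requests session and pull out its links with HTMLParser.
import requests
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

session = requests.Session()                   # one session reused for every request
session.headers["User-Agent"] = "Mozilla/5.0"  # headers set once instead of per call

html = session.get("http://it-ebooks.info/book/1/").text
collector = LinkCollector()
collector.feed(html)
print(collector.links)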
Stuff I did: How to think like a superuser, Iridium

He should make that "Haskell"
Quote
<m0rph-is-gay> fuck you thewormkill you python coding mother fucker

Offline xtream1101

  • /dev/null
  • *
  • Posts: 10
  • Cookies: 3
    • View Profile
    • Github
Re: Download all ebooks from it-ebooks.info
« Reply #2 on: October 24, 2014, 03:41:46 pm »
I did read about requests when making this script, but I have been trying to keep my scripts as dependency-free as possible.

Offline s3my0n

  • Knight
  • **
  • Posts: 276
  • Cookies: 58
    • View Profile
    • ::1
Re: Download all ebooks from it-ebooks.info
« Reply #3 on: October 24, 2014, 04:47:31 pm »
You should automatically move them to directories based on the topic.
Easter egg in all *nix systems: E(){ E|E& };E

Offline xtream1101

  • /dev/null
  • *
  • Posts: 10
  • Cookies: 3
    • View Profile
    • Github
Re: Download all ebooks from it-ebooks.info
« Reply #4 on: October 24, 2014, 04:53:47 pm »
You should automatically move them to directories based on the topic.

For right now that will have to wait, as I cannot get the topic of the book from the page I am parsing (http://it-ebooks.info/book/1/). But I will find a way to look up the book and get that info when I get time. For now I will have them go into their publisher's directory, something like the sketch below.
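Just a sketch of that idea; the publisher name, file name and data are placeholders the scraper would supply:
Code: (Python) [Select]
# Sketch: drop each downloaded file into a directory named after its publisher.
# 'publisher', 'filename' and 'data' stand in for values the scraper already has.
import os
import re

def save_to_publisher_dir(publisher, filename, data):
    safe = re.sub(r"[^\w\- ]", "_", publisher).strip() or "Unknown"
    os.makedirs(safe, exist_ok=True)  # create the publisher directory if it does not exist
    with open(os.path.join(safe, filename), "wb") as f:
        f.write(data)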


Offline Nortcele

  • Knight
  • **
  • Posts: 211
  • Cookies: -42
  • █+█=██
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #5 on: October 24, 2014, 05:17:01 pm »
You should add some organization.
~JaySec
~LulzBlog

TAKE A COOKIE!




0100000101010011010000110100100101001001

Offline TheWormKill

  • EZ's Scripting Whore
  • Global Moderator
  • Knight
  • *
  • Posts: 257
  • Cookies: 66
  • The Grim Reaper of Worms
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #6 on: October 24, 2014, 05:34:27 pm »
@OP: Considering that most users will be on *NIX, it's not a problem to install one Python lib, but reading less clean, more complicated code is. Just my own opinion, though.
Stuff I did: How to think like a superuser, Iridium

He should make that "Haskell"
Quote
<m0rph-is-gay> fuck you thewormkill you python coding mother fucker

Offline xtream1101

  • /dev/null
  • *
  • Posts: 10
  • Cookies: 3
    • View Profile
    • Github
Re: Download all ebooks from it-ebooks.info
« Reply #7 on: October 24, 2014, 06:12:57 pm »
Just so I can understand better, what part of this code is "complicated"?
Also, I don't see how using requests would make the code any shorter; it would just be used in place of where I use urllib, wouldn't it?

Offline TheWormKill

  • EZ's Scripting Whore
  • Global Moderator
  • Knight
  • *
  • Posts: 257
  • Cookies: 66
  • The Grim Reaper of Worms
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #8 on: October 24, 2014, 07:13:03 pm »
Yes, and urllib is overly complicated compared to requests. Don't get me wrong, but code written with requests is easier to read, since the API is more intuitive. For instance, you wouldn't need to deal with headers if you just want to download files, which would make the code shorter. Anyway, why the f*ck are we arguing about it? If you don't want to use it, don't; it was merely a suggestion.
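For what it's worth, a plain file download with requests is only a few lines (sketch; the URL is just a placeholder, and the real download hosts may still want extra headers):
Code: (Python) [Select]
# Sketch: stream a file to disk with requests, no manual header juggling.
import requests

resp = requests.get("http://example.com/some-book.pdf", stream=True)
resp.raise_for_status()  # bail out on anything but a successful response
with open("some-book.pdf", "wb") as f:
    for chunk in resp.iter_content(chunk_size=8192):
        f.write(chunk)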

PS: This sounds like an ad for requests, doesn't it?
« Last Edit: October 24, 2014, 07:14:49 pm by TheWormKill »
Stuff I did: How to think like a superuser, Iridium

He should make that "Haskell"
Quote
<m0rph-is-gay> fuck you thewormkill you python coding mother fucker

Offline Fur

  • Knight
  • **
  • Posts: 216
  • Cookies: 34
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #9 on: October 24, 2014, 07:38:42 pm »
it-ebooks has an API. I would have used that instead. After all, JSON is easier to parse, the API returns only the data we need, and the entire point of the API is to help other programs get content from it-ebooks.
It would just be a matter of calling http://it-ebooks-api.info/v1/book/$i for all $i from 0 to the id of the last uploaded book and parsing the result.
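In Python that loop might look roughly like this (sketch only; it assumes the /v1/book/<id> responses include a direct download URL in a "Download" field, which I have not verified, and LAST_BOOK_ID is a placeholder):
Code: (Python) [Select]
# Sketch of the API idea: walk book ids and read each JSON response.
# LAST_BOOK_ID and the "Download" field are assumptions, not checked against the real API.
import json
import urllib.request

LAST_BOOK_ID = 100  # placeholder; would be the id of the newest upload

for book_id in range(LAST_BOOK_ID + 1):
    resp = urllib.request.urlopen("http://it-ebooks-api.info/v1/book/%d" % book_id)
    info = json.loads(resp.read().decode("utf-8"))
    if info.get("Error"):
        continue  # no book behind this id
    print(info.get("Title"), info.get("Download"))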

Offline xtream1101

  • /dev/null
  • *
  • Posts: 10
  • Cookies: 3
    • View Profile
    • Github
Re: Download all ebooks from it-ebooks.info
« Reply #10 on: October 24, 2014, 07:43:31 pm »
it-ebooks has an API. I would have used that instead. After all, JSON is easier to parse, the API returns only the data we need, and the entire point of the API is to help other programs get content from it-ebooks.
It would just be a matter of calling http://it-ebooks-api.info/v1/book/$i for all $i from 0 to the id of the last uploaded book and parsing the result.

I was going to do that but something seems to be broken with it.
When I do http://it-ebooks-api.info/v1/book/1 it should give me the book that is at http://it-ebooks.info/book/1/.
But instead I get {"Error":"Book not found!"}

Offline Fur

  • Knight
  • **
  • Posts: 216
  • Cookies: 34
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #11 on: October 24, 2014, 07:55:23 pm »
I was going to do that but something seems to be broken with it.
When I do http://it-ebooks-api.info/v1/book/1 it should give me the book that is at http://it-ebooks.info/book/1/.
But instead I get {"Error":"Book not found!"}
Yes, I see what you mean. I called http://it-ebooks-api.info/v1/search/A%20Peek%20at%20Computer%20Electronics and here's part of the response:
Code: [Select]
{"ID":425866405,"Title":"A Peek at Computer Electronics","SubTitle":"Things you Should Know","Description":"Are you a programmer or computer enthusiast? Do you feel comfortable with methods, functions, and variables? Do you wish you knew more about how the computer made it all work? Now you can. From basic electronics to advanced computer hardware, you'll ...","Image":"http://s.it-ebooks-api.info/1/a_peek_at_computer_electronics.jpg","isbn":"9780977616688"}
The id is not 1, and all of them seem to be random, or at least not in order.

So yes, it would actually seem that using the web interface would be the best way to download them all. I should have verified that the API worked as expected before posting.

Edit:
Decided to make my own. Suboptimal but it works.
Code: (Ruby) [Select]
# it-ebooks-download.rb
# Download all ebooks from it-ebooks.info to the current directory.

# TODO:
#   Add option to download to a different directory.

require 'mechanize'

BOOK_BASE_URL = "http://it-ebooks.info/book/%s/"
MAX_BOOK_ID = Float::INFINITY

agent = Mechanize.new
agent.user_agent_alias = "Windows IE 7"
# Save unrecognized content types (the ebook files) to disk instead of trying to parse them.
agent.pluggable_parser.default = Mechanize::Download

(1..MAX_BOOK_ID).each do |i|
  begin
    page = agent.get(BOOK_BASE_URL % [i])
  rescue Mechanize::ResponseCodeError => e
    # Only stop on a 404; any other HTTP error is re-raised.
    raise unless e.response_code == "404"
    abort "HTTP 404 given. All books must have been downloaded."
  end

  # The download links point at filepi.com; books hosted elsewhere would need another selector.
  download_link = page.link_with(:href => /filepi\.com/)
  next if download_link.nil?

  agent.get(download_link.uri).save
  puts download_link.text + " downloaded!"
end
« Last Edit: October 25, 2014, 10:46:10 pm by Fur »

Spacecow

  • Guest
Re: Download all ebooks from it-ebooks.info
« Reply #12 on: October 26, 2014, 07:42:48 am »
Actually, I have been procrastinating on writing a script that does just what this one does, so +1 for doing the work so I don't have to :D

Offline Nortcele

  • Knight
  • **
  • Posts: 211
  • Cookies: -42
  • █+█=██
    • View Profile
Re: Download all ebooks from it-ebooks.info
« Reply #13 on: October 28, 2014, 03:34:11 pm »
Actually, I have been procrastinating on writing a script that does just what this one does, so +1 for doing the work so I don't have to :D
Ll
~JaySec
~LulzBlog

TAKE A COOKIE!




0100000101010011010000110100100101001001

Offline xtream1101

  • /dev/null
  • *
  • Posts: 10
  • Cookies: 3
    • View Profile
    • Github
Re: Download all ebooks from it-ebooks.info
« Reply #14 on: October 29, 2014, 05:04:30 pm »
I recoded pretty much the whole script to work better.
Now when the site is parsed, it saves all of the book/page data to a .json file. When parsed again, it only parses what the .json file is missing, i.e. it only fetches anything new.

The download section uses the .json file for all of its needs.

I changed the way it works so we are not sending unneeded traffic.
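The up-to-date code is in the GitHub repo linked in the first post; purely as an illustration of the idea, an incremental .json cache could look something like this (the file name and field layout are made up, not the actual script):
Code: (Python) [Select]
# Sketch of the incremental cache: keep parsed book data in a .json file
# and only fetch the pages that are not in it yet. Names are illustrative only.
import json
import os

CACHE_FILE = "books.json"

def load_cache():
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            return json.load(f)
    return {}

def save_cache(cache):
    with open(CACHE_FILE, "w") as f:
        json.dump(cache, f, indent=2)

def update_cache(parse_book, newest_id):
    """parse_book(book_id) -> dict of page data; only missing ids get fetched."""
    cache = load_cache()
    for book_id in range(1, newest_id + 1):
        if str(book_id) not in cache:  # JSON object keys are strings
            cache[str(book_id)] = parse_book(book_id)
    save_cache(cache)
    return cache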