Author Topic: [python] A downloading script for http://musicmp3spb.org/


Offline TheWormKill

[python] A downloading script for http://musicmp3spb.org/
« on: July 01, 2015, 02:35:37 pm »
Hey everyone,

since I am an old-fashioned person (an idiot, some would say), I am still downloading MP3s. And being even more old-fashioned, I also
listen to full albums. The website in the subject is a pretty good archive of all kinds of music, but it has a few drawbacks:
1. It's in Russian, so most people here can't read it.
2. You have to download every single song by hand, which sucks, since it takes 5(!) actions per track (and remember, I download full albums).

So I wrote a Python script that gets you the music without having to open a browser.
It lets you search for an artist and pick one of their albums, which is then downloaded to disk. Simple and concise.

It uses requests and BeautifulSoup4 to fetch the pages and parse the HTML, so make sure you have both installed (pip install requests beautifulsoup4 should do it).
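
In case you have never used the two together, the whole script boils down to this fetch-and-parse pattern. A minimal sketch (the search URL is the same endpoint the script queries below; passing 'html.parser' to BeautifulSoup is just my explicit parser choice):

Code: (Python) [Select]
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# minimal sketch of the requests + BeautifulSoup pattern the script is built on
from bs4 import BeautifulSoup
import requests

# fetch the search results page for an artist (same endpoint the script uses)
txt = requests.get('http://musicmp3spb.org/search/?Content=Mastodon&category=3').content
soup = BeautifulSoup(txt, 'html.parser')

# list every artist link found on the page
for link in soup.find_all('a'):
    href = link.get('href') or ''
    if href.startswith('/artist/'):
        print link.get_text(), '->', 'http://musicmp3spb.org' + href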

Here's a sample session with some fine Metal:

Quote
$ ./music_downloader.py
enter your search query> Mastodon
your query matched:
0: Mastodon
1: Mastodon & ZZ Top
2: ZZ Top / Mastodon
enter index of artist> 0
0: 2011 Black Tongue [Single]
1: 2011 Curl Of The Burl [Single]
2: 2011 Live At The Aragon
3: 2011 Spectrelight [Single]
4: 2011 The Hunter
5: 2011 The Hunter [Deluxe Edition]
6: 2010 Jonah Hex [EP]
7: 2009 Crack The Skye
8: 2006 Blood Mountain
9: 2006 Call Of The Mastodon
10: 2006 The Wolf Is Loose [EP]
11: 2004 Leviathan
12: 2002 Remission
13: 2001 Lifesblood [EP]
14: 2000 Demo
enter index of album> 11
songs to be downloaded:
0: Blood And Thunder
1: I Am Ahab
2: Seabeast
3: Island
4: Iron Tusk
5: Megalodon
6: Naked Burn
7: Aqua Dementia
8: Hearts Alive
9: Joseph Merrick
preparing for download of file #0... done.
downloading file #0 from http://tempfile.ru/download/c833aabb61f035a00aef7f3c86e6b642 done.
preparing for download of file #1... done.
downloading file #1 from http://tempfile.ru/download/f959975b1b5536dc6d1018c446e023bd done.
preparing for download of file #2... done.
downloading file #2 from http://tempfile.ru/download/92d35578636c4b40b8a24440125bd8cd done.
preparing for download of file #3... done.
downloading file #3 from http://tempfile.ru/download/951305ba17df6a2f2e4b3263c98ec417 done.
preparing for download of file #4... done.
downloading file #4 from http://tempfile.ru/download/5bc20711ea4e04dccfdf38f26827ab00 done.
preparing for download of file #5... done.
downloading file #5 from http://tempfile.ru/download/26163a106f12f3569a85d9116564a488 done.
preparing for download of file #6... done.
downloading file #6 from http://tempfile.ru/download/1e2b1a65cc2c752c5db5e7461d51908f done.
preparing for download of file #7... done.
downloading file #7 from http://tempfile.ru/download/afbfd4e64f1c3850f2cbc7f87451f1de done.
preparing for download of file #8... done.
downloading file #8 from http://tempfile.ru/download/11cc12d2aaa8b418acbe2bb9901dc386 done.
preparing for download of file #9... done.
downloading file #9 from http://tempfile.ru/download/82ad8c46a0c3ab10dad4750369829d17 done.
task completed. exiting.

The output's not perfect and there might be bugs, but it serves its purpose well enough.

The code:
Code: (Python) [Select]
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

import os


class Artist:
    """A search result: an artist's name and the URL of their page on the site."""
    def __init__(self, name, url):
        self.name = name
        self.url = 'http://musicmp3spb.org' + url

    def __str__(self):
        return self.name


class Album:
    """An album belonging to an artist, with the URL of its page on the site."""
    def __init__(self, artist, name, url):
        self.artist = artist
        self.name = name
        self.url = 'http://musicmp3spb.org' + url

    def __str__(self):
        return self.name


class Downloader:
    """Scrapes musicmp3spb.org: finds artists, lists albums and downloads songs."""
    def __init__(self):
        self.artists = {}  # index -> Artist, filled by find_artists()
        self.albums = {}   # index -> Album, filled by find_album()

    def find_artists(self, search_string):
        """Search the site for artists matching search_string and index the results."""
        txt = requests.get(
            'http://musicmp3spb.org/search/?Content=%s&category=3'
            % search_string.replace(' ', '+')).content
        soup = BeautifulSoup(txt, 'html.parser')
        links = []
        artist_names = []
        for link in soup.find_all('a'):
            # ignore links inside the "Top 10" sidebar table
            not_top10 = True
            for parent in link.parents:
                c = parent.get('class')
                if c and 'tTop10' in c:
                    not_top10 = False
            href = link.get('href') or ''
            if not_top10 and href.startswith('/artist/'):
                artist_names.append(link.get_text())
                links.append(href)
        for index, artist in enumerate(artist_names):
            self.artists[index] = Artist(artist, links[index])

    def find_album(self, artist_index):
        """Fetch the chosen artist's page and index the albums linked from it."""
        artist_index = int(artist_index)
        artist = self.artists[artist_index]
        txt = requests.get(artist.url).content
        soup = BeautifulSoup(txt, 'html.parser')
        links = []
        album_names = []
        for link in soup.find_all('a'):
            href = link.get('href') or ''
            if href.startswith('/album/') and link.get_text():
                album_names.append(link.get_text())
                links.append(href)
        for index, album in enumerate(album_names):
            self.albums[index] = Album(artist.name, album, links[index])

    def download_album(self, album_index):
        """Download every song of the chosen album to downloads/<artist>/<album>/."""
        album_index = int(album_index)
        album = self.albums[album_index]
        txt = requests.get(album.url).content
        soup = BeautifulSoup(txt, 'html.parser')
        links = []
        song_names = []
        for link in soup.find_all('a'):
            # again, ignore the "Top 10" sidebar
            not_top10 = True
            for parent in link.parents:
                c = parent.get('class')
                if c and 'tTop10' in c:
                    not_top10 = False
            href = link.get('href') or ''
            if not_top10 and href.startswith('/download/') and link.get_text():
                song_names.append(link.get_text())
                links.append('http://musicmp3spb.org' + href)
        print 'songs to be downloaded:'
        for index, name in enumerate(song_names):
            print str(index) + ':', name
        d = 'downloads/%s/%s' % (album.artist, album.name)
        if not os.path.exists(d):
            print 'creating directory', d + '...'
            os.makedirs(d)
        for num, link in enumerate(links):
            print 'preparing for download of file #' + str(num) + '...',
            # each song page embeds a form pointing at tempfile.ru; replay
            # that form submission, including its hidden input field
            req1 = requests.get(link)
            soup = BeautifulSoup(req1.text, 'html.parser')
            url = ''
            hidden_input_name = None
            hidden_input_value = None
            for form in soup.find_all('form'):
                if form.get('action') != '/member/':
                    url = 'http://tempfile.ru' + form.get('action')
                    for child in form.descendants:
                        if child.name == 'input' and \
                                child.get('type') == 'hidden':
                            hidden_input_name = child.get('name')
                            hidden_input_value = child.get('value')
            if not url or hidden_input_name is None:
                print 'could not find the download form, skipping file #' + str(num)
                continue
            values = {hidden_input_name: hidden_input_value}
            req2 = requests.post(url, data=values)
            soup = BeautifulSoup(req2.text, 'html.parser')
            # the response page contains the actual download link
            link3 = ''
            for a in soup.find_all('a'):
                if (a.get('href') or '').startswith('http://tempfile.ru/download/'):
                    link3 = a.get('href')
            print 'done.'
            if link3:
                print 'downloading file #' + str(num) + ' from', link3,
                downloaded_file = requests.get(link3)
                file1 = open('%s/%s-%s.mp3' % (d, num, song_names[num]), 'wb')
                file1.write(downloaded_file.content)
                file1.close()
                print 'done.'
            else:
                print 'no download link found, skipping file #' + str(num)
        print 'task completed. exiting.'

if __name__ == '__main__':
    d = Downloader()
    d.find_artists(raw_input('enter your search query> '))
    print 'your query matched:'
    for i in sorted(d.artists):
        try:
            print str(i) + ':', d.artists[i]
        except Exception:  # e.g. names the terminal cannot encode
            print str(i) + ': not printable'
    d.find_album(raw_input('enter index of artist> '))
    for i in sorted(d.albums):
        try:
            print str(i) + ':', d.albums[i]
        except Exception:  # e.g. names the terminal cannot encode
            print str(i) + ': not printable'
    d.download_album(raw_input('enter index of album> '))
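
If you would rather drive it from another script than answer the prompts, the Downloader class can be used directly, since the interactive part is guarded by if __name__ == '__main__'. A rough sketch (it assumes the script is saved as music_downloader.py, as in the session above, and that index 0 happens to be the artist and album you want):

Code: (Python) [Select]
# rough sketch: using the Downloader class without the interactive prompts
# (artist/album index 0 is only an assumption for the example)
from music_downloader import Downloader

d = Downloader()
d.find_artists('Mastodon')   # fills d.artists with index -> Artist
d.find_album(0)              # fills d.albums for the first matching artist
d.download_album(0)          # downloads the first listed album to downloads/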

Enjoy!