EvilZone


: [python] A downloading script for http://musicmp3spb.org/
: TheWormKill July 01, 2015, 02:35:37 PM
Hey everyone,

since I am an old-fashioned person, I still download MP3s. And being even more old-fashioned, I also
listen to full albums. The website in the subject is a pretty good music archive, but it has a few drawbacks:
1. It's in Russian, so most people here can't read it.
2. You have to download every single song by hand, which sucks for me (see first paragraph), since each song takes 5(!) actions.

Thus, I wrote a Python script that gets you the music without the need to open a browser.
It lets you search for an artist and pick an album, which is then downloaded to downloads/<artist>/<album>/. Simple and concise.
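
For reference, the search step is nothing more than a GET request against the site's search page. Here is a minimal sketch of building that URL with proper escaping. The name build_search_url is made up for illustration; the Content and category=3 parameters are simply the ones the script sends, and I am assuming the site accepts a percent-encoded query the same way it accepts a raw one.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

SEARCH_URL = 'http://musicmp3spb.org/search/'


def build_search_url(query):
    # urlencode turns spaces into '+' and escapes characters such as '&',
    # which a plain replace(' ', '+') would pass through untouched
    params = [('Content', query), ('category', '3')]
    return SEARCH_URL + '?' + urlencode(params)
```

With this, a query like "Mastodon & ZZ Top" no longer splits into two parameters at the ampersand.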

It is written for Python 2 (print statements, raw_input) and uses requests and BeautifulSoup4 to fetch the pages and parse the HTML, so make sure you have both installed.
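
If you are not sure whether both packages are available, a quick check along these lines should tell you. This is only a sketch: missing_packages is a hypothetical helper, not part of the script, and it only tests whether the modules can be imported, not which version you have.

```python
import importlib


def missing_packages(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling missing_packages(['requests', 'bs4']) returns an empty list when everything is in place. Note that bs4 is the import name; the package on PyPI is called beautifulsoup4.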

Here's a sample session with some fine Metal:

$ ./music_downloader.py
enter your search query> Mastodon
your query matched:
0: Mastodon
1: Mastodon & ZZ Top
2: ZZ Top / Mastodon
enter index of artist> 0
0: 2011 Black Tongue [Single]
1: 2011 Curl Of The Burl [Single]
2: 2011 Live At The Aragon
3: 2011 Spectrelight [Single]
4: 2011 The Hunter
5: 2011 The Hunter [Deluxe Edition]
6: 2010 Jonah Hex [EP]
7: 2009 Crack The Skye
8: 2006 Blood Mountain
9: 2006 Call Of The Mastodon
10: 2006 The Wolf Is Loose [EP]
11: 2004 Leviathan
12: 2002 Remission
13: 2001 Lifesblood [EP]
14: 2000 Demo
enter index of album> 11
songs to be downloaded:
0: Blood And Thunder
1: I Am Ahab
2: Seabeast
3: Island
4: Iron Tusk
5: Megalodon
6: Naked Burn
7: Aqua Dementia
8: Hearts Alive
9: Joseph Merrick
preparing for download of file #0... done.
downloading file #0 from http://tempfile.ru/download/c833aabb61f035a00aef7f3c86e6b642 done.
preparing for download of file #1... done.
downloading file #1 from http://tempfile.ru/download/f959975b1b5536dc6d1018c446e023bd done.
preparing for download of file #2... done.
downloading file #2 from http://tempfile.ru/download/92d35578636c4b40b8a24440125bd8cd done.
preparing for download of file #3... done.
downloading file #3 from http://tempfile.ru/download/951305ba17df6a2f2e4b3263c98ec417 done.
preparing for download of file #4... done.
downloading file #4 from http://tempfile.ru/download/5bc20711ea4e04dccfdf38f26827ab00 done.
preparing for download of file #5... done.
downloading file #5 from http://tempfile.ru/download/26163a106f12f3569a85d9116564a488 done.
preparing for download of file #6... done.
downloading file #6 from http://tempfile.ru/download/1e2b1a65cc2c752c5db5e7461d51908f done.
preparing for download of file #7... done.
downloading file #7 from http://tempfile.ru/download/afbfd4e64f1c3850f2cbc7f87451f1de done.
preparing for download of file #8... done.
downloading file #8 from http://tempfile.ru/download/11cc12d2aaa8b418acbe2bb9901dc386 done.
preparing for download of file #9... done.
downloading file #9 from http://tempfile.ru/download/82ad8c46a0c3ab10dad4750369829d17 done.
task completed. exiting.

The output's not perfect and there might be bugs, but it serves its purpose well enough.

The code:
: (Python)
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

import os


class Artist:
    def __init__(self, name, url):
        self.name = name
        self.url = 'http://musicmp3spb.org' + url

    def __str__(self):
        return self.name


class Album:
    def __init__(self, artist, name, url):
        self.artist = artist
        self.name = name
        self.url = 'http://musicmp3spb.org' + url

    def __str__(self):
        return self.name


class Downloader:
    def __init__(self):
        self.artists = {}
        self.albums = {}

    def find_artists(self, search_string):
        txt = requests.get(
            'http://musicmp3spb.org/search/?Content=%s&category=3'
            % search_string.replace(' ', '+')).content
        soup = BeautifulSoup(txt, 'html.parser')
        links = []
        artist_names = []
        for link in soup.find_all('a'):
            not_top10 = True
            for parent in link.parents:
                c = parent.get('class')
                if c and 'tTop10' in c:
                    not_top10 = False
            if not_top10 and link.get('href', '').startswith('/artist/'):
                artist_names.append(link.get_text())
                links.append(link.get('href'))
        for index, artist in enumerate(artist_names):
            self.artists[index] = Artist(artist, links[index])

    def find_album(self, artist_index):
        artist_index = int(artist_index)
        artist = self.artists[artist_index]
        txt = requests.get(artist.url).content
        soup = BeautifulSoup(txt, 'html.parser')
        links = []
        album_names = []
        for link in soup.find_all('a'):
            if link.get('href', '').startswith('/album/') and link.get_text():
                album_names.append(link.get_text())
                links.append(link.get('href'))
        for index, album in enumerate(album_names):
            self.albums[index] = Album(artist.name, album, links[index])

    def download_album(self, album_index):
        album_index = int(album_index)
        album = self.albums[album_index]
        txt = requests.get(album.url).content
        soup = BeautifulSoup(txt, 'html.parser')
        links = []
        song_names = []
        for link in soup.find_all('a'):
            not_top10 = True
            for parent in link.parents:
                c = parent.get('class')
                if c and 'tTop10' in c:
                    not_top10 = False
            if not_top10 and link.get(
                    'href', '').startswith('/download/') and link.get_text():
                song_names.append(link.get_text())
                links.append('http://musicmp3spb.org' + link.get('href'))
        print 'songs to be downloaded:'
        for index, name in enumerate(song_names):
            print str(index)+':', name
        d = 'downloads/%s/%s' % (album.artist, album.name)
        if not os.path.exists(d):
            print 'creating directory', d + '...'
            os.makedirs(d)
        for num, link in enumerate(links):
            print 'preparing for download of file #' + str(num) + '...',
            # reset per-song state so a failed lookup cannot raise a
            # NameError or silently reuse the previous song's link
            url, values, link3 = None, {}, ''
            req1 = requests.get(link)
            soup = BeautifulSoup(req1.text, 'html.parser')
            for form in soup.find_all('form'):
                action = form.get('action')
                if action and action != '/member/':
                    url = 'http://tempfile.ru' + action
                    for child in form.descendants:
                        if child.name == 'input' and \
                                child.get('type') == 'hidden':
                            values = {child.get('name'): child.get('value')}
            if url:
                req2 = requests.post(url, data=values)
                soup = BeautifulSoup(req2.text, 'html.parser')
                for a in soup.find_all('a'):
                    href = a.get('href', '')
                    if href.startswith('http://tempfile.ru/download/'):
                        link3 = href
            print 'done.'
            if not link3 == '':
                print 'downloading file #' + str(num) + ' from', link3,
                downloaded_file = requests.get(link3)
                # song titles may contain '/', which would break the path
                filename = '%d-%s.mp3' % (
                    num, song_names[num].replace('/', '_'))
                with open(os.path.join(d, filename), 'wb') as file1:
                    file1.write(downloaded_file.content)
                print 'done.'
            else:
                print 'no download link found for', link + ', skipping!'
        print 'task completed. exiting.'

if __name__ == '__main__':
    d = Downloader()
    d.find_artists(raw_input('enter your search query> '))
    print 'your query matched:'
    for i in d.artists:
        try:
            print str(i) + ':', d.artists[i]
        except UnicodeError:
            print str(i) + ': not printable'
    d.find_album(raw_input('enter index of artist> '))
    for i in d.albums:
        try:
            print str(i) + ':', d.albums[i]
        except UnicodeError:
            print str(i) + ': not printable'
    d.download_album(raw_input('enter index of album> '))
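
One thing the script does that may hurt on large files: it holds each whole song in memory before writing it, and it puts the song title into the file name as-is. Below is a rough sketch of how the saving step could be done instead. Both safe_filename and save_chunks are hypothetical helpers, and the set of characters I replace is my own guess at what tends to cause trouble in file names.

```python
import os
import re


def safe_filename(index, title, extension='.mp3'):
    # replace path separators and other characters that are awkward in
    # file names; the exact character set is an assumption, adjust to taste
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    return '%d-%s%s' % (index, cleaned, extension)


def save_chunks(chunks, directory, filename):
    # write an iterable of byte strings to directory/filename and
    # return the number of bytes written
    path = os.path.join(directory, filename)
    written = 0
    with open(path, 'wb') as handle:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                handle.write(chunk)
                written += len(chunk)
    return written
```

With requests, the chunks would come from a streamed response, e.g. requests.get(link3, stream=True).iter_content(chunk_size=8192), so only one chunk is held in memory at a time.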

Enjoy!