EvilZone
Posted by TheWormKill on July 01, 2015, 02:35:37 PM
Hey everyone,
since I'm old-fashioned (or just an idiot), I still download MP3s. And being even more old-fashioned (or cool, if you like), I also listen to full albums. The website in the subject (musicmp3spb.org) is a pretty good archive of all kinds of music, but it has a few drawbacks:
1. It's in Russian, so most people here can't read it.
2. You need to download every single song by hand, which sucks for me, since it takes 5(!) actions (see first paragraph).
So I wrote a Python script that fetches the music without opening a browser.
It allows you to search for an artist and select an album, which gets downloaded to disk afterwards. Simple and concise.
It uses requests to fetch the pages and BeautifulSoup 4 to parse the HTML, so make sure both are installed (pip install requests beautifulsoup4).
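For reference, the link filtering the script does with BeautifulSoup (keep `<a>` tags whose `href` starts with a given prefix such as `/artist/`, and skip anything nested inside an element with class `tTop10`) can also be sketched with the standard library's `html.parser`, if you'd rather not install bs4. This is a swapped-in technique, not what the script uses; `LinkCollector` is a hypothetical name, the sample HTML is made up, and it is far less forgiving of broken markup than BeautifulSoup:

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LinkCollector(HTMLParser):
    """Collect (text, href) for <a> tags whose href starts with `prefix`,
    ignoring links nested inside an element carrying `skip_class`."""

    def __init__(self, prefix, skip_class="tTop10"):
        super().__init__()
        self.prefix = prefix
        self.skip_class = skip_class
        self.stack = []    # open elements as (tag, inside_skipped_element)
        self.links = []    # collected (text, href) pairs
        self._href = None  # href of the <a> currently being read, if wanted
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        skipped = ((bool(self.stack) and self.stack[-1][1])
                   or self.skip_class in classes)
        if tag == "a":
            href = attrs.get("href") or ""
            if not skipped and href.startswith(self.prefix):
                self._href, self._text = href, []
        if tag not in VOID_TAGS:
            self.stack.append((tag, skipped))

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if text:  # drop links without visible text
                self.links.append((text, self._href))
            self._href = None
        # Pop back to the matching start tag; tolerates unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


sample = ('<div class="tTop10"><a href="/artist/99">Popular</a></div>'
          '<ul><li><a href="/artist/1">Mastodon</a></li>'
          '<li><a href="/album/7">Leviathan</a></li></ul>')
collector = LinkCollector("/artist/")
collector.feed(sample)
collector.close()
print(collector.links)  # [('Mastodon', '/artist/1')]
```

Tracking a per-element "inside the skipped block" flag on the stack plays the role of walking `link.parents` in the script: a link is ignored as soon as any ancestor carries the `tTop10` class.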
Here's a sample session with some fine Metal:
$ ./music_downloader.py
enter your search query> Mastodon
your query matched:
0: Mastodon
1: Mastodon & ZZ Top
2: ZZ Top / Mastodon
enter index of artist> 0
0: 2011 Black Tongue [Single]
1: 2011 Curl Of The Burl [Single]
2: 2011 Live At The Aragon
3: 2011 Spectrelight [Single]
4: 2011 The Hunter
5: 2011 The Hunter [Deluxe Edition]
6: 2010 Jonah Hex [EP]
7: 2009 Crack The Skye
8: 2006 Blood Mountain
9: 2006 Call Of The Mastodon
10: 2006 The Wolf Is Loose [EP]
11: 2004 Leviathan
12: 2002 Remission
13: 2001 Lifesblood [EP]
14: 2000 Demo
enter index of album> 11
songs to be downloaded:
0: Blood And Thunder
1: I Am Ahab
2: Seabeast
3: Island
4: Iron Tusk
5: Megalodon
6: Naked Burn
7: Aqua Dementia
8: Hearts Alive
9: Joseph Merrick
preparing for download of file #0... done.
downloading file #0 from http://tempfile.ru/download/c833aabb61f035a00aef7f3c86e6b642 done.
preparing for download of file #1... done.
downloading file #1 from http://tempfile.ru/download/f959975b1b5536dc6d1018c446e023bd done.
preparing for download of file #2... done.
downloading file #2 from http://tempfile.ru/download/92d35578636c4b40b8a24440125bd8cd done.
preparing for download of file #3... done.
downloading file #3 from http://tempfile.ru/download/951305ba17df6a2f2e4b3263c98ec417 done.
preparing for download of file #4... done.
downloading file #4 from http://tempfile.ru/download/5bc20711ea4e04dccfdf38f26827ab00 done.
preparing for download of file #5... done.
downloading file #5 from http://tempfile.ru/download/26163a106f12f3569a85d9116564a488 done.
preparing for download of file #6... done.
downloading file #6 from http://tempfile.ru/download/1e2b1a65cc2c752c5db5e7461d51908f done.
preparing for download of file #7... done.
downloading file #7 from http://tempfile.ru/download/afbfd4e64f1c3850f2cbc7f87451f1de done.
preparing for download of file #8... done.
downloading file #8 from http://tempfile.ru/download/11cc12d2aaa8b418acbe2bb9901dc386 done.
preparing for download of file #9... done.
downloading file #9 from http://tempfile.ru/download/82ad8c46a0c3ab10dad4750369829d17 done.
task completed. exiting.
The output isn't perfect and there might be bugs, but it serves its purpose well enough.
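One bug that's easy to hit: the index prompts in the session pass the raw input straight to `int()` and a dict lookup, so a typo such as `l1` or an out-of-range number (say, 15 in the album list above) ends in a `ValueError` or `KeyError`. A minimal sketch of a more forgiving prompt, assuming the choices are kept in an index-to-item dict like `Downloader.artists` and `Downloader.albums`; `parse_index` and `ask_index` are hypothetical helpers, not part of the posted script:

```python
def parse_index(choices, raw):
    """Return the integer index typed by the user if it is a key of
    `choices`, otherwise None (covers typos and out-of-range numbers)."""
    try:
        index = int(raw.strip())
    except ValueError:
        return None
    return index if index in choices else None


def ask_index(choices, prompt, read=input):
    """Prompt until a valid index is entered; None if there is nothing to pick.

    `read` defaults to the built-in input() and can be swapped out for tests.
    """
    if not choices:
        print('nothing to choose from.')
        return None
    while True:
        index = parse_index(choices, read(prompt))
        if index is not None:
            return index
        print('please enter one of the listed indices.')


# Simulated answers: a typo, an out-of-range number, then a valid index.
answers = iter(['l1', '15', '11'])
albums = {i: 'album %d' % i for i in range(15)}
print(ask_index(albums, 'enter index of album> ', read=lambda p: next(answers)))  # 11
```

In the `__main__` block, something like `d.find_album(ask_index(d.artists, 'enter index of artist> '))` would take the place of the bare prompt call; the caller still has to handle the `None` returned when the search matched nothing.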
The code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os

import requests
from bs4 import BeautifulSoup

BASE_URL = 'http://musicmp3spb.org'


def make_soup(markup):
    # name the parser explicitly: avoids bs4's "no parser specified" warning
    return BeautifulSoup(markup, 'html.parser')


def in_top10(tag):
    # True if the tag sits inside an element with class 'tTop10'
    # (apparently the site's top-10 list, which would pollute the results)
    return tag.find_parent(class_='tTop10') is not None


class Artist:
    def __init__(self, name, url):
        self.name = name
        self.url = BASE_URL + url

    def __str__(self):
        return self.name


class Album:
    def __init__(self, artist, name, url):
        self.artist = artist
        self.name = name
        self.url = BASE_URL + url

    def __str__(self):
        return self.name


class Downloader:
    def __init__(self):
        self.artists = {}
        self.albums = {}

    def find_artists(self, search_string):
        # let requests URL-encode the query (spaces, '&', non-ASCII, ...)
        txt = requests.get(BASE_URL + '/search/',
                           params={'Content': search_string,
                                   'category': 3}).content
        soup = make_soup(txt)
        self.artists = {}
        # href=True skips <a> tags without an href, which would crash .startswith
        for link in soup.find_all('a', href=True):
            if link['href'].startswith('/artist/') and not in_top10(link):
                self.artists[len(self.artists)] = Artist(link.get_text(),
                                                         link['href'])

    def find_album(self, artist_index):
        artist = self.artists[int(artist_index)]
        soup = make_soup(requests.get(artist.url).content)
        self.albums = {}
        for link in soup.find_all('a', href=True):
            if link['href'].startswith('/album/') and link.get_text():
                self.albums[len(self.albums)] = Album(artist.name,
                                                      link.get_text(),
                                                      link['href'])

    def download_album(self, album_index):
        album = self.albums[int(album_index)]
        soup = make_soup(requests.get(album.url).content)
        links = []
        song_names = []
        for link in soup.find_all('a', href=True):
            if (link['href'].startswith('/download/') and link.get_text()
                    and not in_top10(link)):
                song_names.append(link.get_text())
                links.append(BASE_URL + link['href'])
        print('songs to be downloaded:')
        for index, name in enumerate(song_names):
            print(str(index) + ':', name)
        directory = os.path.join('downloads', album.artist, album.name)
        if not os.path.exists(directory):
            print('creating directory', directory + '...')
            os.makedirs(directory)
        for num, link in enumerate(links):
            print('preparing for download of file #' + str(num) + '...',
                  end=' ', flush=True)
            # step 1: the song's page links to tempfile.ru through a form
            soup = make_soup(requests.get(link).text)
            url, values = None, {}
            for form in soup.find_all('form', action=True):
                if form['action'] != '/member/':
                    url = 'http://tempfile.ru' + form['action']
                    # send all of the form's hidden fields, as a browser would
                    values = {i.get('name'): i.get('value')
                              for i in form.find_all('input', type='hidden')}
            # reset per song so a failed lookup can't reuse the previous link
            link3 = ''
            if url:
                # step 2: posting the form yields the direct download link
                soup = make_soup(requests.post(url, data=values).text)
                for a in soup.find_all('a', href=True):
                    if a['href'].startswith('http://tempfile.ru/download/'):
                        link3 = a['href']
            print('done.')
            if link3:
                print('downloading file #' + str(num) + ' from', link3,
                      end=' ', flush=True)
                downloaded_file = requests.get(link3)
                path = os.path.join(directory,
                                    str(num) + '-' + song_names[num] + '.mp3')
                with open(path, 'wb') as file1:
                    file1.write(downloaded_file.content)
                print('done.')
            else:
                print('link invalid, download impossible for file #'
                      + str(num) + '!')
        print('task completed. exiting.')


if __name__ == '__main__':
    d = Downloader()
    d.find_artists(input('enter your search query> '))
    print('your query matched:')
    for i in d.artists:
        try:
            print(str(i) + ':', d.artists[i])
        except UnicodeEncodeError:  # e.g. Cyrillic names on a non-UTF-8 console
            print(str(i) + ': not printable')
    d.find_album(input('enter index of artist> '))
    for i in d.albums:
        try:
            print(str(i) + ':', d.albums[i])
        except UnicodeEncodeError:
            print(str(i) + ': not printable')
    d.download_album(input('enter index of album> '))
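`download_album` builds paths like `downloads/<artist>/<album>/<n>-<song>.mp3` straight from names scraped off the site. A name containing a slash, such as the "ZZ Top / Mastodon" match in the session above, turns into extra directory levels, and Windows doesn't allow characters such as `:` or `?` in file names at all. A minimal sketch of a sanitizer; `safe_name`, `song_path` and the `-` replacement character are assumptions, not part of the posted script:

```python
import os
import re

# Characters Windows forbids in file names, plus ASCII control characters.
# '/' and '\\' are included so a name can never add directory levels.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_name(name, replacement='-'):
    """Turn a scraped artist/album/song name into a single path component."""
    cleaned = _FORBIDDEN.sub(replacement, name).strip().rstrip('. ')
    # Windows rejects names ending in a dot or space; an empty result
    # (e.g. from '..') gets a placeholder instead.
    return cleaned or '_'


def song_path(artist, album, num, song):
    """Hypothetical helper mirroring the script's downloads/ layout."""
    return os.path.join('downloads', safe_name(artist), safe_name(album),
                        '%d-%s.mp3' % (num, safe_name(song)))


print(song_path('ZZ Top / Mastodon', 'Leviathan', 1, 'I Am Ahab'))
# downloads/ZZ Top - Mastodon/Leviathan/1-I Am Ahab.mp3 (on Linux/macOS)
```

Wrapping `album.artist`, `album.name` and `song_names[num]` in `safe_name()` where the script builds its directory and file paths would be enough to apply it.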
Enjoy!