Topic: [Python] Vertor Checker

vezzy
[Python] Vertor Checker
« on: May 16, 2013, 07:27:47 pm »
I recently finished up this basic web scraping script.

Vertor is a BitTorrent site that verifies uploads and scans them for malware, DRM and password-protected archives before adding them to its database. Querying that database can therefore serve as an additional torrent security measure: if a given torrent shows up in the results, it should have passed Vertor's checks.

This was my first attempt at web scraping and at using BeautifulSoup4, so it may be somewhat kludgy. If you don't have BeautifulSoup installed, run pip install beautifulsoup4, or download it from the official website and run python setup.py install.

The script takes a search query from sys.argv[1], sends it to the Vertor database, pulls the number of results from an <h1> tag, isolates the integer with a regex, and then displays all references to the query found in the page source. Vertor's markup is not tidy, to say the least, so I had to stick with this approach. Given that this was mostly written out of boredom and for practice, I suppose it is a fair trade.
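The two parsing steps can be tried in isolation, without touching the network. This is only a sketch using the standard library: the heading text and the href shape below are made-up assumptions (the real Vertor markup may differ), and extract_count and split_torrent_href are hypothetical helper names, not part of the script.

```python
import re

def extract_count(h1_text):
    """Return the first integer found in h1_text, or None if there is none."""
    match = re.search(r"\d+", h1_text)
    return int(match.group()) if match else None

def split_torrent_href(href):
    """Return the two path components after /torrents/, or None if absent.

    Assumes hrefs shaped like /torrents/<first>/<second>.
    """
    parts = href.split("/")
    if len(parts) > 3 and parts[1] == "torrents":
        return parts[2], parts[3]
    return None

# Hypothetical inputs, for illustration only.
print(extract_count("Found 12 torrents"))
print(split_torrent_href("/torrents/12345/Some.Release"))
```

Returning None instead of raising makes it easy to skip headings without a number and links that are too short.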

Vertor treats a query whose words are separated by periods the same as a multiple-word query. I noted this in the source code as a way around the fact that the script only reads sys.argv[1], so an unquoted query containing spaces would be cut off after its first word.
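One possible way to avoid typing the dots by hand is to join every command-line word with a period before sending the query. This is a sketch, not part of the script; build_query is a hypothetical helper name.

```python
def build_query(args):
    """Join command-line words with dots, e.g. ['Video', 'Game'] -> 'Video.Game'."""
    # Split each argument on whitespace too, so a quoted "Video Game" also works.
    return ".".join(word for arg in args for word in arg.split())

# Illustrative call; in the script this would be build_query(sys.argv[1:]).
print(build_query(["Video", "Game"]))
```

With this in place, both python vertor.py Video Game and python vertor.py "Video Game" would produce the same dotted query.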

Source:

Code:
#!/usr/bin/env python
# Simple script that sends a search query to Vertor's database and scrapes the important information.
# This can be used as an additional torrent security measure.
# by vezzy of evilzone.org

# NOTE: If you want to input a multiple-parameter argument, e.g. Video Game, concatenate it with dots, as in Video.Game
# Vertor will process the query in the same way, regardless.

import sys
import re
from urllib2 import urlopen
from bs4 import BeautifulSoup

STARTUP_URL = "http://www.vertor.com/index.php?mod=search&search=&cid=0&words="

def usage():
    print "/Vertor Checker/"
    print "Usage: python vertor.py <search query>"
    sys.exit()

try:
    query = sys.argv[1]
except IndexError:
    usage()

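def get_result_number():
    print "Sending query to Vertor...\n"

    # Name the parser explicitly so bs4 does not have to guess one
    search = BeautifulSoup(urlopen(STARTUP_URL + query).read(), "html.parser")
    # Join the text of every <h1> on the page; the result count is in one of them
    grep_h1 = " ".join(h1.get_text() for h1 in search.findAll("h1"))

    # Regular expression to separate the integer(s) from the text in the <h1> tag
    numbers = re.findall(r'\d+', grep_h1)
    result_number = ", ".join(numbers) if numbers else "0"

    print "Found %s result(s) for %s (with double the number of references in source)" % (result_number, query)
    # Only report the release as verified when Vertor actually returned results
    if any(int(num) > 0 for num in numbers):
        print "Torrent release appears to be verified and safe."
        print "Query references in first page of source:\n"
    else:
        print "No results; the release was not found in Vertor's database.\n"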

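def get_result_list():
    search = BeautifulSoup(urlopen(STARTUP_URL + query).read(), "html.parser")

    directory = "/torrents"
    # href=True skips anchors without an href attribute (avoids a KeyError)
    for a in search.findAll("a", href=True):
        if directory in a["href"]:
            parts = a["href"].split("/")
            # Skip links too short to carry both path components
            if len(parts) > 3:
                print parts[2], parts[3]
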
if __name__ == '__main__':
    try:
        get_result_number()
        get_result_list()
    except KeyboardInterrupt:
        sys.exit("\nProcess aborted.")

Criticism and suggestions on optimization would be much appreciated.