Author Topic: Scraper that reads and writes to your web page.  (Read 423 times)

0 Members and 1 Guest are viewing this topic.

Offline shimomura

  • Peasant
  • *
  • Posts: 57
  • Cookies: 0
    • View Profile
    • Shanaynay
Scraper that reads and writes to your web page.
« on: May 30, 2014, 09:21:12 pm »
As the title suggests, I just finished making a basic web scraper that searches for sales on a certain web site and then takes that info and writes it into your html file for automated web page editing.

I'm still new at this so if you guys have any any suggestions on what I could do better next time please do let me know.

Code: (python) [Select]
import urllib
import re
import fileinput

filez = "index.html"

#Finds Product Links @ winedeals.com

url = "http://www.winedeals.com/"
htmlfile = urllib.urlopen(url)
htmltext = htmlfile.read()
regex = '<h2 class="product-name"><a href="http://www.winedeals.com/(.+?)"'
pattern = re.compile(regex)
new_url_text = re.findall(pattern, htmltext)

x = 0

while x in range(len(new_url_text)):
   
    #Finds Product Title 
   
    new_url = url + str(new_url_text[x])
    title_htmlfile = urllib.urlopen(new_url)
    title_htmltext = title_htmlfile.read()
    title_regex = '<strong itemprop="title">(.+?)</strong>'
    title_pattern = re.compile(title_regex)
    product_title = re.findall(title_pattern, title_htmltext)

    #Finds Product Price
   
    price_regex = '<span class="price">(.+?)</span>'
    price_pattern = re.compile(price_regex)
    price_title = re.findall(price_pattern, title_htmltext)

    #Finds Product Savings
   
    savings_regex = '<span class="save">(.+?)</span>'
    savings_pattern = re.compile(savings_regex)
    savings_title = re.findall(savings_pattern, title_htmltext)

    #Stores Product Image Source Location
   
    image_regex = '<img itemprop="image" src="(.+?)"'
    image_pattern = re.compile(image_regex)
    image_title = re.findall(image_pattern, title_htmltext)

    print "Product Name:",product_title[0]
    print "Price:",price_title[0]
    print savings_title[0]
    print"-------------------------"
    print

    x += 1

#Re-Writes index.html with Wine Deals product info. (Only the product name as of now.)

#for line in fileinput.input(filez, inplace = 1):
      #print line.replace("wine_deals_title",product_title[0]),
« Last Edit: May 30, 2014, 09:29:25 pm by Kulverstukas »
Who gives a fuck what color the dress is...

Offline Kulverstukas

  • Administrator
  • Zeus
  • *
  • Posts: 6627
  • Cookies: 542
  • Fascist dictator
    • View Profile
    • My blog
Re: Scraper that reads and writes to your web page.
« Reply #1 on: May 30, 2014, 09:29:01 pm »
That's good, just that your variables are scattered and that X one is completely meaningless. Constants should be all uppercase too.

Other than that the code seems ok.