Author Topic: Need to Extract Data from a Website - Noobie here  (Read 318 times)

Offline JM35

« on: January 29, 2015, 10:49:55 pm »
Hello,

So basically what I'm looking for I wouldn't consider hacking at all, but I figure the knowledge on here could probably help me out with a solution.

So I run an online store, car parts to be specific. We have a supplier that has a massive online catalog, and we would like to get all the product data they have for a specific make of vehicle.

The way the catalog works is you pick your make of vehicle, then pick the year, then select the model. From there you select a category of parts, which takes you to a subcategory where you select the specific part you are searching for. Once you select that, it takes you to a list of the parts they have: SKU, brand, price, picture, which vehicles it fits, etc.

This is the information we are trying to get from their catalog: basically a list of all the parts we have access to for listing on our website. We just need a way to extract it so that we don't have to manually upload the products to our site one by one.

Any ideas how this would be done?

Thanks

Offline Kulverstukas

Re: Need to Extract Data from a Website - Noobie here
« Reply #1 on: January 30, 2015, 09:09:28 am »
First you need to figure out how the requests get sent - GET or POST. Either way, you also need to get the same parameter values that their dropdowns send, and then it's basically just a matter of crafting the request... oh, and parsing the HTML that comes back. I recommend using a dedicated lib for that; if you're using Python, use BeautifulSoup. Only use regex for very simple things.

I imagine you want to do it like this - you select some car model from your site and you get the data from another site with the parameters you selected?
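
To make the two steps above a bit more concrete, here is a rough Python sketch of crafting the request and parsing the result. Everything site-specific in it is an assumption: the URL, the parameter names (make, year, model) and the table layout are made up, so read the real ones from the catalog's dropdowns and your browser's network tab. It uses only the standard library (html.parser standing in for BeautifulSoup) so it runs without installing anything; BeautifulSoup would do the same parsing with less code.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameter names; the real ones come from the
# catalog's dropdown <select> elements and the browser's network tab.
BASE_URL = "https://supplier.example/catalog/parts"


def build_request(make, year, model, method="GET"):
    """Craft the catalog request the same way the site's form would."""
    query = urlencode({"make": make, "year": year, "model": model})
    if method == "GET":
        # GET: the parameters go into the URL's query string.
        return Request(f"{BASE_URL}?{query}")
    # POST: the same parameters travel in the request body instead.
    return Request(BASE_URL, data=query.encode("ascii"), method="POST")


class PartTableParser(HTMLParser):
    """Collect the text of each <td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def parse_parts(html):
    """Return the data rows of a results page as lists of cell text."""
    parser = PartTableParser()
    parser.feed(html)
    parser.close()
    # Drop rows with no <td> cells (e.g. a header row built from <th>).
    return [row for row in parser.rows if row]


# Made-up sample of what a results page might contain.
SAMPLE_HTML = """
<table>
  <tr><th>SKU</th><th>Brand</th><th>Price</th></tr>
  <tr><td>AB-123</td><td>Acme</td><td>19.99</td></tr>
  <tr><td>CD-456</td><td>Bolt</td><td>5.50</td></tr>
</table>
"""

request = build_request("Honda", 2010, "Civic")
parts = parse_parts(SAMPLE_HTML)
```

Against the real site you would pass the request to urllib.request.urlopen, decode the response body and feed that to parse_parts instead of the sample string, then loop over every make/year/model combination the dropdowns offer.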