Author Topic: Split any URL into "host", "path", and "filename" variables (Python)  (Read 3267 times)


Offline DamonX

  • Serf
  • *
  • Posts: 35
  • Cookies: 2
    • View Profile
Hi,


I am working on an HTTP download client in Python and need a little assistance.


I get the URL from a command-line argument (./clientprogram www.google.com/images/test.png) and then need to split that URL into host, path, and filename. I am only downloading images and displaying them on screen, though.


Here is my code so far:



Code: [Select]
import string
import socket
import sys
import os
from subprocess import call
from urllib.parse import urlparse


# ******************************************
#
#  (1) Test input arguments to program - correct number provided?
#      Exit if the required URL is not provided.
#  (2) Split URL into "host", "path", and "filename" variables.
#      http://www.google.com/images/test.png
#      * host=www.google.com
#      * path=/images/
#      * file=test.png


# host=????
# path=????
# filename=????
# port=????


print("Preparing to download object from http://" + host + path + filename)
print()

How do I split the URL? It's easy if the URL is hardcoded, but I'm not sure how to do it when we don't know what URL the user will provide.


Thanks


Damon
« Last Edit: April 20, 2013, 04:22:01 am by DamonX »

Offline relax

  • Sir
  • ***
  • Posts: 562
  • Cookies: 114
  • The one and only
    • View Profile
Re: Split any URL into "host", "path", and "filename" variables (Python)
« Reply #1 on: April 20, 2013, 06:39:59 am »
Count the slashes:
everything before the first / is the domain,
everything between the first and last / is the path,
everything after the last / is the file.
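
The slash-counting idea above could be sketched roughly like this. It assumes any scheme such as "http://" is stripped first, and `split_by_slashes` is a made-up helper name for illustration, not something posted in the thread.

```python
def split_by_slashes(url):
    """Split 'host/dir/.../file' by slash position (illustrative helper)."""
    # Drop an optional scheme such as "http://" so the first "/" follows the host.
    if "://" in url:
        url = url.split("://", 1)[1]
    # Everything before the first "/" is the host.
    host, _, rest = url.partition("/")
    # Everything after the last "/" is the filename; the remainder is the directory part.
    directory, _, filename = rest.rpartition("/")
    path = "/" + directory + "/" if directory else "/"
    return host, path, filename


print(split_by_slashes("www.google.com/images/test.png"))
# ('www.google.com', '/images/', 'test.png')
```

Because it keys on the first and last slash rather than fixed positions, it should also cope with deeper paths such as /images/srpr/logo3w.png. It does not handle query strings or fragments.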


Offline RedBullAddicted

  • Moderator
  • Sir
  • *
  • Posts: 519
  • Cookies: 189
    • View Profile
Re: Split any URL into "host", "path", and "filename" variables (Python)
« Reply #2 on: April 20, 2013, 07:37:26 am »
Code: (python) [Select]
>>> path = "www.google.com/images/test.png"
>>> pathparts = path.split('/')
>>> for part in pathparts:
...     print(part)
...
www.google.com
images
test.png
>>> host = pathparts[0]
>>> path = pathparts[1]
>>> filename = pathparts[2]
>>> print(host)
www.google.com
>>> print(path)
images
>>> print(filename)
test.png
>>>
Deep into that darkness peering, long I stood there, wondering, fearing, doubting, dreaming dreams no mortal ever dared to dream before. - Edgar Allan Poe

Offline Kulverstukas

  • Administrator
  • Zeus
  • *
  • Posts: 6627
  • Cookies: 542
  • Fascist dictator
    • View Profile
    • My blog
Re: Split any URL into "host", "path", and "filename" variables (Python)
« Reply #3 on: April 20, 2013, 08:01:06 am »
The os.path module also has some useful routines for this: http://docs.python.org/2/library/os.path.html#module-os.path

Offline proxx

  • Avatarception
  • Global Moderator
  • Titan
  • *
  • Posts: 2803
  • Cookies: 256
  • ФФФ
    • View Profile
Re: Split any URL into "host", "path", and "filename" variables (Python)
« Reply #4 on: April 20, 2013, 08:17:46 am »
I had exactly the same thing in mind.
Code: [Select]
url = "www.google.nl/images/test.png"
for part in url.split("/"):
    print(part)
Output:
Code: [Select]
www.google.nl
images
test.png
Wtf where you thinking with that signature? - Phage.
This was another little experiment *evillaughter - Proxx.
Evilception... - Phage

Offline RedBullAddicted

  • Moderator
  • Sir
  • *
  • Posts: 519
  • Cookies: 189
    • View Profile
Re: Split any URL into "host", "path", and "filename" variables (Python)
« Reply #5 on: April 20, 2013, 08:30:17 am »
Exactly :) and the print can be done a bit cleaner this way

Code: (python) [Select]
>>> print("Preparing to download object from http://%s/%s/%s" %(host, path, filename))
Preparing to download object from http://www.google.com/images/test.png

Offline Deque

  • P.I.N.N.
  • Global Moderator
  • Overlord
  • *
  • Posts: 1203
  • Cookies: 518
  • Programmer, Malware Analyst
    • View Profile
Re: Split any URL into "host", "path", and "filename" variables (Python)
« Reply #6 on: April 20, 2013, 08:44:27 am »
Use urlparse. It takes care of every case you might not think of right now.
Example:

Code: [Select]
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

result = urlparse('http://evilzone.org/scripting-languages/split-any-url-into-%27host%27-%27path%27-and-%27filename%27-variables-%28python%29/new/#new')
print("scheme", result.scheme)
print("netloc", result.netloc)
print("path", result.path)
print("params", result.params)
print("query", result.query)
print("fragment", result.fragment)

Output:

Quote
deque@decra:~/Dokumente/python$ python url.py
scheme http
netloc evilzone.org
path /scripting-languages/split-any-url-into-%27host%27-%27path%27-and-%27filename%27-variables-%28python%29/new/
params
query
fragment new

Edit: In Python 3 the module is urllib.parse; in Python 2 it is urlparse.
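
One thing to watch with the original question: the example input (www.google.com/images/test.png) has no scheme, and urlparse only fills in netloc when the URL contains "//". A minimal sketch of one workaround follows. It assumes plain HTTP is wanted when the user types no scheme, and `parse_with_default_scheme` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def parse_with_default_scheme(url, scheme="http"):
    """Parse a URL, assuming `scheme` when none was typed (hypothetical helper)."""
    # Without "//", urlparse treats the whole string as a path and leaves netloc empty.
    if "://" not in url:
        url = scheme + "://" + url
    return urlparse(url)


bare = urlparse("www.google.com/images/test.png")
print(repr(bare.netloc), bare.path)
# '' www.google.com/images/test.png

fixed = parse_with_default_scheme("www.google.com/images/test.png")
print(fixed.netloc, fixed.path)
# www.google.com /images/test.png
```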
« Last Edit: April 20, 2013, 08:54:48 am by Deque »

Offline DamonX

  • Serf
  • *
  • Posts: 35
  • Cookies: 2
    • View Profile
Re: Split any URL into "host", "path", and "filename" variables (Python)
« Reply #7 on: April 20, 2013, 07:18:32 pm »
Wow, I can't believe how many people replied within such a short period of time. This is even better than Stack Overflow. :) I will try your suggestions and let you know how it goes.

Thanks all

Damon

Offline DamonX

  • Serf
  • *
  • Posts: 35
  • Cookies: 2
    • View Profile
Re: Split any URL into "host", "path", and "filename" variables (Python)
« Reply #8 on: April 21, 2013, 10:55:59 pm »
Thanks. I had to make a small modification, but I got it working by also using basename() and dirname().
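
The final code was not posted, so the following is only a guess at how urlparse, basename() and dirname() might fit together. It uses posixpath (the POSIX flavour of os.path) because URL paths always use "/". The names `split_url` and `main`, the default port 80, and the "assume http" rule are all assumptions made for illustration.

```python
import posixpath
from urllib.parse import urlparse


def split_url(url):
    """Return (host, port, path, filename) for a URL (illustrative helper)."""
    # Assume plain HTTP when the user omits the scheme.
    if "://" not in url:
        url = "http://" + url
    parts = urlparse(url)
    host = parts.hostname or ""
    # Fall back to the default HTTP port when none is given.
    port = parts.port or 80
    # basename() is the part after the last "/", dirname() the part before it.
    filename = posixpath.basename(parts.path)
    directory = posixpath.dirname(parts.path)
    path = directory if directory.endswith("/") else directory + "/"
    return host, port, path, filename


def main(argv):
    # Step (1) from the original post: check that the URL argument was given.
    if len(argv) != 2:
        print("usage: clientprogram URL")
        return 1
    host, port, path, filename = split_url(argv[1])
    print("Preparing to download object from http://" + host + path + filename)
    return 0


# A real script would import sys and call sys.exit(main(sys.argv)) instead.
main(["clientprogram", "www.google.com/images/test.png"])
# Preparing to download object from http://www.google.com/images/test.png
```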

« Last Edit: April 22, 2013, 02:35:37 am by DamonX »