EvilZone


: Split any URL into "host", "path", and "filename" variables (Python)
: DamonX April 20, 2013, 04:11:25 AM
Hi,


I am working on creating an HTTP download client in Python and need a little assistance.


I am getting the URL from a command-line argument (./clientprogram www.google.com/images/test.png) and then splitting that URL into host, path, and filename. I am only downloading and displaying images on screen, though.


Here is my code so far:



:
import string
import socket
import sys
import os
from subprocess import call
from urllib.parse import urlparse


# ******************************************
#
#  (1) Test input arguments to program - correct number provided?
#      Exit if the required URL is not provided.
#  (2) Split URL into "host", "path", and "filename" variables.
#      http://www.google.com/images/test.png
#      * host=www.google.com
#      * path=/images/
#      * filename=test.png


# host=????
# path=????
# filename=????
# port=????


print("Preparing to download object from http://" + host + path + filename)
print()

How do I split the URL? It's easy to do if the URL is hardcoded, but I'm not sure how to do it when we don't know which URL the user will provide.
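Step (1) from the comments in the code, exiting when the URL argument is missing, might look like this minimal sketch. The function name get_url_argument and the usage message are hypothetical, not part of the original program.

```python
import sys


def get_url_argument(argv):
    """Step (1): return the URL argument, or exit if it is missing.

    Hypothetical helper; the usage text is an illustrative assumption.
    """
    if len(argv) != 2:
        # argv[0] is the script name, so exactly one extra argument (the URL) is expected
        sys.exit("Usage: %s <url>" % argv[0])
    return argv[1]
```

It would be called near the top of the script as url = get_url_argument(sys.argv). Passing argv in as a parameter, instead of reading sys.argv inside the function, just makes the check easy to try out with a hand-made list.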


Thanks


Damon
: Re: Split any URL into "host", "path", and "filename" variables (Python)
: relax April 20, 2013, 06:39:59 AM
Count the slashes: everything before the first / is the domain, everything between the first and the last / is the path, and everything after the last / is the filename.
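A minimal sketch of that rule, assuming the URL is typed without a scheme (with "http://" in front, the first / would be inside the scheme). str.partition cuts at the first "/" and str.rpartition at the last one, which also copes with paths that are more than one directory deep; the function name split_by_slashes is hypothetical.

```python
def split_by_slashes(url):
    """Split 'host/dir/.../file' at the first and last '/'.

    Assumes no scheme such as 'http://' is present.
    """
    host, _, rest = url.partition("/")        # before the first '/' is the host
    path, _, filename = rest.rpartition("/")  # after the last '/' is the filename
    return host, path, filename


print(split_by_slashes("www.google.com/images/srpr/logo3w.png"))
# ('www.google.com', 'images/srpr', 'logo3w.png')
```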

: Re: Split any URL into "host", "path", and "filename" variables (Python)
: RedBullAddicted April 20, 2013, 07:37:26 AM
: (python)
>>> url = "www.google.com/images/test.png"
>>> pathparts = url.split('/')
>>> for part in pathparts:
...     print(part)
...
www.google.com
images
test.png
>>> host = pathparts[0]
>>> path = '/'.join(pathparts[1:-1])  # everything between the host and the filename
>>> filename = pathparts[-1]          # the last part is always the filename
>>> print(host)
www.google.com
>>> print(path)
images
>>> print(filename)
test.png
>>>
: Re: Split any URL into "host", "path", and "filename" variables (Python)
: Kulverstukas April 20, 2013, 08:01:06 AM
You could also see this link for some routines: http://docs.python.org/2/library/os.path.html#module-os.path
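For example, dirname() and basename() from that module can split off the filename. This sketch imports posixpath directly (os.path is posixpath on Linux and macOS), which is an assumption made so that "/" is always treated as the separator, even on Windows.

```python
import posixpath  # os.path on Linux/macOS; always splits on '/'

url = "www.google.com/images/test.png"

directory = posixpath.dirname(url)        # 'www.google.com/images'
filename = posixpath.basename(url)        # 'test.png'
host, _, path = directory.partition("/")  # host before the first '/', the rest is the path

print(host, path, filename)
```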
: Re: Split any URL into "host", "path", and "filename" variables (Python)
: proxx April 20, 2013, 08:17:46 AM
I had exactly the same thing in mind.
:
url="www.google.nl/images/test.png"
for i in url.split("/"):
    print(i)
Output:
:
www.google.nl
images
test.png
: Re: Split any URL into "host", "path", and "filename" variables (Python)
: RedBullAddicted April 20, 2013, 08:30:17 AM
Exactly :) And the print can be done a bit more cleanly this way:

: (python)
>>> print("Preparing to download object from http://%s/%s/%s" %(host, path, filename))
Preparing to download object from http://www.google.com/images/test.png
: Re: Split any URL into "host", "path", and "filename" variables (Python)
: Deque April 20, 2013, 08:44:27 AM
Use urlparse. It takes care of every case you might not think of right now.
Example:

:
from urlparse import urlparse

result = urlparse('http://evilzone.org/scripting-languages/split-any-url-into-%27host%27-%27path%27-and-%27filename%27-variables-%28python%29/new/#new')
print "scheme", result.scheme
print "netloc", result.netloc
print "path", result.path
print "params", result.params
print "query", result.query
print "fragment", result.fragment

Output:

deque@decra:~/Dokumente/python$ python url.py
scheme http
netloc evilzone.org
path /scripting-languages/split-any-url-into-%27host%27-%27path%27-and-%27filename%27-variables-%28python%29/new/
params
query
fragment new

Edit: In Python 3 the module is called urllib.parse (from urllib.parse import urlparse).
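One thing to watch with input like ./clientprogram www.google.com/images/test.png: without a scheme, urlparse leaves netloc empty and puts everything into path. A minimal Python 3 sketch, assuming that a URL without a scheme should be treated as http:// and that port 80 applies when none is given; parse_download_url is a hypothetical name. The hostname and port attributes of the parse result cover the host and port variables from the question.

```python
from urllib.parse import urlparse  # Python 3; Python 2 used "from urlparse import urlparse"


def parse_download_url(url):
    """Return (host, port, path) for a URL typed on the command line.

    Assumptions: a missing scheme means plain http://, and the port defaults to 80.
    """
    if "://" not in url:
        # Without a scheme, urlparse puts everything into .path and leaves .netloc empty
        url = "http://" + url
    result = urlparse(url)
    host = result.hostname  # host name without any ':port' suffix
    port = result.port if result.port is not None else 80  # .port is None if not in the URL
    return host, port, result.path


print(repr(urlparse("www.google.com/images/test.png").netloc))       # ''
print(parse_download_url("www.google.com/images/test.png"))            # ('www.google.com', 80, '/images/test.png')
print(parse_download_url("http://www.google.com:8080/images/test.png"))  # ('www.google.com', 8080, '/images/test.png')
```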
: Re: Split any URL into "host", "path", and "filename" variables (Python)
: DamonX April 20, 2013, 07:18:32 PM
Wow... I can't believe how many people replied within such a short period of time. This is even better than Stack Overflow. :) I will try your suggestions and let you know how it goes.

Thanks, all

Damon
: Re: Split any URL into "host", "path", and "filename" variables (Python)
: DamonX April 21, 2013, 10:55:59 PM
Thanks, I had to make a small modification, but I was able to do it by also using basename() and dirname().
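The final code wasn't posted, so this is only one guess at how urlparse, dirname() and basename() could fit together to fill in the host, path, filename and port variables from the original program. It assumes http:// when no scheme is typed and port 80 when none is given; split_url is a hypothetical name.

```python
import posixpath
from urllib.parse import urlparse


def split_url(url):
    """Return (host, path, filename, port) for a URL like 'www.google.com/images/test.png'.

    Hypothetical helper; assumes http:// when no scheme is given and port 80 when none is set.
    """
    if "://" not in url:
        url = "http://" + url  # otherwise urlparse would leave the host inside .path
    parts = urlparse(url)
    host = parts.hostname
    port = parts.port if parts.port is not None else 80
    # dirname/basename split the URL path at its last '/'
    path = posixpath.dirname(parts.path)
    if not path.endswith("/"):
        path += "/"  # keep the '/images/' form used in the program's comments
    filename = posixpath.basename(parts.path)
    return host, path, filename, port


host, path, filename, port = split_url("www.google.com/images/test.png")
print("Preparing to download object from http://" + host + path + filename)
```

Keeping the trailing slash on path means the original print line works unchanged by simple concatenation.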