Coding The Pythonic Way
Hello, EvilZone.
In this thread I will try to show you some Pythonic ways to make your Python code shorter, cleaner, and maybe faster and more stable, along with some techniques that I find interesting enough to share over time. Some of them may seem very simple, others a bit more advanced, depending on your personal Python knowledge. All the examples were tested on Python 2.7.3 and they work fine.
Requirement: basic Python knowledge.
Requirement not covered?
Are you absolutely new to the Python world?
Before you start, take a look at this good Python basics tutorial --> http://www.tutorialspoint.com/python/index.htm
Credits @ Psycho_Coder for suggesting the lxml module to me back in the day.
LET'S START!
OPTIMIZATIONS
Optimize Number 0. Swapping variables' values.
In many other languages you would do something like this:
temp = a
a = b
b = temp
In Python the right way to do this is:
b, a = a, b
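The swap works because the right-hand side is packed into a tuple before anything is assigned. A minimal sketch of the same idea, with made-up values, that also rotates three names at once (it runs on both Python 2 and 3):

```python
# The whole right-hand side is evaluated first, then unpacked
# into the names on the left, so no temporary variable is needed.
a, b = 1, 2
a, b = b, a

# The same idea rotates three (hypothetical) values in one statement.
x, y, z = 'first', 'second', 'third'
x, y, z = y, z, x

print((a, b))
print((x, y, z))
```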
Optimize Number 1. Don't ignore built-ins! Example: summing a list of ints.
The usual way to get the sum of a numeric iterable in many languages would be:
ls = [1,2,3,4]
ls_sum = 0
for item in ls:
    ls_sum += item
In Python the right way to do this is:
ls = [1,2,3,4]
ls_sum = sum(ls)
So running the timings.py file:
from time import time
ls = range(10000)
## 1
Stime = time()
the_sum = 0
for item in ls:
    the_sum += item
print "Time with FOR loop : %f " % (time() - Stime)
## 2
Stime = time()
the_sum = sum(ls)
print "Time with SUM function : %f " % (time() - Stime)
will show sum() coming out clearly faster than the for loop.
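A single time() measurement can be noisy, so a more repeatable way to compare the two is the standard timeit module. This is only a sketch: the helper name loop_sum and the repeat count of 200 are my own choices, and the code is written to run on both Python 2 and 3.

```python
import timeit

numbers = list(range(10000))

def loop_sum(values):
    """Add the values one by one in a Python-level loop."""
    total = 0
    for value in values:
        total += value
    return total

# Both approaches must agree before their speed is worth comparing.
loop_result = loop_sum(numbers)
builtin_result = sum(numbers)

# timeit calls each callable `number` times and returns total seconds.
loop_time = timeit.timeit(lambda: loop_sum(numbers), number=200)
builtin_time = timeit.timeit(lambda: sum(numbers), number=200)

print("for loop : %f" % loop_time)
print("sum()    : %f" % builtin_time)
```

The absolute numbers depend on your machine, so compare the two lines against each other rather than against someone else's output.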
Optimize Number 2. Conditional checks for 0, None or False.
Some people write simple if conditions in a harder way than needed:
boolFlag = 0
if boolFlag != 0:
    print("Flag isn't zero")
else:
    print('Flag is zero')
In Python, and in programming generally, it is better to avoid comparisons when you don't even need them:
boolFlag = 0
if boolFlag:
    print("Flag isn't zero")
else:
    print("Flag is zero")
**NOTE 0 --> The same would happen if we had set boolFlag to None or False.
**NOTE 1 --> This may seem like a coding style or preference of mine, but it's not! Needless comparisons inside loops can increase timings.
So running my beloved timings.py file:
from time import time
## Comparison
Stime = time()
the_sum = 0
for item in xrange(10000000):
    if the_sum != 0:
        the_sum += 1
    else:
        the_sum += 2
print "With comparison : %f" % (time() - Stime)
## Not comparison
Stime = time()
the_sum = 0
for item in xrange(10000000):
    if the_sum:
        the_sum += 1
    else:
        the_sum += 2
print "Without comparison : %f" % (time() - Stime)
We get results where the version without the comparison comes out faster.
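To make NOTE 0 concrete, here is a small sketch of which values Python treats as false in an if test. The two lists are my own illustration, not an exhaustive catalogue, and the code runs on both Python 2 and 3.

```python
# Values that Python treats as false in an `if` test.
falsy_values = [0, 0.0, None, False, '', [], (), {}]

# Values that look "empty-ish" but are actually true.
truthy_values = [1, -1, ' ', '0', [0], (None,)]

for value in falsy_values:
    if value:
        print("%r is truthy" % (value,))
    else:
        print("%r is falsy" % (value,))

all_falsy = not any(falsy_values)
all_truthy = all(truthy_values)
```

One thing to keep in mind: "if flag:" and "if flag != 0:" are not interchangeable for every value. None != 0 is True, while None on its own is false, so pick the short form only when that is the behaviour you want.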
Optimize Number 3. String formatting.
In Python you are able to do this:
string = "Mary Jane"
print("My girlfriend's name is " + string)
However, there is a better way to format strings (when it's only about printing them out) that avoids costly concatenation:
string = 'Mary Jane'
print("My girlfriend's name is %s"%string)
**NOTE --> For many variables, and when it's not only about strings, the string method format() is the best way to go (also for more special formatting features).
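A short sketch of what format() offers beyond %s. The variables age and height are hypothetical values added for the example; everything used here is part of the standard str.format() mini-language.

```python
name = 'Mary Jane'   # hypothetical values
age = 23
height = 1.68

# Positional fields, filled in order.
line_positional = "{0} is {1} years old".format(name, age)

# Named fields plus a format spec: two decimals for the float.
line_named = "{who} is {h:.2f} m tall".format(who=name, h=height)

# Alignment and padding: right-align the number in 5 columns.
line_padded = "[{0:>5}]".format(age)

print(line_positional)
print(line_named)
print(line_padded)
```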
Optimize Number 4. String concatenation.
When you need to build a string from substrings, the most obvious/straightforward way is this:
total_string = ''
strings = ['I', 'love', 'you', 'Eve']
for string in strings:
    total_string += string + ' '
But in Python the above proves to be much more costly than a single string method:
strings = ['I', 'love', 'you', 'Eve']
total_string = ' '.join(strings)
**NOTE --> You can use any string you like as the separator; for example, ','.join(strings) would give 'I,love,you,Eve'.
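Two details are easy to miss here: the loop version leaves a trailing separator behind, and join() only accepts strings. A minimal sketch of both points (the numbers list is my own addition), runnable on Python 2 and 3:

```python
words = ['I', 'love', 'you', 'Eve']

# The loop version leaves a trailing separator behind...
looped = ''
for word in words:
    looped += word + ' '

# ...while join() only puts the separator between items.
joined = ' '.join(words)

# join() accepts strings only, so convert other types first.
numbers = [1, 2, 3, 4]
csv_line = ','.join(map(str, numbers))

print(repr(looped))
print(repr(joined))
print(csv_line)
```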
Optimize Number 5. Avoid len() when checking for an empty iterable.
It's not that common, but we have all seen this (and I personally used to do it too):
ls = []
if len(ls) == 0:
    print("Empty list brah!")
In Python an empty list already evaluates to False when used in an if condition:
ls = []
if not ls:
    print("Empty list brah!")
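The same shortcut works for the other built-in containers, with one caveat worth knowing. A small sketch, where is_empty is a hypothetical helper written just for this demonstration:

```python
def is_empty(container):
    """Rely on truthiness instead of calling len() explicitly."""
    return not container

empty_checks = [is_empty([]), is_empty({}), is_empty(''), is_empty(())]
filled_checks = [is_empty([0]), is_empty({'a': 1}), is_empty('x')]

# Caveat: iterators and generators are always truthy, even when
# they will yield nothing, so the shortcut does not apply to them.
empty_generator = (item for item in [])
generator_looks_empty = is_empty(empty_generator)

print(empty_checks)
print(filled_checks)
print(generator_looks_empty)
```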
Optimize Number 6. Import globally, not locally.
Importing locally proves to be more costly:
def func():
    from time import sleep
    for i in range(10):
        sleep(1)
It's better/cleaner to import everything at the top:
## Importing
from time import sleep
def func():
    for i in range(10):
        sleep(1)
Optimize Number 7. Use the map() function when possible. (See also Optimize Number 12.)
We all know that looping all the time is VERY costly, so avoid looping, especially for simple tasks:
ls = [1, 2, 3, 4]
string_ls = []
for i in ls:
    string_ls.append(str(i))
In Python, using the built-in function map() instead would be better:
ls = [1,2,3,4]
string_ls = map(lambda item : str(item), ls)
**NOTE --> map(function, iterable) works like this: map takes every single item of the "iterable" and calls "function" with that item as its argument; the "function" should return a new item, and finally map() returns a list of those new items (on Python 2, map() always returns a list, whatever the type of "iterable"). For more complex tasks it's better to avoid lambda and define a normal function to pass to map.
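As a sketch of the last point in the note, this is map() used with normally defined functions. The helpers to_label and add are hypothetical names made up for the example. The list() wrapper is harmless on Python 2 and keeps the code working on Python 3, where map() returns a lazy object.

```python
def to_label(number):
    """Hypothetical helper: turn a number into a padded label."""
    return 'item-%03d' % number

numbers = [1, 2, 3, 4]

# list() is harmless on Python 2, where map() already returns a
# list, and materialises the lazy map object on Python 3.
labels = list(map(to_label, numbers))

# map() also accepts several iterables and passes one item from
# each to the function on every call.
def add(left, right):
    return left + right

sums = list(map(add, [1, 2, 3], [10, 20, 30]))

print(labels)
print(sums)
```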
Optimize Number 8. Use the filter() function when possible. (See also Optimize Number 12.)
Loops can be even more costly when they contain conditional statements:
ls = [-1, -2, 3, 4]
result_ls = []
for i in ls:
    if i > 0:
        result_ls.append(i)
We can avoid this using the built-in filter() function:
ls = [-1, -2, 3, 4]
result_ls = filter(lambda item: item > 0, ls)
**NOTE --> filter(function, iterable) works like this: it takes every single item from the given "iterable" and passes it to the given "function". If the function returns False the item is dropped; if it returns True the item is kept. Finally it returns a new sequence: on Python 2 that is a string or tuple when "iterable" is one, and a list otherwise.
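A small sketch of filter() with a named predicate instead of a lambda. The function is_positive is a hypothetical helper for this example, and the list() wrapper is only there so the code behaves the same on Python 3.

```python
def is_positive(number):
    """Predicate: keep only numbers greater than zero."""
    return number > 0

numbers = [-1, -2, 0, 3, 4]

# list() keeps the result a list on Python 3 as well, where
# filter() returns a lazy iterator instead.
positives = list(filter(is_positive, numbers))

# Passing None as the function keeps every truthy item.
truthy_only = list(filter(None, [0, 1, '', 'a', None, [], [2]]))

print(positives)
print(truthy_only)
```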
Optimize Number 9. Implement permanent functions the normal way.
Everybody, I guess, likes to use lambda for simple tasks:
func = lambda item: item ** 2
print(func.__name__)
But lambda functions have their disadvantages. It's better to use a normal definition for a permanent function that will be used a lot in your script, even when it is this simple:
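def func(item):
    return item ** 2
print(func.__name__)
The first version prints <lambda>, while this one prints func, which is far more helpful when the name shows up in tracebacks and debugging output.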
Optimize Number 10. Avoid heavy use of raw_input; every millisecond counts.
The normal way to get input is:
inp = raw_input("Give some input : ")
However, sys.stdin.readline() proves to be faster than the normal way; you just have to strip the trailing '\n' and print the prompt message yourself:
import sys
print "Give some input : ",
inp = sys.stdin.readline().strip('\n')
When the user has to enter a lot of input this way is preferred, and in big programs it can make a difference.
**NOTE --> The print statement, on the other hand, proves to be as fast as sys.stdout.write(), so just stick with print.
Optimize Number 11. For big range iterations, xrange is much faster.
range() can be a nightmare on Python 2 when it comes to big iterations:
for x in range(100000000):
    print x
Using xrange() for iterations this big proves to be way better:
for x in xrange(100000000):
    print x
I timed an iteration over 100000000 with ipython and this was the result...
Furthermore, range() seems to do better as the range gets smaller, but xrange() also keeps its timings stable...
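A big part of the difference is memory: range() on Python 2 builds the whole list up front, while xrange() only remembers its bounds. The sketch below compares the two footprints. The names lazy_range, lazy_size and eager_size are my own, and the alias at the top is an assumption added so the code also runs on Python 3, where xrange() no longer exists.

```python
import sys

# Python 3 has no xrange(); its range() is already lazy, so alias
# it to keep this sketch runnable on both major versions.
try:
    lazy_range = xrange
except NameError:
    lazy_range = range

count = 100000

lazy = lazy_range(count)         # stores only start, stop and step
eager = list(lazy_range(count))  # what Python 2's range() builds

lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)

print("lazy object : %d bytes" % lazy_size)
print("full list   : %d bytes" % eager_size)
```

Note that sys.getsizeof() on the list counts only the list's own pointer array, not the integer objects it refers to, so the real gap is even larger.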
**NOTE 0 --> Although xrange() proves to be a lot better than range(), it has its limitations, which lead us back to range() when needed. For example, an xrange object can't be sliced, and in CPython its arguments must fit in a native C long. Note that neither of them works with real numbers {e.g. range(1,100,0.1)}: on Python 2.7 both range() and xrange() raise a TypeError when given a float.
**NOTE 1 --> Speaking of xrange() limitations, I should mention that xrange() returns a lazy xrange object rather than a list, so you have to convert it when you need a real list. For a small list of numbers it's better to avoid the conversion and stick with range().
Optimize Number 12. AVOID lambda functions combined with map, filter etc.
I personally used them inside map and filter in Optimize 7 and Optimize 8 because I wanted to give you a better understanding of how both functions work, BUT map and filter combined with lambda turn out to be even a little slower than for loops!
So how would we tune Optimize 7 to be perfect?
Instead of doing this:
ls = [1,2,3,4]
string_ls = map(lambda item : str(item), ls)
We would do this:
ls = [1,2,3,4]
string_ls = map(str, ls)
**NOTE --> The best timings come from using built-in functions with map, filter etc. A normally defined function also works fine and is easier to debug than a lambda. Just avoid lambda when implementing functions that will be used a lot; if it seems like you can't avoid it, go straight to the for loop.
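If you want to check this claim on your own machine, a sketch like the following compares the three variants with timeit. The function names and the repeat count are my own choices, the list() wrappers are only there for Python 3 compatibility, and the exact ranking may differ between Python versions.

```python
import timeit

numbers = list(range(1000))

def with_lambda():
    return list(map(lambda item: str(item), numbers))

def with_builtin():
    return list(map(str, numbers))

def with_loop():
    result = []
    for item in numbers:
        result.append(str(item))
    return result

# All three variants must produce the same list.
same_output = with_lambda() == with_builtin() == with_loop()

for variant in (with_lambda, with_builtin, with_loop):
    seconds = timeit.timeit(variant, number=200)
    print("%-12s : %f" % (variant.__name__, seconds))
```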
Optimize Number 13. Function calls are costly. Don't overuse them without reason.
This one is simple: avoid function calls as much as possible, for example massive numbers of function calls in a loop:
def func(total, item):
    total += item
    return total
SUM = 0
ls = [1,2,3,4]
for item in ls:
    SUM = func(SUM, item)
Please avoid doing something like this in big programs; when the function does more advanced things, this can drive timings up.
Prefer to adapt your function to fit your needs, and make it happen in one single call:
def func(ls):
    total = 0
    for item in ls:
        total += item
    return total
ls = [1,2,3,4]
sum_ls = func(ls)
This is much more decent. Avoiding massive numbers of function calls when you can gives you more stability and noticeably better timings.
TECHNIQUES
Technique Number 0. Function to find and return the current free RAM as an int (Linux only).
This technique requires some basic Bash scripting knowledge.
## Importing
from os import popen
## Function to find the currently available RAM
## and return it as an int.
def free_memory_finder():
    ## Find total memory
    total = popen("free -t | grep Mem: | awk '{print $2}'")
    ## Find used memory (the '-/+ buffers/cache' line, third column)
    used = popen("free -t | grep '-' | awk '{print $3}'")
    ## Make them integers
    total_memory = int(total.read())
    used_memory = int(used.read())
    ## Calculate the free memory
    free_memory = total_memory - used_memory
    return free_memory
**NOTE --> The int that is returned is in KB. The function relies on the '-/+ buffers/cache' line in the output of free; on systems where free does not print that line, int() will fail on the empty output.
Explanation image: executing the commands one by one in a terminal.
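A different route to the same number, which is not the method used above, is to read /proc/meminfo directly instead of piping through free, grep and awk. The sketch below is an assumption-heavy illustration: parse_meminfo and free_memory_kb are hypothetical helper names, and the SAMPLE text is made up so that the code runs anywhere.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {name: kB} dict."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(':')
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values

def free_memory_kb(meminfo):
    """Free + buffers + page cache, mirroring the old '-/+' line."""
    return meminfo['MemFree'] + meminfo['Buffers'] + meminfo['Cached']

# Made-up sample so the sketch runs anywhere; on Linux you would
# pass open('/proc/meminfo').read() instead.
SAMPLE = """MemTotal:        8061404 kB
MemFree:          335356 kB
Buffers:          184304 kB
Cached:          3564464 kB
"""

info = parse_meminfo(SAMPLE)
print(free_memory_kb(info))
```

Keeping the parsing separate from the file reading makes the function easy to test with a plain string, and it avoids spawning three processes per call.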
Technique Number 1. Function to prepare the geometry of a Tkinter.Tk window so that it sits in the center of the screen.
This function will bring the Tk window to the center of the screen when mainloop() is called.
## Function to set the Tk window on the center
def center_widget(root, width, height):
    """This function is meant only for tkinter widgets;
    it centers the main widget on the screen."""
    ## Setting the coordinates to add to the geometry set up
    x = (root.winfo_screenwidth() / 2) - (width / 2)
    y = (root.winfo_screenheight() / 2) - (height / 2)
    ## Setting the correct geometry
    root.geometry('{0}x{1}+{2}+{3}'.format(width, height, int(x), int(y)))
Then if you run something like...
from Tkinter import Tk
root = Tk()
center_widget(root, 480, 320)
root.mainloop()
The window should appear centered on your screen.
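The arithmetic behind the geometry string can be checked without opening a window. This sketch pulls it into a pure function. The name centered_geometry and the 1920x1080 screen size are assumptions for the example; with a real Tk root you would pass in winfo_screenwidth() and winfo_screenheight().

```python
def centered_geometry(screen_width, screen_height, width, height):
    """Return a Tk geometry string that centers a window."""
    # Floor division keeps the offsets integral on Python 2 and 3.
    x = (screen_width - width) // 2
    y = (screen_height - height) // 2
    return '{0}x{1}+{2}+{3}'.format(width, height, x, y)

# Hypothetical 1920x1080 screen and a 480x320 window.
geometry = centered_geometry(1920, 1080, 480, 320)
print(geometry)
```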
Technique Number 2. Function that returns pretty XML output.
This function takes an existing xml.etree.ElementTree.Element and returns nicely formatted output using the xml.dom.minidom.parseString function and the toprettyxml() method of the document it returns.
## Importing
from xml.dom.minidom import parseString
from xml.etree.ElementTree import tostring
def unglyxml_to_prettyxml(elem):
    """Function to make the xml output good looking,
    using xml.etree via an xml.dom method"""
    ## Taking the whole elem to a string
    current_elem_string = tostring(elem, 'utf-8')
    ## Parse it with minidom
    parsed_on_minidom = parseString(current_elem_string)
    ## Return it prettier (in the usual xml view)
    return parsed_on_minidom.toprettyxml(indent="\t")
It really helps out those who want to use the xml.etree way of handling xml...
I also made an example so you can really understand why it's helpful.
Let's create an xml element using the xml.etree module:
## Importing
from xml.etree import ElementTree
from xml.etree.ElementTree import Element, SubElement
## Setting the root element
root = Element("students")
## Initialize and create subelements
## for first subelement of the root
subel0 = SubElement(root, "Class_A")
SubElement(subel0, 'class_A_student', name = 'George Clinton')
SubElement(subel0, 'class_A_student', name = 'Peter Suarez')
## Same for the second SubElement
subel1 = SubElement(root, "Class_B")
SubElement(subel1, 'class_B_student', name = 'Hillary Clinton')
SubElement(subel1, 'class_B_student', name = 'Luis Suarez')
That's a simple element with some subelements in xml.etree.
Now let's try to print it out and see what we get:
## Converting the element to string object
xml_string = ElementTree.tostring(root)
## Print it out
print(xml_string)
Done with converting and printing.
And this is what we get...
You can see how bad it looks... just look at the scroll bar, it's one huge single string.
Now if we import and use the function I mentioned in this Technique topic:
## Adapt the import statement to fit your needs
from Pretty_XML_output import unglyxml_to_prettyxml
print(unglyxml_to_prettyxml(root))
Then this is what we get now:
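To see the difference without importing anything from another file, here is a self-contained sketch that builds a cut-down tree and prints both forms. It uses only the standard library calls already shown in this technique, and the tree contents are trimmed purely for brevity.

```python
from xml.dom.minidom import parseString
from xml.etree.ElementTree import Element, SubElement, tostring

# A cut-down version of the students tree.
root = Element('students')
class_a = SubElement(root, 'Class_A')
SubElement(class_a, 'class_A_student', name='George Clinton')

raw = tostring(root)   # everything on one long line
pretty = parseString(raw).toprettyxml(indent='\t')  # one tag per line

print(raw)
print(pretty)
```

The pretty version starts with an XML declaration and then puts every element on its own line, indented with one tab per nesting level.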
**NOTE --> It is recommended, though, to use the external lxml module rather than xml for both xml and html parsing. Even though lxml is external and not in the standard library, it proves to be both faster and more flexible. Plus, lxml provides a pretty_print boolean argument which, if set to True, gives you the functionality I've shown here with the unglyxml_to_prettyxml function. For more info visit http://lxml.de/ . I keep this technique for anyone who still wants to use the xml module for some reason; when I wrote it back then I didn't know about the lxml goodies either.
Technique Number 3. Useful web-crawling function.
This is a function which uses lxml to crawl the text of specified tags from a specified website and returns a list of all the contents it found, or False if the list is empty.
## Importing
from urllib2 import urlopen, Request
from lxml.html import parse
THE_HEADERS = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'Accept-Encoding': 'none',
'Accept-Language': 'en-US,en;q=0.8',
'Connection': 'keep-alive'}
def tagListCreate(website, tag):
    """Function to crawl the text of the given tags of a given website.
    :return A list with all the crawled content, or False if empty"""
    ## Implementing
    cont_list = []
    request = Request(website, headers=THE_HEADERS)
    site = urlopen(request)
    Hcode = parse(site)
    ## Getting all the tag contents
    append = cont_list.append
    for item in Hcode.iter(tag):
        if item.text_content():
            append(item.text_content())
    ## Check and return
    if cont_list:
        return cont_list
    else:
        return False
**NOTE --> The headers used here don't necessarily have to be the same; you can customize the constant to fit your needs.
Now running something like this:
from pprint import pprint
pprint(tagListCreate('http://www.twitter.com', 'h2'))
You will get this sample output:
That's all I have for now.
Also make sure you take a look at the Docstring Conventions PEP (PEP 257).
It's worth the time if you are a beginner.
Okay guys, I hope you liked this optimizations/techniques thread of mine. I also hope that my crappy English didn't bother you too much. Please make a post if you are a PYTHONISTA and you have any suggestions, corrections or dislikes; that would be very helpful for me.
With Respect For EvilZone, L0aD1nG.
This Thread Will Be Updated Over Time