Author Topic: Python Optimizations and Techniques.  (Read 458 times)

Offline L0aD1nG

Python Optimizations and Techniques.
« on: November 01, 2014, 12:26:33 am »
Coding The Pythonic Way
Hello, EvilZone.

In this thread I will try to show you some Pythonic ways to make your Python code shorter, cleaner, and maybe faster and more stable, plus some techniques that I find interesting enough to share over time. Some of them may seem very simple, others a bit more advanced, depending on your personal Python knowledge. All the examples were tested on Python 2.7.3 and they work fine.

Requirement:
Basic Python knowledge.

Requirement not covered?
Are you absolutely new to the Python world?
Before you start, take a look at this good Python basics tutorial --> http://www.tutorialspoint.com/python/index.htm

Credits @
Psycho_Coder, for suggesting the lxml module to me back in the day.

LET'S START!
OPTIMIZATIONS


Optimize Number 0
Swapping variables' values.

In many other languages you would do something like this:
Code: [Select]
temp = a
a = b
b = temp

In PYTHON the right way to do this is:
Code: [Select]
b, a = a, b



Optimize Number 1
Don't ignore the built-ins! Summing a list of ints, for example.

The usual way to get the sum of a numeric iterable in many languages would be:
Code: [Select]
ls = [1,2,3,4]
ls_sum = 0
for item in ls:
    ls_sum += item

In PYTHON the right way to do this is:
Code: [Select]
ls = [1,2,3,4]
ls_sum = sum(ls)

So running the timings.py file:
Code: [Select]
from time import time

ls = range(10000)
Stime = time()

## 1
the_sum = 0
for item in ls:
    the_sum += item
print "Time with FOR loop     : %f " %(time() - Stime)

## 2
Stime = time()
the_sum = sum(ls)
print "Time with SUM function : %f " %(time() - Stime)

This will give results similar to these:
[Image: timings of the FOR loop vs. the SUM function]
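A single time() measurement is noisy, so as a rough cross-check the same comparison can be sketched with the standard timeit module, which repeats each snippet many times. The names loop_sum and NUMBERS are made up for this example, and the code is written to run on both Python 2.7 and Python 3. No timings are quoted because they depend on the machine.

```python
import timeit

# Illustrative data; list() so it is a real list on Python 2 and 3
NUMBERS = list(range(10000))

def loop_sum(numbers):
    """Sum the items with an explicit for loop."""
    total = 0
    for item in numbers:
        total += item
    return total

# Each callable is run 200 times; timeit returns the total seconds
loop_time = timeit.timeit(lambda: loop_sum(NUMBERS), number=200)
builtin_time = timeit.timeit(lambda: sum(NUMBERS), number=200)

print("FOR loop     : %f" % loop_time)
print("SUM function : %f" % builtin_time)
```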



Optimize Number 2
Conditional check for 0, None, or False.

Some people write simple if conditions in a harder way than needed:
Code: [Select]
boolFlag = 0
if boolFlag != 0:
    print("Flag isn't zero")
else:
    print('Flag is zero')

In PYTHON, and in programming generally, it is better to avoid comparisons when you don't need them:
Code: [Select]
boolFlag = 0
if boolFlag:
    print("Flag isn't zero")
else:
    print("Flag is zero")
**NOTE 0 --> The same branch would be taken if boolFlag were None or False.
**NOTE 1 --> This may look like a coding style or preference of mine, but it's not! Needless comparisons inside loops can increase your timings.
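To make NOTE 0 concrete, here is a small sketch of which values count as false in an if condition. The helper name describe is made up for this example. Keep in mind that a plain truth test can't tell 0 from None, so when that difference matters you still need an explicit "is None" check.

```python
def describe(value):
    """Return 'falsy' or 'truthy' using a plain truth test."""
    if value:
        return "truthy"
    return "falsy"

# 0, None, False and empty containers all count as false
for value in (0, None, False, "", [], {}):
    print("%r is %s" % (value, describe(value)))

# When 0 is a valid value but None means "missing",
# compare against None explicitly
count = 0
if count is None:
    print("count is missing")
else:
    print("count is set to %d" % count)
```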

So running my beloved timings.py file:
Code: [Select]
from time import time

## Comparison
Stime = time()
the_sum = 0
for item in xrange(10000000):
    if the_sum != 0:
        the_sum+=1
    else:
        the_sum+=2
print "With comparison    : %f" % (time()-Stime)

## Not comparison
Stime = time()
the_sum = 0
for item in xrange(10000000):
    if the_sum:
        the_sum+=1
    else:
        the_sum+=2
print "Without comparison : %f" % (time()-Stime)

We get results similar to these:
[Image: timings with and without the comparison]



Optimize Number 3
String Formatting.

In PYTHON you are able to do this:
Code: [Select]
string = "Mary Jane"
print("My girlfriend's name is " + string)

However, there is a better way to format strings (when it's only about printing out), avoiding costly concatenation:
Code: [Select]
string = 'Mary Jane'
print("My girlfriend's name is %s"%string)
**NOTE --> For many variables, and when it's not only about strings, the string method format() is the best way to go (also for more specialized formatting features).
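As a small illustration of the NOTE above, this is roughly how format() handles several values of different types in one go. The variable names and values are invented for the example.

```python
name = "Mary Jane"
age = 25
height = 1.684

# Positional fields, filled in by index
line = "My girlfriend's name is {0} and she is {1}".format(name, age)
print(line)

# Named fields plus a format spec (two decimal places)
details = "{who} is {h:.2f} m tall".format(who=name, h=height)
print(details)
```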



Optimize Number 4
String Concatenation.

When you need to create a string from substrings, the most obvious/straightforward way is this:
Code: [Select]
total_string = ''
strings = ['I', 'love', 'you', 'Eve']
for string in strings:
    total_string += string + ' '

But in PYTHON the above proves to be much more costly than a single string method:
Code: [Select]
strings = ['I', 'love', 'you', 'Eve']
total_string = ' '.join(strings)
**NOTE --> You can use any string you like as the separator; for example, ','.join(strings) would give 'I,love,you,Eve'. Also note that the loop version leaves a trailing space at the end, while join() does not.
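One catch worth sketching: join() only accepts strings, so a list of numbers has to be converted first. The example below uses a made-up list called numbers and also puts the loop result next to the join() result for comparison.

```python
strings = ['I', 'love', 'you', 'Eve']

# The loop version appends the separator after every item
looped = ''
for string in strings:
    looped += string + ' '

joined = ' '.join(strings)
print(repr(looped))   # 'I love you Eve '
print(repr(joined))   # 'I love you Eve'

# join() needs strings, so convert other types first
numbers = [1, 2, 3, 4]
print(','.join(map(str, numbers)))   # 1,2,3,4
```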



Optimize Number 5
Avoid using len() to check for an empty iterable.

It's not that common, but we have all seen this (and I personally used to do it too):
Code: [Select]
ls = []
if len(ls) == 0:
    print("Empty list brah!")

In PYTHON an empty list already evaluates to False when you put it in an if condition:
Code: [Select]
ls = []
if not ls :
    print("Empty list brah!")



Optimize Number 6
Import Globally, not Locally.

Importing locally proves to be more costly, since the import statement runs again on every call of the function:
Code: [Select]
def func():
    from time import sleep
    for i in range(10):
        sleep(1)

It's better/cleaner to import everything at the top:
Code: [Select]
## Importing
from time import sleep

def func():
    for i in range(10):
        sleep(1)



Optimize Number 7
Use the map() function when possible.

We all know that looping all the time is VERY costly; avoid explicit loops, especially for simple tasks:
Code: [Select]
ls = [1, 2, 3, 4]
string_ls = []
for i in ls:
    string_ls.append(str(i))

In PYTHON, using the built-in function map() instead would be better:
Code: [Select]
ls = [1,2,3,4]
string_ls = map(lambda item : str(item), ls)
**NOTE --> So map(function, iterable) works like this: map takes every single item of the "iterable" and calls "function" with that item as its argument; the "function" returns a new item, and finally map() returns a list of all the new items (on Python 2 the result is always a list, whatever the type of "iterable"). For more complex tasks it's better to avoid lambda and define a normal function to pass to map.
**NOTE --> Before copying this, look at Optimize Number 12.


Optimize Number 8
Use the filter() function when possible.

Loops can be even more costly when they contain conditional statements:
Code: [Select]
ls = [-1, -2, 3, 4]
result_ls = []
for i in ls:
    if i > 0:
        result_ls.append(i)

We can avoid this using the built-in filter() function:
Code: [Select]
ls = [-1, -2, 3, 4]
result_ls = filter(lambda item: item > 0, ls)
**NOTE --> So filter(function, iterable) works like this: it takes every single item from the given "iterable" and passes it to the given "function". If the function returns False the item is dropped; if it returns True the item is kept. At the end it returns a new sequence: on Python 2 that is a string or tuple when "iterable" is a string or tuple, and a list otherwise.
**NOTE --> Before copying this, look at Optimize Number 12.



Optimize Number 9.
Define functions the normal way when they are permanent functions.

Everybody, I guess, likes to use lambda for simple tasks:
Code: [Select]
func = lambda item: item ** 2
print(func.__name__)

But lambda functions have their disadvantages (for example, func.__name__ above is just '<lambda>', which makes tracebacks harder to read). It's better to use a normal definition for a permanent function that will be used a lot in your script, even when it's that simple:
Code: [Select]
def func(item):
    return item ** 2
print(func.__name__)

[Image: sample explanation]
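A small sketch of the difference between the two definitions. The names square_lambda and square_def are made up for the example.

```python
square_lambda = lambda item: item ** 2

def square_def(item):
    return item ** 2

# Both compute the same thing...
print("%d %d" % (square_lambda(4), square_def(4)))

# ...but only the def keeps a useful name for tracebacks
print(square_lambda.__name__)   # <lambda>
print(square_def.__name__)      # square_def
```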




Optimize Number 10
Avoid massive usage of raw_input; every millisecond counts.

The normal way to get input is:
Code: [Select]
inp = raw_input("Give some input : ")

However, sys.stdin.readline() proves to be faster than the normal way; you just have to strip the trailing '\n' and print the prompt message yourself:

Code: [Select]
import sys
print "Give some input : ",
inp = sys.stdin.readline().strip('\n')
When the user has to enter a lot of input this way is preferred, and in big programs it can make a difference.
**NOTE --> print, on the other hand, proves to be as fast as sys.stdout.write(), so just stick with print.
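For the massive-input case, a rough sketch could look like the following. The helper read_inputs and its parameters are hypothetical; it takes any file-like object, so passing sys.stdin reads from the keyboard, and io.StringIO is used here only to simulate typed input.

```python
import io

def read_inputs(stream, count):
    """Read `count` lines from `stream`, without the trailing newline."""
    readline = stream.readline   # look the method up only once
    inputs = []
    for _ in range(count):
        inputs.append(readline().rstrip('\n'))
    return inputs

# Simulated user input; in a real script pass sys.stdin instead
fake_stdin = io.StringIO(u"first\nsecond\nthird\n")
print(read_inputs(fake_stdin, 3))
```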



Optimize Number 11
Big range iterations using xrange are much faster.

range() can be a nightmare on Python 2 for big iterations, because it builds the whole list in memory before the loop even starts:
Code: [Select]
for x in range(100000000):
    print x

Using xrange() for iterations this big proves way better:
Code: [Select]
for x in xrange(100000000):
    print x

I timed an iteration over 100000000 with ipython, and this was the result:
[Image: ipython timings of range() vs. xrange()]



Furthermore, range() seems to do better as the range gets smaller, while xrange() keeps its timings stable.

**NOTE 0 --> Although xrange() proves to be a lot better than range() for big iterations, it has its downsides. Real numbers are not one of the cases where range() saves you, though: trying something like xrange(1, 100, 0.1) leads to a TypeError, and on Python 2.7 range(1, 100, 0.1) is rejected in the same way, so for a float step you need your own loop.
**NOTE 1 --> Speaking of xrange() downsides, I should mention that xrange() returns a lazy xrange object, not a list, so you have to convert it with list() when you need a real list. For a small list of numbers it's better to stick with range() and skip the conversion.
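For float steps, a small generator is one way to go. This is only a sketch: frange is a made-up name, not a built-in, and it computes each value from the index instead of adding the step over and over, so rounding errors don't pile up.

```python
import math

def frange(start, stop, step):
    """Yield start, start + step, ... while staying below stop."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Number of values that fit in [start, stop)
    count = int(math.ceil((stop - start) / float(step)))
    for i in range(count):
        yield start + i * step

print(list(frange(1, 2, 0.25)))   # [1.0, 1.25, 1.5, 1.75]
```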



Optimize Number 12
AVOID lambda functions combined with map, filter, etc.

I used lambdas inside map and filter in Optimize 7 and Optimize 8 to give you a better understanding of how both functions work, BUT map and filter combined with lambda turn out to be even a little slower than for loops!

So how would we tune Optimize 7 to be perfect?

Instead of doing this:
Code: [Select]
ls = [1,2,3,4]
string_ls = map(lambda item : str(item), ls)

We would do this:
Code: [Select]
ls = [1,2,3,4]
string_ls = map(str, ls)
**NOTE --> The best timings come from using built-in functions with map, filter, etc. The cost of the lambda version is the extra Python-level function call per item, so a normally defined function passed to map costs about the same as a lambda, although it is easier to read and debug. If you can't hand map or filter a built-in, go straight to the for loop.
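To check this on your own machine, a sketch with the standard timeit module could look like this. It compares map with a lambda, map with the built-in str, and a list comprehension (a third option not covered above). list() is wrapped around map so the code also does the full work on Python 3, where map is lazy. No numbers are quoted because they vary between machines and versions.

```python
import timeit

SETUP = "ls = list(range(1000))"

CANDIDATES = [
    ("map + lambda", "list(map(lambda item: str(item), ls))"),
    ("map + str", "list(map(str, ls))"),
    ("list comprehension", "[str(item) for item in ls]"),
]

for label, statement in CANDIDATES:
    # Total seconds for 1000 runs of each statement
    seconds = timeit.timeit(statement, setup=SETUP, number=1000)
    print("%-20s: %f" % (label, seconds))
```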



Optimize Number 13
Function calls are costly. Don't overuse them without a reason.

Well, that's simple: avoid function calls as much as possible, for example massive function calls in a loop:
Code: [Select]
def func(SUM, item):
    SUM += item
    return SUM

SUM = 0
ls = [1,2,3,4]

for item in ls:
    SUM = func(SUM, item)
Please avoid doing something like this in big programs where the function does more advanced things; it could push your timings up.

Prefer to adapt your function to fit your needs, and make it happen in one single call:
Code: [Select]
def func(ls):
    total = 0
    for item in ls:
        total += item
    return total

ls = [1,2,3,4]
sum_ls = func(ls)
This is much more decent. Avoiding massive function calls when you can gives you more stable and noticeably better timings.



TECHNIQUES


Technique Number 0.
Function to find and return the current free RAM as an int (Linux only).

This technique requires some basic Bash scripting knowledge.
Code: [Select]
## Importing
from os import popen


## Function to find current available RAM
## and return it as an int.
def free_memory_finder():

    ## Find total memory
    total = popen("free -t | grep Mem: | awk '{print $2}'")
    ## Find used memory (the "-/+ buffers/cache" row, so
    ## buffers and cache are not counted as used)
    used = popen("free -t | grep '-' | awk '{print $3}'")
    ## Make them integers
    total_memory = int(total.read())
    used_memory = int(used.read())
    ## Calculate the free memory
    free_memory = total_memory - used_memory
    return free_memory
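**NOTE --> The int that is returned is in KB. The second command relies on the "-/+ buffers/cache" row that older versions of free print; newer versions dropped that row, and there the function fails on int('').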
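As an alternative that doesn't depend on the output layout of free, a similar number can be read from /proc/meminfo. This is a sketch under assumptions: it relies on the MemAvailable field, which only kernels from 3.14 on provide, and parse_meminfo and available_memory_kb are names invented here.

```python
def parse_meminfo(text):
    """Turn the contents of /proc/meminfo into a {field: number} dict."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(':')
        parts = rest.split()
        if parts:
            fields[name] = int(parts[0])
    return fields

def available_memory_kb(path='/proc/meminfo'):
    """Return the kernel's estimate of available RAM in kB."""
    with open(path) as meminfo:
        return parse_meminfo(meminfo.read())['MemAvailable']

# Parsing a shortened sample instead of the real file
sample = "MemTotal:  8000000 kB\nMemFree:  1000000 kB\nMemAvailable:  4000000 kB\n"
print(parse_meminfo(sample)['MemAvailable'])   # 4000000
```

On a Linux box, calling available_memory_kb() with no arguments reads the real file.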

[Image: the commands executed one by one in a terminal]




Technique Number 1.
Function to prepare the geometry of a Tkinter.Tk window so it appears at the center of the screen.

This function will bring the Tk window to the center of the screen when mainloop() is called.
Code: [Select]
## Function to set the Tk window on the center
def center_widget(root, width, height):
    """This function applies only to Tkinter widgets;
    it centers the main widget on the screen"""
    ## Setting the coordinates to add on geometry set up
    x = (root.winfo_screenwidth() / 2) - (width / 2)
    y = (root.winfo_screenheight() / 2) - (height/ 2)
    ## Setting the correct geometry
    root.geometry('{0}x{1}+{2}+{3}'.format(width, height, int(x), int(y)))

Then if you run something like...

Code: [Select]
from Tkinter import Tk
root = Tk()
center_widget(root, 480, 320)
root.mainloop()
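The arithmetic itself can be tried without opening a window. A small sketch, where center_geometry is a hypothetical helper that takes the screen size as plain numbers and builds the same kind of geometry string:

```python
def center_geometry(screen_width, screen_height, width, height):
    """Build a 'WxH+X+Y' geometry string that centers the window."""
    # Integer division keeps the offsets whole on Python 2 and 3
    x = (screen_width - width) // 2
    y = (screen_height - height) // 2
    return '{0}x{1}+{2}+{3}'.format(width, height, x, y)

# A 480x320 window on a 1920x1080 screen
print(center_geometry(1920, 1080, 480, 320))   # 480x320+720+380
```

With a real window you would pass root.winfo_screenwidth() and root.winfo_screenheight() and hand the result to root.geometry().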


You will probably get something like this:
[Image: the Tk window centered on the screen]





Technique Number 2.
Function that returns pretty xml output.

This function takes an xml.etree.ElementTree.Element and returns nicely formatted output, using the xml.dom.minidom.parseString function and the toprettyxml() method of the document it returns.
Code: [Select]
## Importing
from xml.dom.minidom import parseString
from xml.etree.ElementTree import tostring


def unglyxml_to_prettyxml(elem):
    """Function to make the xml output good looking,
    using xml.etree via xml.dom method"""
    ## Taking the whole elem to a string
    current_elem_string = tostring(elem, 'utf-8')
    ## Parse it to the minidom
    parsed_on_minidom = parseString(current_elem_string)
    ## Return it prettier ( on the usual xml view )
    return parsed_on_minidom.toprettyxml(indent="\t")
It really helps out those who want to handle xml the xml.etree way...

I also made an example so you can really understand why it's helpful.

Let's create an xml element using the xml.etree module:
Code: [Select]
## Importing
from xml.etree import ElementTree
from xml.etree.ElementTree import Element, SubElement

## Setting the root element
root = Element("students")

## Initialize and create subelements
## for first subelement of the root
subel0 = SubElement(root, "Class_A")
SubElement(subel0, 'class_A_student', name = 'George Clinton')
SubElement(subel0, 'class_A_student', name = 'Peter Suarez')

## Same for the second SubElement
subel1 = SubElement(root, "Class_B")
SubElement(subel1, 'class_B_student', name = 'Hillary Clinton')
SubElement(subel1, 'class_B_student', name = 'Luis Suarez')
That's a simple element with some subelements in xml.etree.

Now let's try to print it out and see what we get:
Code: [Select]
## Converting the element to string object
xml_string = ElementTree.tostring(root)

## Print it out
print(xml_string)
Done with converting and printing.

And this is what we get:
[Image: the xml printed as one long line]

You can see how bad it looks... just look at the bar, it's one huge single string.


Now if we import and use the function from this Technique:
Code: [Select]
## Adapt the import statement to fit your needs
from Pretty_XML_output import unglyxml_to_prettyxml
print(unglyxml_to_prettyxml(root))

Then this is what we get now:
[Image: the pretty-printed xml output]
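A tiny self-contained run, using only the standard library and a made-up two-level element, gives an idea of the difference:

```python
from xml.dom.minidom import parseString
from xml.etree.ElementTree import Element, SubElement, tostring

# Made-up element tree, just for this demonstration
root = Element("students")
class_a = SubElement(root, "Class_A")
SubElement(class_a, "class_A_student", name="George Clinton")

# One long line straight out of xml.etree
raw = tostring(root, 'utf-8')
print(raw)

# Indented, one tag per line, after the trip through minidom
pretty = parseString(raw).toprettyxml(indent="\t")
print(pretty)
```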


**NOTE --> It is recommended, though, to use the external lxml module rather than xml, both for xml and html parsing. Even though lxml is external and not in the standard library, it proves to be both faster and more flexible. Plus, lxml provides a pretty_print boolean argument which, when set to True, gives you the functionality I've shown here with the unglyxml_to_prettyxml function; for more info visit http://lxml.de/ . I'm keeping this technique for anyone who still wants to use the xml module for whatever reason; when I wrote it I didn't know about the lxml goodies.



Technique Number 3.
Useful web-crawling function.

This is a function which uses lxml to crawl the text of specified tags from a specified website and returns a list of all the contents it found, or False if the list is empty.
Code: [Select]
## Importing
from urllib2 import urlopen, Request
from lxml.html import parse


THE_HEADERS = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

def tagListCreate(website, tag):
    """Function to crawl text of given tags of given website.
    :return A list with all the crawled content, or False if none"""

    ## Initializing
    cont_list = []
    request = Request(website, headers=THE_HEADERS)
    site = urlopen(request)
    Hcode = parse(site)

    ## Getting all the tag contents
    append = cont_list.append
    for item in Hcode.iter(tag):
        ## Call text_content() only once per item
        text = item.text_content()
        if text:
            append(text)

    ## Check and return
    if cont_list:
        return cont_list
    else:
        return False
**NOTE --> The headers used here don't have to be the same; you can customize the constant to fit your needs.

Now running something like this:
Code: [Select]
from pprint import pprint
pprint(tagListCreate('http://www.twitter.com', 'h2'))

You will get this sample output:
[Image: sample output, the list of crawled h2 texts]



That's all I have for now.
Also make sure you take a look at the Docstring Conventions PEP (PEP 257).
It's worth the time if you are a beginner.
Okay guys, I hope you liked this optimizations/techniques thread of mine. I also hope that my crappy English didn't bother you too much  8) . Please make a post if you are a PYTHONISTA and have any suggestions, corrections, or dislikes; that would be very helpful for me.

With Respect For EvilZone, L0aD1nG.

This Thread Will Be Updated Over Time