EvilZone

: Python Port of Word List Generator
: Matriplex October 04, 2013, 11:18:57 PM
I decided to learn Python because I was bored last weekend, and I have been really amazed at the power of it! To test my (elementary) Python skills I decided to try to port the Java Word List Generator found here: http://evilzone.org/tutorials/%28tut%29-create-a-wordlist-generator-%28i-e-for-bruteforcing%29

I think I did alright. The output file for a 4-character run was 8.4 MB; however, when I pushed it to 5 characters it froze my computer for about 2 minutes and produced a 343 MB file...
I don't mean to brag, but I do have a pretty decent computer, so I am probably doing something wrong here :P

Anyways, here is the code:
:
import math
import sys
import os

def generate(am):
    pos = '/home/rotc/Documents/list.txt'
    pathdir = '/home/rotc/Documents'
    print os.listdir(pathdir)
    print "Opening the file at " + pos
    f = open(pos, "w")
    print "File opened."
    wordLength = am;
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']
    radix = len(alphabet)
   
    MAX_WORDS = float(math.pow(float(len(alphabet)), float(wordLength)))
    print "Writing..."
    for i in range(0, int(MAX_WORDS)):
        indices = convertToRadix(radix, i, wordLength)
        word = [1] * int(wordLength)
        for k in range(0, int(wordLength)):
            word[k] = alphabet[indices[k]]
           
        #print ''.join(word)
        write(word, f)
    print "Finished writing. \nFlushing..."
    f.flush()
    print "Finished Flushing..."
    print "Closing File..."
    f.close()
    print "File Closed."

def write(word, f):
    f.write(''.join(word) + "\n")

def convertToRadix(radix, number, wordLength):
    result = [1] * int(wordLength)
    for i in reversed(xrange(int(wordLength))):
        if number > 0:
            rest = number % radix
            number /= radix
            result[i] = rest
        else:
            result[i] = 0
    return result

amount = raw_input("How many letters long: ")
generate(amount);

If you see any obvious performance killers let me know please, but for now thanks for reading.
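
A quick sanity check on those file sizes, as a minimal Python 3 sketch: with the 36-symbol alphabet from the code above there are 36**n words of length n, and each line costs n bytes plus one newline byte (assuming ASCII output and a single-byte "\n" line ending). The helper name `expected_size` is made up for illustration.

```python
def expected_size(alphabet_size, word_length):
    """Return (word count, file size in bytes) for a full word list.

    Assumes one byte per character and a one-byte newline per word.
    """
    words = alphabet_size ** word_length
    return words, words * (word_length + 1)


for n in (4, 5):
    words, size = expected_size(36, n)
    print("length %d: %d words, %.1f MB" % (n, words, size / 1e6))
```

That works out to about 8.4 MB for length 4 and roughly 363 MB (about 346 MiB) for length 5, in the same ballpark as the sizes reported above. The large file on its own is therefore expected: every extra character multiplies the word count by 36.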
: Re: Python Port of Word List Generator
: Kulverstukas October 05, 2013, 06:47:13 AM
The code looks clean to me. However, Python itself is not really good at such computations, so I would blame Python for being slow.
: Re: Python Port of Word List Generator
: Deque October 05, 2013, 01:36:38 PM
The problem with Python: It has immutable strings (like Java). Note that I used a char array to produce the words in the Java code. You are using strings and do a lot of string operations, like this one:

word = [1] * int(wordLength) --> this isn't necessary btw.

Immutability means you can't change a string object. So every time you do a string operation, a new object is created, which is costly.

I didn't test your code, so I can't say this is the reason for sure. But you might test my assumption by going a different direction with your code.

Look here for more information about string operations and their performance in Python: http://www.skymind.com/~ocrow/python_string/
https://wiki.python.org/moin/PythonSpeed/PerformanceTips
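
To make the immutability point concrete, here is a minimal Python 3 sketch (the code in this thread is Python 2). Strings cannot be modified in place, so building a word piece by piece may create intermediate objects, while a list is mutable and can be joined once at the end. The function names are made up for illustration.

```python
def build_by_concat(chars):
    """Build a string with repeated +=; each step may allocate a new str."""
    word = ""
    for c in chars:
        word += c
    return word


def build_by_join(chars):
    """Collect characters in a mutable list and join them once."""
    parts = []
    for c in chars:
        parts.append(c)
    return "".join(parts)


s = "abc"
try:
    s[0] = "x"  # strings do not support item assignment
except TypeError as exc:
    print("immutable:", exc)

print(build_by_concat("abcd") == build_by_join("abcd"))
```

Note that the code in the first post already fills a list and calls ''.join once per word, so it is closer to the second function than to the first.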

I also did a port of my code to python, but never tested it for performance. You might try this one as well:

:
def convert_to_radix(number, wordlength, radix):
    indices = []
    for i in xrange(wordlength):
        if number > 0:
            rest = number % radix
            number /=  radix
            indices.append(rest)
        else:
            indices.append(0)
    return indices

def word_gen(alphabet, wordlength):
    MAXWORDS = len(alphabet) ** wordlength
    RADIX = len(alphabet)
    for k in xrange(MAXWORDS):
        indices = convert_to_radix(k, wordlength, RADIX)
        word = [alphabet[indices[i]] for i in xrange(wordlength)]
        yield word

Example usage:

:
alphabet = ['a', 'b', 'c']
for word in word_gen(alphabet, 4):
  print ''.join(word)

Also: did you verify that your code produces correct results?
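
For readers on Python 3, a hedged sketch of the same generator: `xrange` becomes `range`, and `/=` has to become integer division (done here with `divmod`), because `/` would produce a float that cannot be used as a list index. As in the Python 2 version above, digits are collected least significant first, so the words come out in a different order than in Matriplex's code, but the same set of words is produced.

```python
def convert_to_radix(number, wordlength, radix):
    """Return the digits of number in the given radix, least significant first."""
    indices = []
    for _ in range(wordlength):
        number, rest = divmod(number, radix)  # integer quotient and remainder
        indices.append(rest)
    return indices


def word_gen(alphabet, wordlength):
    """Yield every word of the given length over alphabet as a string."""
    radix = len(alphabet)
    for k in range(radix ** wordlength):
        indices = convert_to_radix(k, wordlength, radix)
        yield "".join(alphabet[i] for i in indices)


for word in word_gen("abc", 2):
    print(word)
```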
: Re: Python Port of Word List Generator
: Matriplex October 05, 2013, 05:58:55 PM
Oh, that makes sense, Deque! Well nuts, I thought that it wouldn't create a new object. I'll check out those links and see if I can get the performance any better.
And yes, the code produces perfect results :). After coding this I researched the performance issue a little more and concluded that, as you said, Java or C++ would be better for this because they can handle memory better, since they are compiled languages.
Thanks!
: Re: Python Port of Word List Generator
: ande October 05, 2013, 07:17:56 PM
The big performance killer here is not memory management. The slowest part is writing to disk. You should consider buffering in memory and writing less often, e.g. let it buffer 100k words and then write them to the file. That way you save a lot of IO traffic.
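
One way to try this suggestion, as a minimal Python 3 sketch: collect words in a list and hand them to the file in chunks. The function name `write_buffered`, the chunk size and the temp-file path are assumptions for illustration. Python's file objects already buffer writes internally, so the gain is worth measuring rather than assuming.

```python
import itertools
import os
import tempfile


def write_buffered(words, path, chunk_size=100000):
    """Write one word per line, handing the file chunk_size words at a time."""
    buffer = []
    count = 0
    with open(path, "w") as f:
        for word in words:
            buffer.append(word)
            if len(buffer) >= chunk_size:
                f.write("\n".join(buffer) + "\n")
                count += len(buffer)
                buffer = []
        if buffer:  # write whatever is left over
            f.write("\n".join(buffer) + "\n")
            count += len(buffer)
    return count


# Example: all 3-letter words over a tiny alphabet (27 words)
words = ("".join(p) for p in itertools.product("abc", repeat=3))
path = os.path.join(tempfile.gettempdir(), "wordlist_demo.txt")
print(write_buffered(words, path, chunk_size=10))
```

The `with` block closes the file on exit, which also flushes it, so the explicit flush and close calls from the original are not needed here.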
: Re: Python Port of Word List Generator
: madara October 08, 2013, 11:35:33 PM
Python is Python because it isn't PHP or Java... Please...


:
#Python3


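def product(*args, **karg):
    pools = list(map(tuple, args)) * karg.get('repeat', 1) #list because in Python3 map returns an iterator

    def extend(acc, pool):
        #pool is bound per call; a generator expression written inside the
        #loop would look up pool lazily and only ever see the last pool
        for x in acc:
            for y in pool:
                yield x + [y]

    acc = [[]]
    for pool in pools:
        acc = extend(acc, pool) #lazy generator, saves memory
    for prod in acc:
        yield tuple(prod)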


for x in product(['a', 'b', 'c'], repeat=5):
    print(''.join(x))





Or simply use the plug-and-play *product* iterator from *itertools*; see the official documentation.
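
A minimal sketch of the itertools route, assuming the same 36-symbol alphabet as in the first post; the function name `words` is made up for illustration. `itertools.product(alphabet, repeat=n)` yields tuples lazily, so only the current word has to be held in memory.

```python
import itertools
import string

ALPHABET = string.ascii_lowercase + "1234567890"


def words(alphabet, length):
    """Lazily yield every word of the given length over alphabet."""
    for combo in itertools.product(alphabet, repeat=length):
        yield "".join(combo)


# Print only the first three words instead of all 36**4
for w in itertools.islice(words(ALPHABET, 4), 3):
    print(w)
```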
: Re: Python Port of Word List Generator
: madara October 09, 2013, 12:19:19 PM
The code looks clean to me. However, Python itself is not really good at such computations, so I would blame Python for being slow.


Mr. Admin Kulverstukas,
OK, you deleted my last post because you did not like my comment on your post (are these your rules of behavior for *creating a forum worthy of being visited*? MEMBERS, I LEAVE YOU A RESPONSE...)
Congratulations, but this isn't the way!!!
Discussion between people... that's the way.

In the end it's OK, no problem, but please also delete your own post, because you do not know what you're talking about... and you GIVE INCORRECT INFORMATION;
otherwise you have to justify what you said... we are waiting.
PS: sorry to all for my bad English... uhhhh ;)

Regards

madara
: Re: Python Port of Word List Generator
: ande October 09, 2013, 05:42:53 PM

Actually I deleted your post. It only said "PLEASE......", which is completely useless. On top of that it was a double post. Get your own information and structure right before you bash on someone else's.
: Re: Python Port of Word List Generator
: Phage October 09, 2013, 08:20:07 PM
Second, Kulverstukas is actually right. Python is known (it's a fact) for being slower than other languages; you can't deny that!
: Re: Python Port of Word List Generator
: Kulverstukas October 09, 2013, 08:40:47 PM
Why is it that I am always the one to blame when posts get deleted? :P if they are deleted, there was a reason for it. Deal with it or gtfo, honestly. Staff does not remove posts "just because...".

@madara: my facts are straight, while your post is curved...
: Re: Python Port of Word List Generator
: Phage October 09, 2013, 09:07:17 PM
It's probably because you are the only admin who posts regularly on the boards.
: Re: Python Port of Word List Generator
: madara October 09, 2013, 09:56:03 PM

 Actually I deleted your post. It only said "PLEASE......", which is completely useless.
 

My post was an implicit *comment* (too complicated to understand?)
"Get your own information before... etc."
OK, I agree --> sorry to Kulverstukas for that... but Kulverstukas...
when you say "my posts are right and yours are curved"... you confirm that you do not know the subject. Please learn before commenting.

 Second, Kulverstukas is actually right. Python is known (it's a fact) for being slower than other languages; you can't deny that!
Phage, obviously Python is slower than a compiled language such as C, CLisp & company, but C/C++ are also slower than Assembly, or CLR... so would you prefer to write your software in pure machine language? Perhaps for a rootkit... in that case, inline assembly in C only to access specific CPU registers... but only in small chunks.

In any case, when you use the functions in *itertools* for simple or complex iteration, speed will not be your problem... run your tests, guy ;D
Thanks to all!
: Re: Python Port of Word List Generator
: Phage October 09, 2013, 10:03:02 PM
You clearly have a problem with respect for other people and authorities. My tests? Tests on what? I know Python pretty well; I know its ups and downs. But your choice of words makes me second-guess your knowledge. You act like it's a war between you and the rest of the community. It's not. Chill out bro! I should lead you to the ganja shop!
: Re: Python Port of Word List Generator
: madara October 09, 2013, 10:24:05 PM
I should lead you to the ganja shop!
can be an idea  ;D


OK, I have my ways, which may seem uncomfortable, but I always say what I think, and this is not a lack of respect...
Remember that I do not judge people, but what they say... it's very different...

We are here to learn, not for the glory, and touchiness does not allow evolution, does it? ;)
: Re: Python Port of Word List Generator
: Deque October 10, 2013, 01:36:37 PM
Madara: I see a lot of good stuff in your posts, but I feel you fail to explain it, which makes your posts seem disrespectful or useless or both.

E.g. you bash Kulver's post, but give no arguments why it is wrong in your eyes.
You say "Please, Python is not Java", but don't explain anything. Be so kind as to explain your thoughts to others, and people will see it as a chance either to agree with you and improve by learning from your posts, or to disagree and give you the opportunity to learn from their arguments.

I think you miss the point by advising itertools. The original question was why the program is slow, not how to implement the same functionality in another way. Yes, the code is shorter if you use it, but no, the OP won't learn what the problem in his code is if you just throw your own code at him without any explanation (other than "Python is not [insert your language here]").

Furthermore:
Python is compiled, at least in its probably most used implementation, CPython. Even if you type python myprogram.py, the program is first compiled to bytecode, and this bytecode is then interpreted (in the case of CPython).
For Java it is the same, but there is more optimization going on and it uses a JIT (compilation to native code at runtime). That gives a real boost in performance and can in some cases outperform a purely compiled language implementation, because it can adapt the compilation at runtime based on what happened earlier in the run. As CPython does not use a JIT as of now, it is indeed slow.
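
The compile-then-interpret step described here can be observed from within CPython itself. A small Python 3 sketch using the built-in `compile` function and the standard `dis` module; the exact instructions printed differ between CPython versions, so no output is shown.

```python
import dis

source = "word = ''.join(['a', 'b', 'c'])"

# compile() turns source text into a code object holding bytecode
code = compile(source, "<example>", "exec")
print(type(code).__name__)
print(len(code.co_code), "bytes of bytecode")

# dis.dis() prints a human-readable listing of that bytecode
dis.dis(code)

# The interpreter loop then executes the bytecode
namespace = {}
exec(code, namespace)
print(namespace["word"])
```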

But in most cases it is not so slow that it matters, and in the case of Matriplex's code there is another underlying problem (I believe ande put his finger on the right spot).

when you say "my posts are right and yours are curved"... you confirm that you do not know the subject. Please learn before commenting.

Kulver didn't say he was right; he said his comment is straight (to the point) while yours isn't, and I have to agree with him.

you confirm that you do not know the subject. Please learn before commenting.

So far you haven't confirmed that you know the subject either. Do you see why this is disrespectful?

Phage, obviously Python is slower than a compiled language such as C, CLisp & company, but C/C++ are also slower than Assembly, or CLR... so would you prefer to write your software in pure machine language? Perhaps for a rootkit... in that case, inline assembly in C only to access specific CPU registers... but only in small chunks.

In any case, when you use the functions in *itertools* for simple or complex iteration, speed will not be your problem... run your tests, guy

So, your argument is: Yes, it is slower, but no, it is not slower.
Maybe you meant it right, but the way you write it *facepalm*
itertools is written in C. Testing it is no proof that Python is fast.
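
To put numbers on this for a given machine, here is a hedged Python 3 benchmark sketch with `timeit`: it times a radix generator written entirely in Python against `itertools.product` for the same word set. The function names are made up for illustration, and no timings are quoted because they depend on the interpreter and the hardware.

```python
import itertools
import timeit


def pure_python_words(alphabet, length):
    """Radix-conversion generator written entirely in Python."""
    radix = len(alphabet)
    for k in range(radix ** length):
        n = k
        chars = []
        for _ in range(length):
            n, rest = divmod(n, radix)
            chars.append(alphabet[rest])
        # digits were collected least significant first, so reverse them
        yield "".join(reversed(chars))


def itertools_words(alphabet, length):
    """Same words, with the iteration done by itertools (C code in CPython)."""
    return ("".join(p) for p in itertools.product(alphabet, repeat=length))


alphabet = "abcdefghijklmnopqrstuvwxyz1234567890"
for func in (pure_python_words, itertools_words):
    seconds = timeit.timeit(lambda: sum(1 for _ in func(alphabet, 3)), number=3)
    print(func.__name__, round(seconds, 3), "s")
```

If the itertools version comes out faster, that shows what the C implementation saves; it says nothing about how fast loops written in Python are.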
: Re: Python Port of Word List Generator
: madara October 10, 2013, 05:13:45 PM
Hi Deque, good speech 8)
You're right, I meant compilation directly to machine language like in C/C++ etc., not to IL.
I repeat: obviously Python is slower than *purely compiled* languages, and OK, its VM has no JIT like Java or the .NET runtime (PS: however, look at PyPy, a just-in-time compiler for Python... good but not comparable to JIT)... but it takes advantage of powerful libraries like itertools, the numpy package and many others, as you know. There is also *stuff* like Cython and Numba (for scientific computations) to speed up your code... all this is Python;
but in conclusion, what matters is mastering efficient coding practice, as you said.

The semantics of the code that I posted are very simple and self-explanatory... obviously he must know WELL the built-in functions and features like map, generator expressions, list comprehensions, etc. If not --> there is documentation and there are good books for that. You can't write good code without knowing the basics... in a boring weekend... :D
It's also important to think through simple or complex code with your own head, and afterwards he could ask for clarifications.

Why isn't Python Java or PHP?

Because if you think in Java when you write Python code, you'll probably produce inefficient code...
If you think in PHP when you write Python or PHP code: surely you'll produce very inefficient code ;D
: Re: Python Port of Word List Generator
: Matriplex October 14, 2013, 03:59:56 AM
Guys, I appreciate your righting his wrongs, but maybe do it in a private message? Really, I'm glad you're telling him off for not explaining; don't take me for a complaining whelp.

I definitely agree, Python is nowhere near as fast as Java and is a completely different language; however, this was purely a learning experience. I'm glad you guys are pointing all this out; it's really helping me. Also, I was wondering what you guys would use to write a simple web/computer crawler? And is there a difference between spiders and crawlers, or are they just different names for exactly the same thing? Thanks.
: Re: Python Port of Word List Generator
: Phage October 14, 2013, 10:15:09 AM
You're asking a whole new question which doesn't have anything to do with the original thread. Please make a new thread if you want an answer to your question.