EvilZone
Programming and Scripting => Scripting Languages => : Matriplex October 04, 2013, 11:18:57 PM
-
I decided to learn Python because I was bored last weekend, and I have actually been really amazed at the power of it! To test my (elementary) skills with Python, I decided to try to port the Java Word List Generator found here: http://evilzone.org/tutorials/%28tut%29-create-a-wordlist-generator-%28i-e-for-bruteforcing%29
I think I did alright. The end file size of a 4 char long run was 8.4 megs, however when I pushed it to 5 characters it froze my computer for about 2 minutes and produced a 343 megabyte file...
I don't mean to brag, but I do have a pretty decent computer so I am probably doing something wrong with this :P
Anyways, here is the code:
import math
import sys
import os

def generate(am):
    pos = '/home/rotc/Documents/list.txt'
    pathdir = '/home/rotc/Documents'
    print os.listdir(pathdir)
    print "Opening the file at " + pos
    f = open(pos, "w")
    print "File opened."
    wordLength = am
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']
    radix = len(alphabet)
    MAX_WORDS = float(math.pow(float(len(alphabet)), float(wordLength)))
    print "Writing..."
    for i in range(0, int(MAX_WORDS)):
        indices = convertToRadix(radix, i, wordLength)
        word = [1] * int(wordLength)
        for k in range(0, int(wordLength)):
            word[k] = alphabet[indices[k]]
        #print ''.join(word)
        write(word, f)
    print "Finished writing. \nFlushing..."
    f.flush()
    print "Finished Flushing..."
    print "Closing File..."
    f.close()
    print "File Closed."

def write(word, f):
    f.write(''.join(word) + "\n")

def convertToRadix(radix, number, wordLength):
    result = [1] * int(wordLength)
    for i in reversed(xrange(int(wordLength))):
        if number > 0:
            rest = number % radix
            number /= radix
            result[i] = rest
        else:
            result[i] = 0
    return result

amount = raw_input("How many letters long: ")
generate(amount)
If you see any obvious performance killers let me know please, but for now thanks for reading.
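For a rough sense of how much output to expect, here is a back-of-the-envelope sketch. It assumes the 36-symbol alphabet from the code above and one byte per character plus a one-byte newline per word; the function name is made up for illustration.

```python
def wordlist_size_bytes(alphabet_size, word_length):
    """Estimate the output size: every possible word plus one newline each."""
    word_count = alphabet_size ** word_length
    return word_count * (word_length + 1)

for length in (4, 5):
    # Prints the word length, the number of words, and the estimated bytes.
    print(length, 36 ** length, wordlist_size_bytes(36, length))
```

Under these assumptions a 4-character run comes to 8,398,080 bytes (about 8.4 MB), and a 5-character run has 36 times as many words, each one byte longer, so a file of a few hundred megabytes is to be expected whatever the language.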
-
The code looks clean to me. However, Python itself is not really good at such computations, so I would blame Python for being slow.
-
The problem with Python: it has immutable strings (like Java). Note that I used a char array to produce the words in the Java code. You are using strings and doing a lot of string operations, like this one:
word = [1] * int(wordLength) --> this isn't necessary, btw.
Immutability means you can't change a string object. So every time you do a string operation, a new object is created, which is costly.
I didn't test your code, so I can't say for sure that this is the reason. But you might test my assumption by going in a different direction with your code.
Look here for more information about string operations and their performance in Python: http://www.skymind.com/~ocrow/python_string/
https://wiki.python.org/moin/PythonSpeed/PerformanceTips
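As a small illustration of the immutability point, the sketch below (written for Python 3, with made-up variable names) shows that a string cannot be modified in place, while a list of characters can be changed and joined once at the end. It demonstrates the mechanics only; it is not a measurement of where the time in the original program goes.

```python
word = "aaaa"
raised = False
try:
    word[0] = "b"          # strings do not support item assignment
except TypeError:
    raised = True          # so "changing" a string means building a new one

chars = ["a"] * 4          # a list of characters is mutable
chars[0] = "b"             # in-place change, no new string yet
joined = "".join(chars)    # one new string is created here
print(raised, joined)
```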
I also did a port of my code to python, but never tested it for performance. You might try this one as well:
def convert_to_radix(number, wordlength, radix):
    indices = []
    for i in xrange(wordlength):
        if number > 0:
            rest = number % radix
            number /= radix
            indices.append(rest)
        else:
            indices.append(0)
    return indices

def word_gen(alphabet, wordlength):
    MAXWORDS = len(alphabet) ** wordlength
    RADIX = len(alphabet)
    for k in xrange(MAXWORDS):
        indices = convert_to_radix(k, wordlength, RADIX)
        word = [alphabet[indices[i]] for i in xrange(wordlength)]
        yield word

Example usage:

alphabet = ['a', 'b', 'c']
for word in word_gen(alphabet, 4):
    print ''.join(word)
Also: did you verify that your code produces correct results?
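One way to check the results, sketched here for Python 3: for a small alphabet, confirm that the generator yields exactly len(alphabet) ** wordlength words, that they are all distinct, and that each has the right length. The generator below is a Python 3 adaptation of the port above (range and // instead of xrange and /=); the verify helper is a hypothetical name, not part of either original program.

```python
def convert_to_radix(number, wordlength, radix):
    # Digits of `number` in base `radix`, padded with zeros to wordlength.
    indices = []
    for _ in range(wordlength):
        indices.append(number % radix)
        number //= radix
    return indices

def word_gen(alphabet, wordlength):
    radix = len(alphabet)
    for k in range(radix ** wordlength):
        indices = convert_to_radix(k, wordlength, radix)
        yield "".join(alphabet[i] for i in indices)

def verify(alphabet, wordlength):
    # True if every possible word appears exactly once.
    words = list(word_gen(alphabet, wordlength))
    expected = len(alphabet) ** wordlength
    return (len(words) == expected
            and len(set(words)) == expected
            and all(len(w) == wordlength for w in words))

print(verify(['a', 'b', 'c'], 4))
```

This only makes sense for small inputs, since it keeps every word in memory.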
-
Oh that makes sense Deque! Well nuts, I thought that it wouldn't create a new object. I'll check out those links and see if I can get performance any better.
And yes, the code produces perfect results :). After coding this I researched the performance issue a little more and concluded that, as you said, Java or C++ would be better for this because they can handle memory better, since they are compiled languages.
Thanks!
-
The big performance killer here is not memory management. The slowest part is writing to disk. You should consider buffering in memory and writing less often: for example, let it buffer 100k words and then write them to the file in one go. That way you save a lot of I/O traffic.
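The batching idea could be sketched like this in Python 3. The function name and the default batch size of 100,000 are assumptions taken from the suggestion above, and an in-memory io.StringIO stands in for the real file so the example is self-contained.

```python
import io

def write_in_batches(words, out, batch_size=100000):
    """Collect words in memory and write them out one batch at a time."""
    batch = []
    for word in words:
        batch.append(word)
        if len(batch) >= batch_size:
            out.write("\n".join(batch) + "\n")   # one write per full batch
            batch = []
    if batch:                                    # write whatever is left over
        out.write("\n".join(batch) + "\n")

buf = io.StringIO()                              # stand-in for an open file
write_in_batches(["aa", "ab", "ba", "bb"], buf, batch_size=3)
print(buf.getvalue())
```

Note that file objects returned by open() are already buffered by default, so how much this helps is something to measure rather than assume.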
-
Python is Python because it isn't PHP or Java... Please.
#Python3
def product(*args, **karg):
    pools = list(map(tuple, args)) * karg.get('repeat', 1) #list because in Python3 map returns an iterator.
    def extend(acc, pool): #helper binds pool at call time; a bare generator expr. would look it up lazily
        return (x+[y] for x in acc for y in pool) #generator expr. saves memory
    acc = [[]]
    for pool in pools:
        acc = extend(acc, pool)
    for prod in acc:
        yield tuple(prod)
for x in product(['a', 'b', 'c'], repeat=5):
    print(''.join(x))
Or simply use the plug-and-play *product* iterator from *itertools*; look at the official documentation.
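For reference, a minimal sketch of the itertools route in Python 3. The words wrapper is a hypothetical helper name; itertools.product with the repeat keyword is the standard library call.

```python
from itertools import product

def words(alphabet, length):
    """Yield every word of the given length over the alphabet."""
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)

sample = list(words("abc", 2))
print(sample)
```

Because product is lazy, the words can be written out one at a time without holding the whole list in memory; the list() call here is only to show the result for a tiny input.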
-
The code looks clean to me. However, Python itself is not really good at such computations, so I would blame Python for being slow.
Mr. Admin Kulverstukas,
OK, you deleted my last post because you did not like my comment on your post. (Are these your rules of behavior for *creating a forum worthy of being visited*? MEMBERS, I LEAVE YOU A RESPONSE...)
Congratulations, but this isn't the way!!!
Debate between people... that's the way.
In the end it's OK, no problem, but please also delete your post, because you do not know what you're talking about... and you GIVE INCORRECT INFORMATION;
otherwise you have to justify what you said... we are waiting.
P.S. Sorry to all for my bad English... uhhhh ;)
Regards,
madara
-
Actually I deleted your post. It only said "PLEASE......", which is completely useless. On top of that it was a double post. Get your own information and structure right before you bash on someone else's.
-
Second, Kulverstukas is actually right. Python is known (it's a fact) for being slower than other languages; you can't deny that!
-
Why is it that I am always the one to blame when posts get deleted? :P if they are deleted, there was a reason for it. Deal with it or gtfo, honestly. Staff does not remove posts "just because...".
@madara: my facts are straight, while your post is curved...
-
It's probably because you are the only admin that posts regularly on the boards.
-
Actually I deleted your post. It only said "PLEASE......", which is completely useless.
My post was an implicit *comment* (too complicated to understand?).
"Get your own information before..." etc.
OK, I agree --> sorry to Kulverstukas for that... but Kulverstukas,
when you say "my post is right and yours is curved", you confirm that you do not know the subject. Please learn before commenting.
Second, Kulverstukas is actually right. Python is known (it's a fact) for being slower than other languages, you can't deny that!
Phage, obviously Python is slower than a compiled language such as C, CLisp and company, but C/C++ is also slower than assembly, or CLR... so would you prefer to write your software in pure machine language? Perhaps for a rootkit... in that case, inline assembly in C only to access specific CPU registers... but only in small chunks.
In the end, when you use *itertools* functions for simple or complex iteration, speed will not be your problem... run your tests, guy ;D
Thanks to all!
-
You clearly have a problem with respect for other people and authorities. My tests? Tests on what? I know Python pretty well; I know its ups and downs. But your choice of words makes me second-guess your knowledge. You act like it's a war between you and the rest of the community. It's not. Chill out bro! I should lead you to the ganja shop!
-
I should lead you to the ganja shop!
Could be an idea ;D
OK, my ways may seem uncomfortable, but I always say what I think, and that is not a lack of respect...
Remember that I do not judge people, only what they say... it's very different...
We are here to learn, not for the glory, and touchiness does not allow evolution, does it? ;)
-
Madara: I see a lot of good stuff in your posts, but I feel you fail to explain it, which makes your posts seem disrespectful or useless or both.
E.g. you bash Kulver's post, but give no arguments for why it is wrong in your eyes.
You say "Please, Python is not Java", but don't explain anything. Be so kind as to explain your thoughts to others, and people will see it as a chance either to agree with you and get better by learning from your posts, or to disagree and give you the opportunity to learn from their arguments.
I think you miss the point by advising itertools. The original question was about why the program is slow, not how to implement the same functionality in another way. Yes, it is shorter if you use this, but no, the TO won't learn why there is a problem in his code if you just throw your own code at him without any explanation (other than "Python is not [insert your language here]").
Furthermore:
Python is compiled, at least in what is probably its most-used implementation, CPython. Even if you type python myprogram.py, the source is first compiled to bytecode, and this bytecode is interpreted afterwards (in the case of CPython).
For Java it is the same, but there is more optimization going on and it uses JIT (compilation to native code at runtime). That gives a real boost in performance and can in some cases outperform a purely compiled language implementation, because it is able to adapt the compilation at runtime based on what happened earlier in the run. As CPython doesn't use JIT so far, it is indeed slow.
But in most cases it is not so slow that it matters, and in the case of Matriplex's code there is another underlying problem (I believe ande put his finger on the right spot).
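The compile-to-bytecode step can be observed directly with the built-in compile() and the standard dis module. This is a small Python 3 sketch; the source string and the "<example>" file name are made up for illustration, and the exact instruction names vary between CPython versions.

```python
import dis

source = "total = 1 + 2 * x"
code = compile(source, "<example>", "exec")   # source text -> code object

# The code object carries the raw bytecode that the interpreter loop executes.
print(type(code.co_code), len(code.co_code))

# dis turns that bytecode back into readable instruction names.
instructions = [ins.opname for ins in dis.get_instructions(code)]
print(instructions)
```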
when you say "my post ar right and your curved"...You confirm that you do not know the subject..Please learn before comment
Kulver didn't say he was right, he said his comment is straight (to the point), while yours isn't and I have to agree with him.
You confirm that you do not know the subject..Please learn before comment
So far you haven't shown that you know the subject either. Do you see why this is disrespectful?
Phage obviously Python is slower then a compiled language, such C ,, CLisp & company but also C/C++ are slower then Assembly, or CLR…then you prefer write your SW in pure machine language?..perhaps for a rootkit…in this case an assembly C inline only to access specific CPU registers…but only small chunks.
eventually in this case when You use *itertools*'s methods for simple or complex iteration, speed will not be a your problem…make your tests guy
So, your argument is: Yes, it is slower, but no, it is not slower.
Maybe you meant the right thing, but the way you wrote it... *facepalm*
itertools is written in C. Testing it is no proof that Python is fast.
-
Hi Deque, good speech 8)
You're right, I meant compilation directly to machine language as in C/C++ etc., not to IL.
I repeat: obviously Python is slower than *purely compiled* languages, and OK, its VM has no JIT as in Java or the .NET runtime (P.S. however, look at PyPy, a just-in-time compiler for Python... good, but not comparable to JIT)... but it takes advantage of powerful libraries like itertools, the numpy package and many others, as you know. There is also *stuff* like Cython and Numba (for scientific computations) to speed up your code. All this is Python;
but in conclusion, what matters is mastering efficient coding practice, as you said.
The semantics of the code I posted are very simple and self-explanatory. Obviously he must know WELL the built-in functions and constructs like map, generator expressions, list comprehensions, etc. If not --> there is documentation and there are good books for that. You can't write good code without knowing the basics... in a boring weekend... :D
It's also important to think through simple or complex code with your own head first, and afterwards he can ask for clarifications.
Why isn't Python Java or PHP?
Because if you think in Java when you write Python code, you'll probably produce inefficient code...
If you think in PHP when you write Python or PHP code, you'll surely produce very inefficient code ;D
-
Guys, I appreciate your righting his wrongs, but maybe do it in a private message? Really, I'm glad you're telling him off for not explaining; don't take me for a complaining whelp.
I definitely agree, Python is nowhere near as fast as Java and is a completely different language; however, this was purely a learning experience. I'm glad you guys are pointing all this out; it's really helping me. Also, I was wondering what you guys would use to write a simple web/computer crawler? And is there a difference between spiders and crawlers, or are they just different names for exactly the same thing? Thanks.
-
You're asking a whole new question which doesn't have anything to do with the original thread. Please make a new thread if you want an answer to your question.