Hello EvilZone,
This is a little comparison of wordgenerator codes using three different approaches: Python generators (using yield, C# also has this concept), Scala Streams and a selfwritten Java generator. I plan on adding a version for Haskell which makes use of infinite lists.
I already posted the Java code and explanation here:
http://evilzone.org/tutorials/%28tut%29-create-a-wordlist-generator-%28i-e-for-bruteforcing%29/msg36286/#msg36286If you have trouble understanding how the code works, I suggest you read the Java tutorial.
The basic task is: Take in an alphabet and a wordlength and put out all possible words that can be created.
Java
Java was the least convenient language for this task. It doesn't provide any features that make it easy to generate or handle a huge amount of data on the fly. So I had to come up with implementing my own generator. Note that this code differs a bit from the tutorial posted above to provide more flexibility.
What is a generator? Generators can iterate over items only once. They do not store any of the items, but they compute the items on the fly instead. The following Java implementation uses the "wordNumber" to keep track of the state. Every generated word has its own number up to maxWords. The words are generated based on that number.
public class WordGenerator {
private int wordNumber;
private final int wordlength;
private final char[] alphabet;
private final long maxWords;
private final int radix;
/**
* Inits a wordlist generator with given alphabet and wordlength
* @param alphabet
* @param wordlength
*/
public WordGenerator(char[] alphabet, int wordlength) {
this.wordlength = wordlength;
this.alphabet = alphabet;
this.maxWords = (long) Math.pow(alphabet.length, wordlength);
this.radix = alphabet.length;
}
/**
*
* @return next generated word, null if no word is left
*/
public synchronized String generateNext() {
if (hasNext()) {
int[] indices = convertToRadix(wordNumber);
char[] word = new char[wordlength];
for (int k = 0; k < wordlength; k++) {
word[k] = alphabet[indices[k]];
}
wordNumber++;
return new String(word);
}
return null;
}
/**
*
* @return true if there are more words to generate, false otherwise
*/
public boolean hasNext() {
return (wordNumber < maxWords);
}
private int[] convertToRadix(long number) {
int[] indices = new int[wordlength];
for (int i = wordlength - 1; i >= 0; i--) {
if (number > 0) {
int rest = (int) (number % radix);
number /= radix;
indices[i] = rest;
} else {
indices[i] = 0;
}
}
return indices;
}
/**
* Shortcut to generate alphabet ASCII ranges
* @param start
* @param end
* @return
*/
public static char[] initAllowedCharacters(int start, int end) {
char[] allowedCharacters = new char[end - start + 1];
for (int i = start; i <= end; i++) {
allowedCharacters[i - start] = (char) i;
}
return allowedCharacters;
}
}
Usage of the class:
public static void main(String[] args) {
char[] alphabet = initAllowedCharacters('a', 'z');
int wordlength = 3;
WordGenerator gen = new WordGenerator(alphabet, wordlength);
while(gen.hasNext()) {
System.out.println(gen.generateNext());
}
}
PythonPython features coroutines, making it possible to create generators like the above in a much easier way by using yield.
yield creates a generator.
In addition Python has list comprehensions, which I made use of here too.
Here is the same word generator with Python:
def convert_to_radix(number, wordlength, radix):
indices = []
for i in xrange(wordlength):
if number > 0:
rest = number % radix
number /= radix
indices.append(rest)
else:
indices.append(0)
return indices
def word_gen(alphabet, wordlength):
MAXWORDS = len(alphabet) ** wordlength
RADIX = len(alphabet)
for k in xrange(MAXWORDS):
indices = convert_to_radix(k, wordlength, RADIX)
word = [alphabet[indices[i]] for i in xrange(wordlength)]
yield word
Example usage:
alphabet = ['a', 'b', 'c']
for word in word_gen(alphabet, 4):
print ''.join(word)
Note: I will edit Python docstrings later.
ScalaWhile Python has some functional capabilities, Scala can be used purely functional. Scala is a true imperative-functional-Hybrid.
It doesn't feature infinite lists like Haskell, but it has the concept of Streams instead. You predefine a Stream and the contents are only computed if you actually access them.
I made this version as functional as possible, but you could as well just code it like the Java version.
Note that the yield keyword in Scala is something entirely different than in Python. It is used, i.e. for list comprehensions. Example:
This is a python list comprehension creating numbers from 0 to 9:
numbers = [x for x in range(0,10)]
This is the same in Scala:
numbers = for (x <- 0 until 10) yield x
/**
* Generator that constructs all possible words for a given alphabet
* and wordlength.
*
* @constructor
* @param alphabet The alphabet that is used for constructing the words
* @param wordlength The length of the words that shall be generated
*/
class WordGenerator (alphabet: Array[Char], wordlength: Int) {
private val MAXWORDS = math.round(math.pow(alphabet.length, wordlength))
private val RADIX = alphabet.length
val words = wordStream(0)
/**
*Generates a wordstream of all possible words
*
*@param n number of words already generated
*@return stream that generates all possible words
*/
private def wordStream(n: Int): Stream[String] = {
n match {
case MAXWORDS => Stream.empty
case _ => val indices = convertToRadix(n)
(
for( k <- 0 until wordlength)
yield alphabet(indices(k))
).mkString("") #:: wordStream(n+1)
}
}
/**
* Converts a given number to the radix that was computed out of
* the alphabet's length.
*
* Note that this won't use the common letter representation
* associated with i.e. radix 16 (hex). Instead of 0-9,A-F
* the integers 0-15 are used.
*
* @param number the number to be converted
* @return List with integers that represent the digits of the converted number
*/
private def convertToRadix(number: Long): List[Int] =
(number match {
case 0 => Nil
case _ => (number % RADIX).toInt :: convertToRadix(number / RADIX)
} ) padTo(wordlength, 0)
}
Example usage (here using an companion object that contains the main function):
object WordGenerator {
val USAGE = "WordGenerator <alphabet> <wordlength>"
def main(args: Array[String]) = {
if(args.length == 2) {
val alphabet: Array[Char] = args(0).toCharArray
val wordlength = args(1).toInt
val gen = new WordGenerator(alphabet, wordlength)
gen.words.foreach(s => print(s + ", "))
} else {
println(USAGE)
}
}
/**
* A convenience method to create an alphabet for word generation.
*
* The alphabet is constructed within the range of a given start
* character and end character (inclusive).
*
* @param start the first character created for the range
* @param end the last character (inclusive) created for the range
* @return char array that holds the constructed alphabet
*/
def initAllowedCharacters(start: Int, end: Int): Array[Char] =
(for ( i <- start to end )
yield i.toChar
) toArray
}
To sum it up: Python and Scala make it really convenient to create generators, while Python is the most easiest to understand (in my opinion). A lot of lines of code are saved by using the functional capabilities of Python and Scala.
As a comparison: (using the tool cloc)
Python: 17 lines of code (no classes used)
Scala: 20 lines of code (class used)
Java: 42 lines of code (without initAllowedCharacters, which I omitted in the other samples as well)
Note that Scala (like Python) can be used without putting a class around it, which makes the number of lines almost even to Python. Or vice versa: Put a class around the Python version and you get 19 lines of code.
The only language left is Haskell, which probably will be even more concise. I will update this thread once I made a Haskell version.
What is interesting about Haskell? Is uses lazy evaluation, making it possible to define infinite lists.
This simple Haskell example constructs a list containing numbers from 1 to infinity:
infinitelist = [1..]
Have fun coding
Deque