Author Topic: [Python, Scala, Java] Wordgenerator (bruteforce snippets)  (Read 1910 times)

0 Members and 1 Guest are viewing this topic.

Offline Deque

  • P.I.N.N.
  • Global Moderator
  • Overlord
  • *
  • Posts: 1203
  • Cookies: 518
  • Programmer, Malware Analyst
    • View Profile
[Python, Scala, Java] Wordgenerator (bruteforce snippets)
« on: June 25, 2013, 09:49:57 pm »
Hello EvilZone,

This is a little comparison of wordgenerator codes using three different approaches: Python generators (using yield, C# also has this concept),  Scala Streams and a selfwritten Java generator. I plan on adding a version for Haskell which makes use of infinite lists.

I already posted the Java code and explanation here: http://evilzone.org/tutorials/%28tut%29-create-a-wordlist-generator-%28i-e-for-bruteforcing%29/msg36286/#msg36286
If you have trouble understanding how the code works, I suggest you read the Java tutorial.

The basic task is: Take in an alphabet and a wordlength and put out all possible words that can be created.

Java

Java was the least convenient language for this task. It doesn't provide any features that make it easy to generate or handle a huge amount of data on the fly. So I had to come up with implementing my own generator. Note that this code differs a bit from the tutorial posted above to provide more flexibility.

What is a generator? Generators can iterate over items only once. They do not store any of the items, but they compute the items on the fly instead. The following Java implementation uses the "wordNumber" to keep track of the state. Every generated word has its own number up to maxWords. The words are generated based on that number.

Code: [Select]
public class WordGenerator {

    private int wordNumber;
    private final int wordlength;
    private final char[] alphabet;
    private final long maxWords;
    private final int radix;

    /**
     * Inits a wordlist generator with given alphabet and wordlength
     * @param alphabet
     * @param wordlength
     */
    public WordGenerator(char[] alphabet, int wordlength) {
        this.wordlength = wordlength;
        this.alphabet = alphabet;
        this.maxWords = (long) Math.pow(alphabet.length, wordlength);
        this.radix = alphabet.length;
    }

    /**
     *
     * @return next generated word, null if no word is left
     */
    public synchronized String generateNext() {
        if (hasNext()) {
            int[] indices = convertToRadix(wordNumber);
            char[] word = new char[wordlength];
            for (int k = 0; k < wordlength; k++) {
                word[k] = alphabet[indices[k]];
            }
            wordNumber++;
            return new String(word);
        }
        return null;
    }

    /**
     *
     * @return true if there are more words to generate, false otherwise
     */
    public boolean hasNext() {
        return (wordNumber < maxWords);
    }

    private int[] convertToRadix(long number) {
        int[] indices = new int[wordlength];
        for (int i = wordlength - 1; i >= 0; i--) {
            if (number > 0) {
                int rest = (int) (number % radix);
                number /= radix;
                indices[i] = rest;
            } else {
                indices[i] = 0;
            }

        }
        return indices;
    }

    /**
     * Shortcut to generate alphabet ASCII ranges
     * @param start
     * @param end
     * @return
     */
    public static char[] initAllowedCharacters(int start, int end) {
        char[] allowedCharacters = new char[end - start + 1];
        for (int i = start; i <= end; i++) {
            allowedCharacters[i - start] = (char) i;
        }
        return allowedCharacters;
    }

}

Usage of the class:

Code: [Select]
public static void main(String[] args) {
    char[] alphabet = initAllowedCharacters('a', 'z');
    int wordlength = 3;
    WordGenerator gen = new WordGenerator(alphabet, wordlength);
    while(gen.hasNext()) {
        System.out.println(gen.generateNext());
    }
}

Python

Python features coroutines, making it possible to create generators like the above in a much easier way by using yield.
yield creates a generator.
In addition Python has list comprehensions, which I made use of here too.

Here is the same word generator with Python:

Code: [Select]
def convert_to_radix(number, wordlength, radix):
    indices = []
    for i in xrange(wordlength):
        if number > 0:
            rest = number % radix
            number /=  radix
            indices.append(rest)
        else:
            indices.append(0)
    return indices

def word_gen(alphabet, wordlength):
    MAXWORDS = len(alphabet) ** wordlength
    RADIX = len(alphabet)
    for k in xrange(MAXWORDS):
        indices = convert_to_radix(k, wordlength, RADIX)
        word = [alphabet[indices[i]] for i in xrange(wordlength)]
        yield word

Example usage:

Code: [Select]
alphabet = ['a', 'b', 'c']
for word in word_gen(alphabet, 4):
  print ''.join(word)

Note: I will edit Python docstrings later.

Scala

While Python has some functional capabilities, Scala can be used purely functional. Scala is a true imperative-functional-Hybrid.
It doesn't feature infinite lists like Haskell, but it has the concept of Streams instead. You predefine a Stream and the contents are only computed if you actually access them.

I made this version as functional as possible, but you could as well just code it like the Java version.

Note that the yield keyword in Scala is something entirely different than in Python. It is used, i.e. for list comprehensions. Example:
This is a python list comprehension creating numbers from 0 to 9:

Code: [Select]
numbers = [x for x in range(0,10)]
This is the same in Scala:

Code: [Select]
numbers = for (x <- 0 until 10) yield x
Code: [Select]
/**
 * Generator that constructs all possible words for a given alphabet
 * and wordlength.
 *
 * @constructor
 * @param alphabet The alphabet that is used for constructing the words
 * @param wordlength The length of the words that shall be generated
 */
class WordGenerator (alphabet: Array[Char], wordlength: Int) {

  private val MAXWORDS = math.round(math.pow(alphabet.length, wordlength))
  private val RADIX = alphabet.length

  val words = wordStream(0)

  /**
   *Generates a wordstream of all possible words
   *
   *@param n number of words already generated
   *@return stream that generates all possible words
   */
  private def wordStream(n: Int): Stream[String] = {
    n match {
      case MAXWORDS => Stream.empty
      case _ => val indices = convertToRadix(n)
        (
          for( k <- 0 until wordlength)
            yield alphabet(indices(k))
        ).mkString("") #:: wordStream(n+1)
    }
  }
 
   /**
   * Converts a given number to the radix that was computed out of
   * the alphabet's length.
   *
   * Note that this won't use the common letter representation       
   * associated with i.e. radix 16 (hex). Instead of 0-9,A-F
   * the integers 0-15 are used.
   *
   * @param number the number to be converted
   * @return List with integers that represent the digits of the converted number 
   */
  private def convertToRadix(number: Long): List[Int] =
    (number match {
      case 0 => Nil
      case _ => (number % RADIX).toInt :: convertToRadix(number / RADIX)
    } ) padTo(wordlength, 0)
}

Example usage (here using an companion object that contains the main function):

Code: [Select]
object WordGenerator {

  val USAGE = "WordGenerator <alphabet> <wordlength>"

  def main(args: Array[String]) = {
    if(args.length == 2) {
      val alphabet: Array[Char] = args(0).toCharArray
      val wordlength = args(1).toInt
      val gen = new WordGenerator(alphabet, wordlength)
      gen.words.foreach(s => print(s + ", "))
    } else {
      println(USAGE)
    }
  }

  /**
   * A convenience method to create an alphabet for word generation.
   *
   * The alphabet is constructed within the range of a given start
   * character and end character (inclusive).
   *
   * @param start the first character created for the range
   * @param end the last character (inclusive) created for the range
   * @return char array that holds the constructed alphabet
   */
  def initAllowedCharacters(start: Int, end: Int): Array[Char] =   
    (for ( i <- start to end )
      yield i.toChar
    ) toArray
}

To sum it up: Python and Scala make it really convenient to create generators, while Python is the most easiest to understand (in my opinion). A lot of lines of code are saved by using the functional capabilities of Python and Scala.
As a comparison: (using the tool cloc)
Python: 17 lines of code (no classes used)
Scala: 20 lines of code (class used)
Java: 42 lines of code (without initAllowedCharacters, which I omitted in the other samples as well)

Note that Scala (like Python) can be used without putting a class around it, which makes the number of lines almost even to Python. Or vice versa: Put a class around the Python version and you get 19 lines of code.

The only language left is Haskell, which probably will be even more concise. I will update this thread once I made a Haskell version.
What is interesting about Haskell? Is uses lazy evaluation, making it possible to define infinite lists.

This simple Haskell example constructs a list containing numbers from 1 to infinity:

Code: [Select]
infinitelist = [1..]
Have fun coding

Deque
« Last Edit: June 25, 2013, 09:50:25 pm by Deque »

Offline madara

  • Serf
  • *
  • Posts: 23
  • Cookies: 3
    • View Profile
Re: [Python, Scala, Java] Wordgenerator (bruteforce snippets)
« Reply #1 on: June 26, 2013, 07:12:42 pm »
HI Deque..
very good tutorial on generators behavior in different languages...!!
the next step in Python is to take advantage of its 'powerful' core library:
in fact  we can achieve the same goal  faster with itertools' combinatoric generators:

Code: [Select]

from itertools import product
dictionary = ['a','b','c','d','e','f','g']
pass_gen = product(dictionary, repeat = 3) #pass_gen return a generator;
                                                              # with repeat we can set  word width
for pwd in pass_gen:
   print("".join(pwd))




Thanks for good tutorial ;) 



Offline Deque

  • P.I.N.N.
  • Global Moderator
  • Overlord
  • *
  • Posts: 1203
  • Cookies: 518
  • Programmer, Malware Analyst
    • View Profile
Re: [Python, Scala, Java] Wordgenerator (bruteforce snippets)
« Reply #2 on: June 26, 2013, 08:17:45 pm »
Thank you for the hint. That's a good advice.