Topic: [Python] Glibberish-O-Mat

Offline Deque

  • P.I.N.N.
  • Global Moderator
  • Overlord
  • *
  • Posts: 1203
  • Cookies: 518
  • Programmer, Malware Analyst
    • View Profile
[Python] Glibberish-O-Mat
« on: April 03, 2013, 04:46:46 pm »
Have you ever wondered how spam bots create their spam messages?

This is probably how: they use Markov chains.
Given one word as input, a Markov chain outputs another word, chosen at random according to fixed probabilities. For example, the output might be word A with a probability of 20% or word B with a probability of 80%. The output word is then used as the next input, thus forming a chain of words.
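
As a rough sketch of that idea (not the script from this post): the table below is made up for illustration, using the 20%/80% split from above. The names `TRANSITIONS`, `next_word` and `walk` are assumptions, not part of the Glibberish-O-Mat.

```python
import random

# Hypothetical transition table: each word maps to its possible successors
# and their probabilities (illustrative values, not taken from a real text).
TRANSITIONS = {
    "start": {"A": 0.2, "B": 0.8},
    "A": {"start": 1.0},
    "B": {"start": 1.0},
}

def next_word(word, rng=random):
    """Pick a successor of `word` according to its probabilities."""
    r = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for candidate, prob in TRANSITIONS[word].items():
        cumulative += prob
        if r < cumulative:
            return candidate
    return candidate  # guard against floating-point rounding

def walk(start, steps, rng=random):
    """Feed each output back in as the next input, forming a chain."""
    chain = [start]
    for _ in range(steps):
        chain.append(next_word(chain[-1], rng))
    return chain

if __name__ == "__main__":
    print(" ".join(walk("start", 10)))
```

Over many steps, "B" should follow "start" about four times as often as "A" does.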

This is my Glibberish-O-Mat, which uses this principle. It takes a text file as input (you can use one from Project Gutenberg to get a reasonably long text: http://www.gutenberg.org/ ), counts how often each word follows each other word, and then produces a text with the given number of words using the resulting probabilities.
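
The counting step can be sketched roughly like this. It is a simplified stand-in for the analysis, assuming the text is already in a string and leaving out the start marker; `count_followers` and the sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict

def count_followers(text):
    """Map each word to a Counter of the words that directly follow it."""
    followers = defaultdict(Counter)
    words = text.split()
    # Pair every word with the word that comes right after it.
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers

if __name__ == "__main__":
    sample = "the cat and the dog and the cat"
    counts = count_followers(sample)
    for word in sorted(counts):
        print(word, dict(counts[word]))
```

In the sample sentence, "the" is followed by "cat" twice and by "dog" once, so "cat" would be picked about twice as often after "the".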

Usage: glibberish.py <filename> <wordstogenerate>

Example usage: python glibberish.py pg28203.txt 100

Example output:



Python source (Python 2.7):

Code:
import sys
import operator
import random

markov_chain = {}

#adds the specified word to the markov chain
def add_to_chain(lastword, word):   
    if lastword not in markov_chain:
        markov_chain[lastword] = {}
    if word not in markov_chain[lastword]:
        markov_chain[lastword][word] = 1
    else:
        markov_chain[lastword][word] += 1

#builds up the markov chain using the specified file
def build_chain_from(filename):
    lastword = "first"
    #the with statement closes the file when done
    with open(filename, 'r') as file:
        for word in words(file):
            add_to_chain(lastword, word)
            lastword = word

#iterates over words in a file
def words(file):
    for line in file:
        for word in line.split():
            yield word

#returns a random word with the probability based on the given lastword
def get_rand_word(lastword):
    chain = markov_chain[lastword]
    total = sum(chain.itervalues())
    randval = random.randint(1, total)
    for key in chain:
        randval -= chain[key]
        if randval <= 0:
            return key       
    return ""

#generates a text with the given amount of words
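def generate_text(amount):
    lastword = "first"
    generated = []
    for i in range(0, amount):
        #the last word of the file may have no successor; restart the chain then
        if lastword not in markov_chain:
            lastword = "first"
        #use the previous output as the next input
        lastword = get_rand_word(lastword)
        generated.append(lastword)
    return " ".join(generated)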

def print_title():
    print('''
 _____  _  _  _      _                  _       _     
|  __ \| |(_)| |    | |                (_)     | |   
| |  \/| | _ | |__  | |__    ___  _ __  _  ___ | |__ 
| | __ | || || '_ \ | '_ \  / _ \| '__|| |/ __|| '_ \
| |_\ \| || || |_) || |_) ||  __/| |   | |\__ \| | | |
 \____/|_||_||_.__/ |_.__/  \___||_|   |_||___/|_| |_|
          _____         ___  ___        _             
         |  _  |        |  \/  |       | |           
  ______ | | | | ______ | .  . |  __ _ | |_           
 |______|| | | ||______|| |\/| | / _` || __|         
         \ \_/ /        | |  | || (_| || |_           
          \___/         \_|  |_/ \__,_| \__|
''')

def main():
    print_title()
    if len(sys.argv) != 3 or not sys.argv[2].isdigit():
        print "usage: glibberish.py <filename> <wordstogenerate>"       
        return
    filename = sys.argv[1]
    amount = int(sys.argv[2])
   
    build_chain_from(filename)
   
    print generate_text(amount)
    print

if __name__ == "__main__":
    main()

Offline Mordred

  • Knight
  • **
  • Posts: 360
  • Cookies: 135
  • Nvllivs in Verba
    • View Profile
Re: [Python] Glibberish-O-Mat
« Reply #1 on: April 03, 2013, 06:16:21 pm »
Very cool! Also, the Gutenberg link is something I didn't know about, so double kudos! +1
« Last Edit: April 03, 2013, 06:16:29 pm by Mordred »

Offline Kulverstukas

  • Administrator
  • Zeus
  • *
  • Posts: 6627
  • Cookies: 542
  • Fascist dictator
    • View Profile
    • My blog
Re: [Python] Glibberish-O-Mat
« Reply #2 on: April 03, 2013, 06:17:33 pm »
Lol nice code, but the output kinda makes no sense :D spam bots probably have more sophisticated methods than that.
+1 anyway

Offline Deque

Re: [Python] Glibberish-O-Mat
« Reply #3 on: April 03, 2013, 07:58:50 pm »
@Mordred: Thanks and you are welcome.

Quote from: Kulverstukas
    Lol nice code, but the output kinda makes no sense :D spam bots probably have more sophisticated methods than that.
    +1 anyway

It's a glibberish-o-mat. What did you expect?
I often see spam bots that talk glibberish, and if they don't, they use databases or crawl the web for texts. They cannot generate meaningful text (yet).


Offline 0poitr

  • Peasant
  • *
  • Posts: 149
  • Cookies: 64
    • View Profile
Re: [Python] Glibberish-O-Mat
« Reply #4 on: April 03, 2013, 10:14:28 pm »
lol. Cool stuff. I fancy Chomsky, though.

btw, take a look here: http://pdos.csail.mit.edu/scigen/
Imagination is the first step towards Creation.

Offline Deque

Re: [Python] Glibberish-O-Mat
« Reply #5 on: April 05, 2013, 01:19:08 pm »
Quote from: 0poitr
    lol. Cool stuff. I fancy Chomsky, though.

    btw, take a look here: http://pdos.csail.mit.edu/scigen/

Thanks for that link. A very interesting one, especially the part where they submit their generated papers to science magazines and list the responses. :P

Offline icon

  • Serf
  • *
  • Posts: 26
  • Cookies: 6
  • Ghost
    • View Profile
Re: [Python] Glibberish-O-Mat
« Reply #6 on: April 13, 2013, 11:17:09 pm »
Good code. Had to read through it a few times. So basically it adds random words from a text file?
De Oppresso Liber

Offline Deque

Re: [Python] Glibberish-O-Mat
« Reply #7 on: April 14, 2013, 10:58:08 am »
Quote from: icon
    Good code. Had to read through it a few times. So basically it adds random words from a text file?

Yes, but with a certain probability based on the text file.
If the current word is "and", you will only get words that followed "and" somewhere in the text file. The more often a word followed "and", the more likely you are to get it as the next one.
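
A small sketch of that, with made-up counts for the words following "and". The numbers and the names `follower_probabilities` and `pick_follower` are assumptions for illustration; the picking loop follows the same idea as get_rand_word in the script.

```python
import random

def follower_probabilities(counts):
    """Turn follower counts into probabilities that sum to 1."""
    total = sum(counts.values())
    return {word: n / float(total) for word, n in counts.items()}

def pick_follower(counts, rng=random):
    """Weighted pick: subtract each count until the random value is used up."""
    remaining = rng.randint(1, sum(counts.values()))
    for word, n in counts.items():
        remaining -= n
        if remaining <= 0:
            return word

if __name__ == "__main__":
    # Hypothetical counts of the words seen directly after "and" in some text.
    after_and = {"the": 6, "then": 3, "so": 1}
    print(follower_probabilities(after_and))
    print(pick_follower(after_and))
```

With these counts, "the" comes out about 60% of the time, and a word that never followed "and" in the text can never be picked.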