BinDyn.py - A visual Binary Analysis Script, hacked together by yours truly.
-Finds most common byte and also counts padding (0x00 and 0xff)
-Creates a histogram and prints average byte value
-Creates a Digraph so you can view the implicit relationship between neighboring bytes
-Creates a byteplot
-Creates a self similarity plot
-Can scan the file for signatures. I will NOT reduce the false positive rate: I want to be able to scan large files with embedded resources and ID them at least half-assed reliably, and scanning just the header won't do that.
-Lets you dump any strings found in it
-Lets you set start and end points, or rather start point and size, useful for homing in on things
-Creates two maps of the file's content, one highlighting printable vs non-printable bytes
-The other showing changes in entropy, very useful for finding enc keys/certs, malware stubs, whatevah.
-That's all for now folks, this IS version 0.2 after all
--I'd like to hear any ideas you have :p I'm currently toying with ngram analysis as an idea.
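
For the curious: the entropy map boils down to sliding a fixed-size window over the file and scoring each window's Shannon entropy. What follows is a minimal standalone sketch of that idea, not the script's actual code. The names `shannon_entropy` and `high_entropy_windows` and the default limit of 6.0 bits per byte are made up for illustration; the window of 100 mirrors the script's default sample size.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    # bytearray yields integer byte values on both Python 2 and Python 3
    counts = Counter(bytearray(data))
    total = float(len(data))
    return -sum((c / total) * math.log(c / total, 2) for c in counts.values())


def high_entropy_windows(data, window=100, limit=6.0):
    """Return (offset, entropy) for every sliding window scoring at least limit.

    Hypothetical helper: window and limit are illustrative defaults.
    """
    hits = []
    for offset in range(0, max(len(data) - window + 1, 0)):
        ent = shannon_entropy(data[offset:offset + window])
        if ent >= limit:
            hits.append((offset, ent))
    return hits


if __name__ == "__main__":
    # 200 bytes of padding followed by 256 distinct byte values
    sample = b"\x00" * 200 + bytes(bytearray(range(256)))
    for offset, ent in high_entropy_windows(sample)[:3]:
        print("offset %d: %.2f bits/byte" % (offset, ent))
```

Windows that score close to 8 bits per byte look random (compressed or encrypted data, keys), while long runs of padding score 0.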
'''
Visual Binary Profiler By HTH
This is currently a WIP, that means certain things suck.
I'm aware of this.
Don't mention it.
Please.
Credit to Schalla for the Idea
Credit to Those Awesome guys on Youtube for the inspiration
Credit to some rando who wrote the hilbert curve algorithm in C, which I converted for my own use.
Released under the
"Use it for whatever you want, don't even bother to credit me really,
but don't bitch when it breaks (or isn't up to your standards)" License.
It's a mouthful.
Oh yeah and for now, because of that previously mentioned 0 fucks given, the maps generated by the hilbert curves aren't perfect.
They generate a bunch of blank space as well because the curve is improperly sized: it expects n to be a power of 2, and to
remove that blank space I'd need to force it to be one... which could cut off a LOT of data.
'''
from PIL import Image, ImageDraw
import sys
import math
from optparse import OptionParser
'''
These two functions are what I was talking about. I made my own space-filling curve but it was uhhh, questionable at best.
I decided to leave the math to the math guys ;)
'''
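def hilbertify(n, d):
    # Map distance d along the curve to (x, y) on an n x n grid (n should be a power of 2)
    t = d
    x = y = 0
    s = 1
    while s < n:
        # Floor division keeps t an integer on both Python 2 and 3
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def rot(n, x, y, rx, ry):
    # Rotate/flip the quadrant so the sub-curves line up
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        return y, x
    return x, y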
'''
Calculates the entropy of a given bunch of text (dose technical terms though..)
'''
def entropy(text):
    # Shannon entropy in bits per symbol; an empty input has zero entropy
    textlen = len(text)
    if textlen == 0:
        return 0.0
    exr = {}
    for each in text:
        exr[each] = exr.get(each, 0) + 1
    infoc = 0
    for v in exr.values():
        freq = 1.0 * v / textlen
        infoc += freq * math.log(freq, 2)
    return -infoc
'''
This bad boy plots self similarity but it also creates a 100 mb image if you hand it 10 kb, have a care.
'''
def selfsim(fileobject):
    fileobject.seek(0)
    digraphImage = Image.new('RGBA', (size,size), "black")
    drawingObject = ImageDraw.Draw(digraphImage)
    for x in range (0,size):
        fileobject.seek(x)
        xval = fileobject.read(1)
        fileobject.seek(0)
        for y in range(size):
            yval = fileobject.read(1)
            if yval == xval:
                drawingObject.point((x,y),(255,0,0))
    digraphImage.show()
'''
Uses a user defined window size to calculate the entropy of the text in a sliding window
'''
def entropyMap(fileobject, enlimit, enrange):
    fileobject.seek(0)
    entropyImage = Image.new('RGBA', (dimension,dimension), "black")
    entropyDrawing = ImageDraw.Draw(entropyImage)
    for window in range(0,size-enrange):
        fileobject.seek(window)
        ent = entropy(fileobject.read(enrange))
        if ent < enlimit:
            entropyDrawing.point(hilbertify(dimension,window),(0,0,int(ent * 20)))
        else:
            entropyDrawing.point(hilbertify(dimension,window),(255,0,0))
    entropyImage.show()
'''
Digraph, creates patterns based upon implicit relationships between neighbouring bytes
There are lots of patterns if you play around with it.
'''
def digraph(fileobject, brightness):
    fileobject.seek(0)
    digraphImage = Image.new('RGBA', (256,256), "black")
    drawingObject = ImageDraw.Draw(digraphImage)
    x = fileobject.read(1)
    y = fileobject.read(1)
    # An empty read means end of file (works for str on Python 2 and bytes on Python 3)
    while x and y:
        px, py = ord(x), ord(y)
        # Each consecutive byte pair brightens the pixel at (first byte, second byte)
        r,g,b,a = digraphImage.getpixel((px,py))
        drawingObject.point((px,py),(min(r+brightness,255),0,0))
        x = y
        y = fileobject.read(1)
    digraphImage.show()
'''
Byte Plot, useful for locating things like bitmaps
'''
def bytePlot(fileobject):
    fileobject.seek(0)
    digraphImage = Image.new('RGBA', (dimension,dimension), "black")
    drawingObject = ImageDraw.Draw(digraphImage)
    x = fileobject.read(1)
    pos = 0
    while x:
        x = ord(x)
        # Lay bytes out row by row; the red channel carries the byte value
        drawingObject.point((pos%dimension,pos//dimension),(x,0,0))
        x = fileobject.read(1)
        pos += 1
    digraphImage.show()
'''
Map of printable characters vs chars that are not, be they higher or lower
Had an encryption key show up clear as day here when it wouldn't (easily) in the entropy map
Also cool for locating strings I guess
'''
def printableMap(fileobject):
    printableImage = Image.new('RGBA', (dimension,dimension), "black")
    printableObject = ImageDraw.Draw(printableImage)
    fileobject.seek(0)
    pos = 0
    char = fileobject.read(1)
    while char:
        char = ord(char)
        if char == 0: color = (0,0,0,255)                     # 0x00 padding: black
        elif char == 255: color = (255,255,255,255)           # 0xff padding: white
        elif char > 32 and char < 127: color = (255,0,0,255)  # printable: red
        elif char <= 32: color = (0,255,0,255)                # control chars and space: green
        else: color = (0,0,255,255)                           # 0x7f and above: blue
        printableObject.point(hilbertify(dimension,pos),color)
        char = fileobject.read(1)
        pos += 1
    printableImage.show()
'''
This function scans through the file in 1000 byte chunks and looks for signatures from the list at the top of the function.
It is not 100% inclusive; this is one of those "I give you the framework, you don't suck at it" things.
Also yes I know there is a slim chance that the 1000 byte reads will cut a signature in half,
this is how I decided to implement it. I know elsewhere in the script I read over the file like file IO
isn't a bottleneck, but this will grow to ~50 signatures.... at least.
I ain't reading a file 50 times just cause there's an off chance I'll cut a signature in half.
(not when chances are most will be in the first 1000 anyway) :p
This will generate a FUCK TON of false (or just excessive) positives, which you'll see through with a bit of common sense ;)
e.g. a jar file I tested contained ~29 Java Bytecode signatures and about 20 images
Obviously this means it's a moderately complex program (29 classes at least, with some gui elements)
'''
def signatureScan(file):
    file.seek(0)
    # (signature bytes, label) pairs; byte literals so the search works on Python 2 and 3
    signatureList = [(b"DOS mode","PE"),
        (b"ELF","ELF"),
        (b"ustar\x0000","TAR"),  # "ustar", a NUL byte, then "00" (not a literal dot)
        (b"fLaC","Flac"),(b"BM","Bitmap"),
        (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1","DOC"),
        (b"%PDF","PDF"),(b"\xFF\xFE\x00\x00","32 Bit UTF encoded Text"),
        (b"\x52\x61\x72\x21\x1A\x07","RAR"),(b"BZh","Bzip"),
        (b"\x50\x4B\x03\x04\x14\x00\x08\x00\x08\x00","Java ByteCode"),(b"\xFF\xD8","Jpeg")
    ]
    while True:
        data = file.read(1000)
        if not data: break
        for signature, indicator in signatureList:
            if data.find(signature) != -1:
                # repr() keeps binary signatures readable on the terminal
                print("Possibly a " + indicator + " file, signature " + repr(signature) + " found")
'''
Dis here runs over the feed and finds the most common byte among other things.
'''
def valueMap(fileobject):
    fileobject.seek(0)
    values = [0] * 256
    sumof = 0
    x = fileobject.read(1)
    while x:
        x = ord(x)
        sumof += x
        values[x] += 1
        x = fileobject.read(1)
    # The padding bytes 0x00 and 0xff are left out of the "most common" search
    vmax = max(values[1:255])
    print("This File contains " + str(values[0]) + " 0x00 bytes out of " + str(size))
    print("This File contains " + str(values[255]) + " 0xff bytes out of " + str(size))
    print("The Average byte value is : " + str(sumof//size))
    print("Most Common Value is : " + hex(values.index(vmax, 1)))
    valueImage = Image.new('RGBA', (256,256), "black")
    drawingObject = ImageDraw.Draw(valueImage)
    for x in range(256):
        # Scale each bar to the most common non-padding value, capped at the image height
        height = min(255, values[x]*255//vmax) if vmax else 0
        for y in range(height):
            drawingObject.point((x,255-y),(255,0,0))
    valueImage.show()
'''
This function is pretty self explanatory
'''
def stringDump(fileobject):
    fileobject.seek(0)
    temp = fileobject.read(1)
    string = ''
    while temp:
        code = ord(temp)
        if code > 31 and code < 127:
            string += chr(code)
        else:
            # Any non-printable byte ends the current run; keep runs of 4+ characters
            if len(string) > 3:
                print(string)
            string = ''
        temp = fileobject.read(1)
    # Flush a run that reaches the end of the file
    if len(string) > 3:
        print(string)
'''
Yes I chose to make a temp file rather than fuck around with dynamically calculating things, at least for now
'''
def fileTrim(fileobject,startpoint,endpoint):
    # optparse hands these over as strings unless a type is given
    startpoint = int(startpoint)
    endpoint = int(endpoint)
    file2 = open("bindyn.tmp","wb+")
    fileobject.seek(startpoint)
    if endpoint == 0:
        # No size given: copy everything from the start point to the end of the file
        file2.write(fileobject.read())
    else:
        # endpoint is a byte count, not an absolute offset
        file2.write(fileobject.read(endpoint))
    return file2
'''
And now we're at the beautiful main function situation
Don't suck and the user input parsing won't break on you.
'''
if __name__ == "__main__":
    parser = OptionParser()
    parser.add_option("-f", "--file", dest="filename",help="specify the FILE", metavar="FILE")
    parser.add_option("--stats","-s", action="store_true", dest="stats", default=False, help="calculate statistics of byte frequency")
    parser.add_option("--digraph","-d", dest="brightness", help="create a digraph of the given file", metavar="BRIGHTNESS")
    parser.add_option("--byteplot","-b", action="store_true", dest="byteplot", default=False, help="create a byteplot of the given file")
    parser.add_option("--selfsim","-Z", action="store_true", dest="selfsim", default=False, help="create a self similarity plot of the given file, this image is the size of your file squared")
    parser.add_option("--strings", action="store_true", dest="strings", default=False, help="output all strings in the file, I suggest defining start and end points")
    parser.add_option("--signatures","-F", action="store_true", dest="signatures", default=False, help="scan for file signatures, super rudimentary, meant to be expanded by user.")
    parser.add_option("--printable","-p", action="store_true", dest="printable", default=False, help="create a map of the printable characters")
    parser.add_option("--entropy","-e" , action="store", dest="minEntropy", help="make a map of the entropy of the file", metavar="ENTROPYLIMIT")
    parser.add_option("--sample","-S" , action="store", type="int", dest="sampleSize", default = 100, help="change the default sample size", metavar="SAMPLE_SIZE")
    parser.add_option("--startpoint" , action="store", type="int", dest="startPoint", default = 0, help="set the byte to start at", metavar="STARTP")
    parser.add_option("--endpoint" , action="store", type="int", dest="endPoint", default = 0, help="set amount of bytes to read", metavar="ENDP")
    (options, args) = parser.parse_args()
    if not options.filename:
        parser.error("no input file given, use -f FILE")
    '''Create or open the file to be worked with'''
    file = open(options.filename, 'rb')
    if options.startPoint != 0 or options.endPoint != 0:
        file = fileTrim(file,options.startPoint,options.endPoint)
    file.seek(0,2)
    size = file.tell()
    dimension = int(math.sqrt(size))
    '''The Easy part'''
    if options.printable:
        printableMap(file)
    if options.strings:
        stringDump(file)
    if options.stats:
        valueMap(file)
    if options.byteplot:
        bytePlot(file)
    if options.selfsim:
        selfsim(file)
    if options.brightness:
        digraph(file, int(options.brightness))
    if options.minEntropy:
        entropyMap(file, float(options.minEntropy), int(options.sampleSize))
    if options.signatures:
        signatureScan(file)