[Scala] peiscan -- signature scanner

Posted by Deque on February 06, 2014, 09:51:40 am
Hello EZ.

I am currently working on a library for static analysis of PE files.
One of the included tools is a scanner that works with PEiD signatures. PEiD is a well-known packer and compiler detector for Windows whose signature database (userdb.txt) is public. Many modified and improved versions of this file exist on the web.

My scanner is able to read such a signature database and scan for the signatures in PE files (i.e. .exe and .dll files on Windows). The advantage is that you don't need Windows in order to use it. You only need a Java Runtime, which you probably already have.

The scanner stores the signatures in a prefix tree, which makes matching much more efficient than comparing every signature one by one. I implemented the latter first and it was painfully slow.
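
To illustrate why the prefix tree helps, here is a minimal, self-contained sketch of a byte trie with wildcard edges. It is not the library's SignatureTree; the names Sig, TrieNode, build and findMatches are made up for this example. Signatures that share a prefix share a path, so the input is walked once instead of once per signature.

```scala
import scala.collection.mutable

object PrefixTreeSketch {
  // Hypothetical signature type: a name plus a byte pattern; None is a wildcard (??).
  final case class Sig(name: String, pattern: List[Option[Byte]])

  // One trie node: children keyed by pattern byte (None = wildcard edge),
  // plus the names of all signatures that end exactly at this node.
  final class TrieNode {
    val children = mutable.Map.empty[Option[Byte], TrieNode]
    val endsHere = mutable.ListBuffer.empty[String]
  }

  def build(sigs: Seq[Sig]): TrieNode = {
    val root = new TrieNode
    for (sig <- sigs) {
      var node = root
      for (b <- sig.pattern) node = node.children.getOrElseUpdate(b, new TrieNode)
      node.endsHere += sig.name
    }
    root
  }

  // At each input byte follow the exact edge and the wildcard edge, if present.
  // A signature matches when its whole pattern is a prefix of the input.
  def findMatches(node: TrieNode, input: List[Byte]): List[String] = {
    val here = node.endsHere.toList
    input match {
      case Nil => here
      case b :: rest =>
        val next = List(node.children.get(Some(b)), node.children.get(None)).flatten
        here ::: next.flatMap(findMatches(_, rest))
    }
  }

  def main(args: Array[String]): Unit = {
    val sigs = Seq(
      Sig("first", List(Some(1.toByte), None, Some(3.toByte), Some(4.toByte))),
      Sig("second", List(1, 2, 3).map(x => Some(x.toByte))),
      Sig("third", List(6, 7, 8).map(x => Some(x.toByte))))
    val root = build(sigs)
    // "first" and "second" share the leading byte 1, so they share one edge
    println(findMatches(root, List(1, 2, 3, 4).map(_.toByte)))
  }
}
```

With the input 1 2 3 4 the sketch reports both "first" (via the wildcard) and "second", while "third" is rejected at the very first byte.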

Usage:
Code:
java -jar peiscan.jar [-s <signaturefile>] [-ep true|false] <PEfile>
The -s option lets you use your own signature database. Without it, the scanner uses the userdb.txt that I added to the zip file.
The -ep option tells the scanner whether to scan only at the entry point of the PE.
-ep true is the default. It is faster and more robust and should be preferred.
-ep false additionally searches the whole file for signatures. Use it as a last resort if nothing was found otherwise; it may produce false positives.
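
If you want to write your own database for -s, the following sketch shows how one entry could be parsed. It assumes the three-line layout commonly found in PEiD userdb.txt files (a [name] line, a signature line with hex bytes and ?? wildcards, and an ep_only line). Entry and parseEntry are hypothetical names, not the library's Signature class, and the sketch only handles full ?? wildcards and well-formed lines.

```scala
object SignatureFormatSketch {
  // Hypothetical stand-in for a parsed signature; None is a wildcard byte.
  final case class Entry(name: String, epOnly: Boolean, pattern: List[Option[Byte]])

  // Parses the three lines of one entry, assumed to look like:
  //   [Example Packer v1.0]
  //   signature = 60 E8 ?? ?? 5D
  //   ep_only = true
  def parseEntry(nameLine: String, sigLine: String, epLine: String): Entry = {
    val name = nameLine.trim.stripPrefix("[").stripSuffix("]")
    // everything after the first '=' is the space-separated byte pattern
    val hex = sigLine.split("=", 2)(1).trim.split("\\s+").toList
    val pattern = hex.map {
      case "??" => None // wildcard, matches any byte
      case h => Some(Integer.parseInt(h, 16).toByte)
    }
    val epOnly = epLine.split("=", 2)(1).trim.equalsIgnoreCase("true")
    Entry(name, epOnly, pattern)
  }

  def main(args: Array[String]): Unit = {
    // made-up entry, for illustration only
    val e = parseEntry("[Example Packer v1.0]", "signature = 60 E8 ?? ?? 5D", "ep_only = true")
    println(e.name + ", ep_only=" + e.epOnly + ", " + e.pattern.count(_.isDefined) + " fixed bytes")
  }
}
```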

A typical output looks like this:

Code:
peiscan v0.1 -- by deque
scanning file ...

[Nullsoft Install System v1.98] bytes matched: 11 at address: 14960
[Nullsoft PiMP Stub v1.x] bytes matched: 95 at address: 14960

The result shows all matching signatures including how many bytes matched so you can decide which match is more relevant for you.
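
As a rough sketch of how the byte count can be used: wildcard bytes do not count as matched, so a signature with more matched bytes is more specific. The Match type and the ranked helper below are hypothetical and not part of the library; they only show one way to sort results by specificity.

```scala
object RankMatchesSketch {
  // Hypothetical result type: signature name, its pattern and the match offset.
  final case class Match(name: String, pattern: List[Option[Byte]], address: Long)

  // Wildcards (None) do not count as matched bytes.
  def bytesMatched(m: Match): Int = m.pattern.count(_.isDefined)

  // Most specific match (most fixed bytes) first.
  def ranked(matches: List[Match]): List[Match] =
    matches.sortBy(m => -bytesMatched(m))

  def main(args: Array[String]): Unit = {
    // made-up signatures, for illustration only
    val generic = Match("generic stub", List(Some(0x60.toByte), None, None), 0x900L)
    val specific = Match("specific stub", List(0x60, 0xE8, 0x5D).map(x => Some(x.toByte)), 0x900L)
    ranked(List(generic, specific)).foreach { m =>
      println(m.name + " bytes matched: " + bytesMatched(m))
    }
  }
}
```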

Source:

The scanner is written in Scala. The library, however, is intended for use in Java applications.

Code: (Scala)
package com.github.katjahahn.tools

import java.io.File
import java.io.RandomAccessFile
import java.nio.charset.CodingErrorAction
import scala.PartialFunction._
import scala.collection.JavaConverters._
import scala.collection.mutable.{ Map, ListBuffer }
import scala.io.Codec
import com.github.katjahahn.PEData
import com.github.katjahahn.PELoader
import com.github.katjahahn.optheader.StandardFieldEntryKey._
import com.github.katjahahn.sections.SectionLoader
import com.github.katjahahn.sections.SectionTable
import com.github.katjahahn.sections.SectionTableEntryKey
import Signature._
import SignatureScanner._
import com.github.katjahahn.FileFormatException

/**
 * Scans PE files for compiler and packer signatures.
 *
 * @author Deque
 *
 * @constructor Creates a SignatureScanner that uses the given signatures
 * @param signatures the signatures to use for scanning
 */
class SignatureScanner(signatures: List[Signature]) {

  /**
   * @constructor Creates a SignatureScanner that uses the given signatures
   * @param signatures the signatures to use for scanning
   */
  def this(signatures: java.util.List[Signature]) = this(signatures.asScala.toList)

  private val longestSigSequence: Int = signatures.foldLeft(0)(
    (i, s) => if (s.signature.length > i) s.signature.length else i)

  private lazy val epOnlyFalseSigs: SignatureTree =
    createSignatureTree(signatures.filter(_.epOnly == false))

  private lazy val epOnlySigs: SignatureTree =
    createSignatureTree(signatures.filter(_.epOnly == true))

  private def createSignatureTree(list: List[Signature]): SignatureTree = {
    var tree = SignatureTree()
    list.foreach(s => tree += s)
    tree
  }

  /**
   * Scans a file for signatures and returns one match.
   *
   * @param file the PE file to be scanned
   * @param epOnly true to scan only at the entry point
   * @return the last match reported by scanAll, null if no match was found
   */
  def scan(file: File, epOnly: Boolean = false): String =
    scanAll(file, epOnly).lastOption.orNull // null instead of Option for Java callers

  /**
   * @param file the file to be scanned
   * @return list of scanresults with all matches found
   */
  def _scanAll(file: File, epOnly: Boolean = true): List[ScanResult] = { //use from scala
    var matches = findAllEPMatches(file)
    if (!epOnly) matches = matches ::: findAllEPFalseMatches(file)
    matches
  }

  /**
   * @param file the file to be scanned
   * @return list of strings with all matches found
   */
  def scanAll(file: File, epOnly: Boolean = true): List[String] = { //use from Java
    // wildcard bytes (None) don't count as matched
    def bytesMatched(sig: Signature): Int = sig.signature.count(_.isDefined)
    val matches = _scanAll(file, epOnly)
    for ((m, addr) <- matches)
      yield m.name + " bytes matched: " + bytesMatched(m) + " at address: " + addr
  }

  /**
   * Searches for matches in the whole file using ep_only false signatures.
   *
   * @param file to search for signatures
   */
  def findAllEPFalseMatches(file: File): List[ScanResult] = {
    using(new RandomAccessFile(file, "r")) { raf =>
      val results = ListBuffer[ScanResult]()
      // one reusable buffer; only the first bytesRead bytes are used per offset
      val bytes = new Array[Byte](longestSigSequence + 1)
      // until, not to: there is nothing to read at offset file.length()
      for (addr <- 0L until file.length()) {
        raf.seek(addr)
        val bytesRead = raf.read(bytes)
        val slicedarr = bytes.slice(0, bytesRead)
        val matches = epOnlyFalseSigs.findMatches(slicedarr.toList)
        results ++= matches.map((_, addr))
      }
      results.toList
    }
  }

  /**
   * Searches for matches only at the entry point and only using signatures that
   * are specified to be checked for at ep_only.
   *
   * @param file to search for signatures
   */
  def findAllEPMatches(file: File): List[ScanResult] = {
    using(new RandomAccessFile(file, "r")) { raf =>
      val data = PELoader.loadPE(file)
      val entryPoint = getEntryPoint(data)
      raf.seek(entryPoint.toLong)
      val bytes = Array.fill(longestSigSequence + 1)(0.toByte)
      val bytesRead = raf.read(bytes)
      val matches = epOnlySigs.findMatches(bytes.slice(0, bytesRead).toList)
      matches.map((_, entryPoint.toLong))
    }
  }

  // a Closeable bound instead of a structural type avoids reflective calls
  private def using[A <: java.io.Closeable, B](param: A)(f: A => B): B =
    try { f(param) } finally { param.close() }

  /**
   * Calculates the entry point with the given PE data
   *
   * @param data the pedata result created by a PELoader
   */
  private def getEntryPoint(data: PEData): Int = {
    val rva = data.getOptionalHeader().getStandardFieldEntry(ADDR_OF_ENTRY_POINT).value
    val section = SectionLoader.getSectionByRVA(data.getSectionTable(), rva)
    val phystovirt = section.get(SectionTableEntryKey.VIRTUAL_ADDRESS) - section.get(SectionTableEntryKey.POINTER_TO_RAW_DATA)
    rva - phystovirt
  }
}

object SignatureScanner {

  /**
   * A file offset/address
   */
  type Address = Long

  /**
   * a scan result is a signature and the address where it was found
   */
  type ScanResult = (Signature, Address)

  private val defaultSigs = new File("userdb.txt")

  private val version = """version: 0.1
    |author: Deque
    |last update: 5.Feb 2014""".stripMargin

  private val title = "peiscan v0.1 -- by deque"

  private val usage = """Usage: java -jar peiscan.jar [-s <signaturefile>] [-ep true|false] <PEfile>
    """.stripMargin

  private type OptionMap = Map[Symbol, String]

  // This name makes more sense to call from Java
  /**
   * Loads default signatures (provided by PEiD) and creates a
   * SignatureScanner that uses these.
   *
   * @return SignatureScanner with default signatures
   */
  def getInstance(): SignatureScanner = apply()

  /**
   * Loads default signatures (provided by PEiD) and creates a
   * SignatureScanner that uses these.
   *
   * @return SignatureScanner with default signatures
   */
  def apply(): SignatureScanner =
    new SignatureScanner(_loadSignatures(defaultSigs))

  /**
   * Loads the signatures from the given file.
   *
   * @param sigFile the file containing the signatures
   * @return a list containing the signatures of the file
   */
  def _loadSignatures(sigFile: File): List[Signature] = {
    implicit val codec = Codec("UTF-8")
    //replace malformed input
    codec.onMalformedInput(CodingErrorAction.REPLACE)
    codec.onUnmappableCharacter(CodingErrorAction.REPLACE)

    val sigs = ListBuffer[Signature]()
    val source = scala.io.Source.fromFile(sigFile)(codec)
    try {
      val it = source.getLines
      while (it.hasNext) {
        val line = it.next
        // an entry spans three lines, starting with the [name] line
        if (line.startsWith("[") && it.hasNext) {
          val line2 = it.next
          if (it.hasNext) {
            sigs += Signature(line, it.next, line2)
          }
        }
      }
    } finally {
      source.close() // release the file handle
    }
    sigs.toList
  }

  /**
   * Loads a list of signatures from the specified signature file
   *
   * @param sigFile file that contains the signatures
   * @return list containing the loaded signatures
   */
  def loadSignatures(sigFile: File): java.util.List[Signature] =
    _loadSignatures(sigFile).asJava

  //TODO performance measurement for different chunk sizes
  def main(args: Array[String]): Unit = {
    invokeCLI(args)
  }

  private def invokeCLI(args: Array[String]): Unit = {
    val options = nextOption(Map(), args.toList)
    println(title)
    // print the version even if no input file was given
    if (options.contains('version)) println(version)
    if (!options.contains('inputfile)) {
      if (!options.contains('version)) println(usage)
    } else {
      // -ep defaults to true, -s defaults to userdb.txt
      val eponly = options.get('eponly).forall(_ == "true")
      val signatures = options.get('signatures).map(new File(_)).getOrElse(defaultSigs)
      val file = new File(options('inputfile))
      doScan(eponly, signatures, file)
    }
  }

  private def doScan(eponly: Boolean, sigFile: File, pefile: File): Unit = {
    if (!sigFile.exists()) {
      System.err.println("signature file doesn't exist: " + sigFile)
      return
    }
    if (!pefile.exists()) {
      System.err.println("pe file doesn't exist")
      return
    }
    println("scanning file ...")
    if(!eponly) println("(this might take a while)")
    try {
      val scanner = new SignatureScanner(_loadSignatures(sigFile))
      val list = scanner.scanAll(pefile, eponly)
      if (list.length == 0) println("no signature found")
      else list.foreach(println)
    } catch {
      case e: FileFormatException => System.err.println(e.getMessage())
    }
  }

  private def nextOption(map: OptionMap, list: List[String]): OptionMap = {
    list match {
      case Nil => map
      case "-s" :: value :: tail =>
        nextOption(map += ('signatures -> value), tail)
      case "-ep" :: value :: tail =>
        nextOption(map += ('eponly -> value), tail)
      case "-v" :: tail =>
        nextOption(map += ('version -> ""), tail)
      case value :: Nil => nextOption(map += ('inputfile -> value), list.tail)
      case option :: tail =>
        println("Unknown option " + option + "\n" + usage)
        sys.exit(1)
    }
  }

}
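
The entry point calculation in getEntryPoint is the usual RVA to file offset conversion: find the section that contains the RVA, then subtract the difference between the section's virtual address and its raw data pointer. Here is a small worked sketch of that arithmetic. The Section type, the rvaToFileOffset helper and all numbers are made up for illustration and do not use the library's classes.

```scala
object EntryPointSketch {
  // Hypothetical, simplified section header with only the fields needed here.
  final case class Section(virtualAddress: Long, virtualSize: Long, pointerToRawData: Long)

  // Finds the section containing the RVA and translates the RVA to a file offset.
  // Returns None if no section covers the RVA.
  def rvaToFileOffset(rva: Long, sections: Seq[Section]): Option[Long] =
    sections
      .find(s => rva >= s.virtualAddress && rva < s.virtualAddress + s.virtualSize)
      .map(s => rva - (s.virtualAddress - s.pointerToRawData))

  def main(args: Array[String]): Unit = {
    // made-up section: mapped at RVA 0x1000, stored in the file at offset 0x400
    val text = Section(virtualAddress = 0x1000, virtualSize = 0x5000, pointerToRawData = 0x400)
    // 0x1500 - (0x1000 - 0x400) = 0x900
    println(rvaToFileOffset(0x1500, Seq(text))) // prints Some(2304)
  }
}
```

The resulting offset is where the scanner would seek to before reading the bytes that are compared against the entry point signatures.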


This is the prefix tree used to search for signature matches:

Code: (Scala)
package com.github.katjahahn.tools

import scala.collection.mutable.MutableList
import PartialFunction._

/**
 * A mutable prefix tree for byte signatures. Provides a fast way to match a byte
 * sequence to a large number of signatures
 *
 * @author Deque
 *
 */
abstract class SignatureTree {

  /**
   * Inserts the signature to the tree. Note that the SignatureTree is mutable
   *
   * @param sig signature to be inserted
   * @return tree with new signature
   */
  def +(sig: Signature): SignatureTree = {
    insert(sig, sig.signature.toList)
    this
  }

  /**
   * @param sig the signature to be inserted
   * @param bytes the byte sequence that has to be inserted to the rest of the
   *        tree
   */
  private def insert(sig: Signature, bytes: List[Option[Byte]]): Unit = {
    bytes match {
      case b :: bs => this match {
        case Node(c, v) =>
          val op = c.find(_.hasValue(b))
          val node = op.getOrElse { val n = Node(MutableList(), b); c += n; n }
          node.insert(sig, bs)
        case _ => throw new IllegalStateException("wrong tree component")
      }

      case Nil =>
        this match {
          case Node(c, v) => c += Leaf(sig)
          case _ => throw new IllegalStateException("wrong tree component")
        }
    }
  }

  /**
   * Collects all signatures that match the given byte sequence.
   *
   * @param bytes the byte sequence to compare with the signatures
   * @return list of signatures that matches the bytes
   */
  def findMatches(bytes: List[Byte]): List[Signature] = {
    bytes match {

      case b :: bs => this match {
        case Node(c, v) =>
          val children = c.filter(_.matchesValue(b))
          children.foldRight(List[Signature]())(
            (ch, l) => ch.findMatches(bs) ::: l) ::: collectSignatures(c)
        case Leaf(s) => List(s)
      }

      case Nil => this match {
        case Node(c, v) => collectSignatures(c)
        case Leaf(s) => List(s)
      }
    }
  }

  /**
   * Collects the signatures of all leaves in the given list.
   * Actually there should only be one signature in one childrenlist, otherwise
   * you have two signatures with the same byte sequence.
   *
   * @param list a list that contains nodes and leaves
   * @return all signatures found in the leaves of the list
   */
  private def collectSignatures(list: MutableList[SignatureTree]): List[Signature] = {
    list.collect({ case Leaf(s) => s }).toList
  }

  /**
   * Returns whether the current SignatureTree Node (or Leaf) has a value that equals b.
   * A Leaf always returns false as it has no value saved.
   *
   * @param b an Option byte
   * @return true iff it is a Node and has a value that equals b
   */
  protected def hasValue(b: Option[Byte]): Boolean = false

  /**
   * Returns whether the current SignatureTree Node (or Leaf) has a value that
   * matches b (Note: None matches every Byte).
   * A Leaf always returns false as it has no value saved.
   *
   * @param b a Byte
   * @return true if the given byte matches the value in the node; that means
   * if the value in the node is a None, it returns true; if the current node is
   * a Leaf it returns false
   */
  protected def matchesValue(b: Byte): Boolean = false

}

private case class Node(children: MutableList[SignatureTree], value: Option[Byte]) extends SignatureTree {

  override protected def hasValue(b: Option[Byte]): Boolean = value == b

  override protected def matchesValue(b: Byte): Boolean = value match {
    case None => true
    case Some(v) => v == b
  }
  override def toString(): String = value + "[" + children.mkString(",") + "]"
}

private case class Leaf(signature: Signature) extends SignatureTree {
  override def toString(): String = signature.name
}

object SignatureTree {

  /**
   * Creates an empty SignatureTree. The root's value is never inspected,
   * so None serves as a placeholder instead of null.
   */
  def apply(): SignatureTree = Node(MutableList[SignatureTree](), None)

  def main(args: Array[String]): Unit = {
    val tree = SignatureTree()
    val bytes = List(Some(1.toByte), None, Some(3.toByte), Some(4.toByte))
    val bytes2 = List(1, 2, 3).map(x => Some(x.toByte))
    val bytes3 = List(6, 7, 8).map(x => Some(x.toByte))
    val sig = new Signature("first", false, bytes.toArray)
    val sig2 = new Signature("second", false, bytes2.toArray)
    val sig3 = new Signature("third", true, bytes3.toArray)
    tree + sig
    tree + sig2
    tree + sig3
    println()
    println(tree)
    println(tree.findMatches(List(1, 2, 3).map(_.toByte)))
  }

}

Download: attached
« Last Edit: May 30, 2014, 08:36:03 am by Deque »