Author Topic: [Java Source] Cryptanalysis - Frequency Analysis and Index of Coincidence  (Read 10404 times)

0 Members and 1 Guest are viewing this topic.

Offline Deque

  • P.I.N.N.
  • Global Moderator
  • Overlord
  • *
  • Posts: 1203
  • Cookies: 518
  • Programmer, Malware Analyst
    • View Profile
Given an encrypted message without any knowledge about the used cipher you nevertheless might be able to crack it (given that the cipher used has some vulnerabilities).
This post provides codesnippets for analysing encrypted messages for letter frequencies and coincidences.

Index of Coincidence (IC)

What you can find out by the IC, is whether a message is encoded by a monoalphabetic (like Caesar cipher) or a polyalphabetic cipher (like Vigenère).

The index of coincidence is at 1.73 for plaintext in English. If you get an index close to 1 (which would be the value for a randomly created text) you have probably a polyalphabetic cipher and need to do further examinations like Friedman Test. If the index is higher, you have a monoalphabetic one and are able to crack it just by counting the letters (frequency analysis).

A sample output looks like this (the longer the message the better the result):

Quote
IC: 2.0548892842111965

kappa text: 0.07610701052634061
kappa random: 0.037037037037037035

If kappa text is close to kappa random, you probably have a polyalphabethic cipher, otherwise a monoalphabetic one

You see, IC is pretty high, so you have for sure a plaintext or a monoalphabetic cipher. It is the same with kappa text and kappa random, just that they are not normalized. The IC is the same as (kappa text * letters). kappa text for English and with 26 letters is about 0.067. Of course you get other values for kappa if you use more characters than just the letters of the alphabet whereas the IC will stay the same.

Code: (Java) [Select]
import java.util.Map;

public class IndexOfCoincidence {

    private Language language;
    private FrequencyAnalysis ana;
    private final String newline = System.getProperty("line.separator");

    public String analyse(String code) {
        ana = new FrequencyAnalysis();
        ana.analyse(code);

        double kappaText = computeKappaText();
        double kappaRand = computeKappaRandom();
        double ic = kappaText * amountOfCharacters();
        return "IC: " + ic + newline + newline
                + "kappa text: " + kappaText + newline
                + "kappa random: " + kappaRand + newline + newline
                + "If kappa text is close to kappa random, you probably have a "
                + "polyalphabethic cipher, otherwise a monoalphabetic one";
    }

    public double computeKappaRandom() {
        return 1 / (double) ana.getFrequency().size();
    }
   
    public int amountOfCharacters(){
        return ana.getFrequency().size();
    }

    public double computeKappaText() {
        Map<Character, Integer> frequency = ana.getFrequency();

        long n = ana.getSumCount();
        int sumNi = 0;
        for (Integer i : frequency.values()) {
            sumNi += (i * (i - 1));
        }
        return (sumNi / (double) (n * (n - 1)));
    }

}

Frequency Analysis

As you can see above frequency analysis is used to determine the IC. But that is not the only purpose. Every language has typical letter frequencies, e.g. the letter 'e' is the most common letter in the english language, so you assume that the most counted letter in an encrypted message will be 'e'. But this works only with monoalphabethic ciphers. It wouldn't work with Vigenère, because this cipher uses more alphabeths than one, which means the letter 'e' is not always translated with the same character.
But given a shift cipher (Caesar) it is pretty easy to crack with frequency analysis. You just need to determine the most common character in there. If this is 'h' you know the cipher has shifted every letter of the alphabeth three times.

Code: (Java) [Select]
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

public class FrequencyAnalysis {

    private Map<Character, Integer> frequency;
    private Language language;

    public String analyse(String code) {
        frequency = new HashMap<Character, Integer>();
        count(code);
        return getAnalysisString();
    }
   
    public Map<Character, Integer> getFrequency(){
        return new HashMap<Character, Integer>(frequency);
    }

    private String getAnalysisString() {
        StringBuilder b = new StringBuilder();
        long sumcount = getSumCount();
        b.append("Char\tAbs\tRel\n");
        for (Character c : getSortedKeyList()) {
            b.append(c + "\t" + frequency.get(c) + "\t"
                    + (frequency.get(c) / (double) sumcount) + "\n");
        }
        b.append("\nletters: " + sumcount);
        b.append("\ncharacters: " + frequency.size());
        return b.toString();
    }

        //insertion sort is used here
    private List<Character> getSortedKeyList() {
        List<Character> list = new LinkedList<Character>();
        for(Entry<Character, Integer> entry : frequency.entrySet()){
            for(int i = 1; i < list.size(); i++){
                if(frequency.get(list.get(i)) > entry.getValue()){
                    list.add(i, entry.getKey());
                    break;
                }
            }
            if(!list.contains(entry.getKey())){
                list.add(entry.getKey());
            }
        }
        return list;
    }

    public long getSumCount() {
        long sum = 0;
        for (Integer i : frequency.values()) {
            sum += i;
        }
        return sum;
    }

    private void count(String code) {
        char[] text = code.toCharArray();
        for (char c : text) {
            if (frequency.containsKey(c)) {
                frequency.put(c, frequency.get(c) + 1);
            } else {
                frequency.put(c, 1);
            }
        }
    }

}

A sample output for a german plaintext looks like this:

Code: [Select]
Char    Abs    Rel

    2    0.0018450184501845018
2    1    9.225092250922509E-4
5    1    9.225092250922509E-4
;    1    9.225092250922509E-4
N    1    9.225092250922509E-4
H    1    9.225092250922509E-4
I    1    9.225092250922509E-4
K    1    9.225092250922509E-4
U    1    9.225092250922509E-4
—    1    9.225092250922509E-4
„    1    9.225092250922509E-4
Z    1    9.225092250922509E-4
“    1    9.225092250922509E-4
€    1    9.225092250922509E-4
y    1    9.225092250922509E-4
x    1    9.225092250922509E-4
(    2    0.0018450184501845018
)    2    0.0018450184501845018
F    2    0.0018450184501845018
B    2    0.0018450184501845018
M    2    0.0018450184501845018
V    2    0.0018450184501845018
j    2    0.0018450184501845018
G    3    0.0027675276752767526
A    3    0.0027675276752767526
W    3    0.0027675276752767526
ß    3    0.0027675276752767526
P    3    0.0027675276752767526
ö    3    0.0027675276752767526
E    4    0.0036900369003690036
L    4    0.0036900369003690036
T    4    0.0036900369003690036
S    4    0.0036900369003690036
ü    4    0.0036900369003690036
p    5    0.004612546125461255
D    6    0.005535055350553505
k    6    0.005535055350553505
,    7    0.006457564575645757
ä    9    0.008302583025830259
.    10    0.00922509225092251
v    11    0.01014760147601476
b    13    0.011992619926199263
w    14    0.012915129151291513
f    15    0.013837638376383764
m    16    0.014760147601476014
o    17    0.015682656826568265
z    18    0.016605166051660517
g    23    0.021217712177121772
c    26    0.023985239852398525
l    28    0.025830258302583026
u    36    0.033210332103321034
s    39    0.035977859778597784
d    40    0.03690036900369004
h    41    0.03782287822878229
a    48    0.04428044280442804
t    51    0.0470479704797048
i    65    0.05996309963099631
n    85    0.07841328413284133
r    87    0.08025830258302583
e    143    0.13191881918819187
     156    0.14391143911439114

letters: 1084
characters: 61

In case you want to try this code, just add this method:

Code: (Java) [Select]
   
public static void main(String[] args) {
    Scanner scan = new Scanner(System.in);
    System.out.println("Your encoded message:\n");
    String code = scan.nextLine();
    IndexOfCoincidence ic = new IndexOfCoincidence();
    System.out.println(ic.analyse(code));
}

Deque
« Last Edit: February 12, 2013, 02:37:07 pm by Deque »

Offline parad0x

  • VIP
  • Royal Highness
  • *
  • Posts: 638
  • Cookies: 118
    • View Profile
Re: [Java Source] Cryptanalysis - frequency and coincidence counting
« Reply #1 on: February 02, 2013, 05:00:15 pm »
I started studying cryptology and was thinking to do it manually or code it after some time, but you know more than me and coded this thing. Thanks for the software, I'll use this in cryptanalysis. rep + for you.  ;)

Offline Teddy

  • /dev/null
  • *
  • Posts: 12
  • Cookies: 8
    • View Profile
Re: [Java Source] Cryptanalysis - frequency and coincidence counting
« Reply #2 on: February 04, 2013, 12:00:13 pm »
Good work! Thanks for sharing =)