Author Topic: [Paper] Reversing and Forensic importance of Chrome's Visited Links file  (Read 908 times)


Offline Psycho_Coder

Hello EZ,

Today I will be sharing how to create a parser for the "Visited Links" file created and used by the Google Chrome browser, and I will also discuss why it is forensically important. Another learning outcome is understanding how to reverse or recreate a feature from available source code to suit your own needs. Didn't get what I meant? Read ahead.

Prerequisites/Reader's Note

1. Familiarity with any high-level programming language (like Java).
2. You know about number systems (binary and hexadecimal in particular).
3. Know a little C++, or have the ability to follow programming procedures (even if you don't know the syntax of the language), and SQL.
4. This topic is not for beginners; I assume a certain level of technical maturity.

Let's begin.


Introduction

All modern web browsers, like Google Chrome, Mozilla Firefox, or Safari, store user data in different files at particular locations: user details, browser-specific settings, bookmarks, passwords, and more, mostly to enhance the user experience. These locations change depending on the operating system and the browser version, but most of the files and their formats remain unchanged. Still, it is by no means certain they will stay the same in future versions. For example, Firefox used to save passwords in a file named signons.txt but later switched to an SQLite database to store the passwords saved by the user; for more details click here. Similarly, the locations where these browsers store the data vary too.

Q. So what's with this tutorial anyway? What should I expect to find if I read ahead?

Ans. I will discuss one particular forensically important artifact of Chrome forensics, namely the "Visited Links" file. This is not a tutorial on browser forensics in general, which is beyond its scope; that subject I will cover later in greater detail. Here I will show how, using open source code, we can reverse engineer a way to interpret a file for which no official format is defined.


About "Visited Links" file

Chrome stores all its data in certain OS-specific folders:

In Windows

Code: [Select]
C:\Users\%username%\AppData\Local\Google\Chrome\User Data\Default
In Linux

On Linux there are two possibilities. If the user uses Chromium (the open source version of Google Chrome), the storage location is:

Code: [Select]
/home/$USER/.config/chromium/Default
If the user uses Google Chrome instead, the storage location is:

Code: [Select]
/home/$USER/.config/google-chrome/Default/
Again, for macOS it's different (see the sketch below).
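
For convenience, here is a minimal Java sketch that resolves the default profile directory for the current OS from the paths above. The class and method names are my own, and the macOS path (~/Library/Application Support/Google/Chrome/Default) is my addition, since I only listed Windows and Linux:

Code: (java) [Select]
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Sketch only: resolves Chrome's default profile directory for the
 * current OS. A real tool should let the examiner override the path
 * (portable installs, non-default profiles, mounted images).
 */
public final class ChromeProfileLocator {

    public static Path defaultProfileDir() {
        String os = System.getProperty("os.name").toLowerCase();
        String home = System.getProperty("user.home");
        if (os.contains("win")) {
            // %LOCALAPPDATA% expands to C:\Users\<user>\AppData\Local
            return Paths.get(System.getenv("LOCALAPPDATA"),
                    "Google", "Chrome", "User Data", "Default");
        } else if (os.contains("mac")) {
            return Paths.get(home, "Library", "Application Support",
                    "Google", "Chrome", "Default");
        }
        // Linux: prefer Google Chrome's folder, fall back to Chromium's.
        Path chrome = Paths.get(home, ".config", "google-chrome", "Default");
        return Files.exists(chrome) ? chrome
                : Paths.get(home, ".config", "chromium", "Default");
    }

    public static void main(String[] args) {
        System.out.println(defaultProfileDir());
    }
}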

Inside this Default folder we find several files (quite a lot of them are SQLite databases, and some are JSON) which store different types of data. Many of them are forensically important too, but I will discuss just one: "Visited Links". This file is neither an SQLite database nor JSON nor any known file format with an official definition. We can find certain details about its format here: http://www.forensicswiki.org/wiki/Google_Chrome#Visited_Link, but is that all?

Google Chrome/Chromium stores the links that a user has visited in this "Visited Links" file, and the next time a visited link appears on a webpage it is rendered in purple. But is this the only place they are stored? No; URLs and links are recorded in several other places as well. Browsing-history details live in several tables of an SQLite database named History; the most interesting tables are urls and visits. If you open this database with an SQLite database browser and browse the urls table, you will see all the URLs you have visited to date, along with additional details.

[Image: the urls table opened in an SQLite database browser]

In conjunction with another table, visits, we can also retrieve the first visit time. The details we get from these tables are very important and carry great weight. The following shows the visits table (if you observe well you will understand why the filter panel contains 151; I leave it as an exercise for the reader :D ):

[Image: the visits table, with 151 in the filter panel]

Please make sure that you note the column names well since we will be using them later.
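
Since we will query these two tables later anyway, here is the kind of SQL you can run directly in the database browser to get each URL's first visit time. It uses the same epoch conversion as the parser's main method further below: visit_time is stored as microseconds since 1601-01-01 (the WebKit epoch), so dividing by 1,000,000 and subtracting 11644473600 seconds converts it to Unix time:

Code: (sql) [Select]
-- First visit time per URL. visit_time is in microseconds since
-- 1601-01-01; the subtraction shifts it to the Unix epoch.
SELECT urls.url,
       datetime(MIN(visits.visit_time) / 1000000 - 11644473600,
                'unixepoch') AS first_visited
FROM urls
JOIN visits ON urls.id = visits.url
GROUP BY urls.url;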

Now let's get to the real point; everything above was groundwork for what comes next. When I started this topic I said I would discuss the Visited Links file, and we get to it now. The links stored in the urls table are stored in this file as well, but in a different format which is neither a database nor a JSON file: by design it is a hash table. URLs stored in this file cannot be read back directly, since only their MD5 digests are stored. To avoid collisions, a salt is used when creating the digest for each URL. But if we can reverse engineer the way each URL was hashed, we can still test which URLs are present in this file.

Some obvious questions that might arise:

Q. Why store the information in a hash table when the digests could just as well live in JSON or an SQLite database?

Ans. Performance. A hash table searches in O(1) in the average case; more precisely, with separate chaining the expected cost is O(1 + load_factor), where load_factor = n/k for n entries in k buckets. Only in the degenerate worst case, when every entry collides into one bucket, does search cost O(n). The browser has to test every link on every page it renders against this table, so constant-time lookup matters.

More info: https://en.wikipedia.org/wiki/Hash_table
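
To put that in Java terms, a toy sketch (names are mine): the parser we will write below keeps its fingerprints in a HashSet for exactly this reason, so membership tests stay constant-time on average no matter how many links are stored.

Code: (java) [Select]
import java.util.HashSet;
import java.util.Set;

/** Toy demo: average-case O(1) membership, independent of set size. */
public class LookupDemo {
    public static void main(String[] args) {
        Set<String> fingerprints = new HashSet<>();
        fingerprints.add("d2b2f22c65759c66"); // hypothetical hex-encoded fingerprint
        // Constant-time on average, even with millions of entries:
        System.out.println(fingerprints.contains("d2b2f22c65759c66")); // true
    }
}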

Q. Any specific file identifiers for Visited Links file?

Ans. Yes. The file signature is "VLnk". Size: 4 bytes, starting offset: 0x00.

Q. How to recover the salt you mentioned used to hash a url?

Ans. The salt is recoverable; its size is 8 bytes and its starting offset is 0x10.

Q. Are any other details given aside from the magic and the salt?

Ans. Yes. But I will discuss them in the next section.


Understanding the "Visited Links" file format

Now it's finally time to dig into the file format. So how do we understand it? Most of Chrome's source is open through a project named Chromium, and we will dig into its code to understand the format. We will look at the following two locations:

1. https://code.google.com/p/chromium/codesearch#chromium/src/components/visitedlink/browser/&cl=GROK
2. https://code.google.com/p/chromium/codesearch#chromium/src/components/visitedlink/common/&cl=GROK

Not all the files there, just a few. Let's first examine the Visited Links file under a hex editor. We see something like this:

[Image: hex view of the Visited Links header, byte groups annotated]

Note that I have also annotated what each group of bytes means; this info comes from the source file named visitedlink_master.cc:

[Image: the header-related constants and comments in visitedlink_master.cc]

In the above image we can see a lot of details, which I have mapped onto the hex view. We also see the version, and the default hash table size (length), which is hard-coded to 16381. We also see the salt offset set to 16, as I mentioned before. Let's locate it:

[Image: the code that reads the salt at offset 16]

In the second-to-last image we can see the salt being read with its length given by "LINK_SALT_LENGTH", whose value is defined in the common part of the code (the second location above).
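
If you would rather cross-check the header in code than with a hex editor, here is a minimal sketch of mine. It assumes the layout mapped above: a 4-byte magic, three little-endian int32 fields (version, table length, used count), then the 8-byte salt; the class and field names are my own:

Code: (java) [Select]
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch: dump the 24-byte Visited Links header (path in args[0]). */
public final class VLnkHeaderDump {

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r")) {
            byte[] header = new byte[24];
            raf.readFully(header);
            ByteBuffer buf = ByteBuffer.wrap(header)
                    .order(ByteOrder.LITTLE_ENDIAN);
            byte[] magic = new byte[4];
            buf.get(magic);
            System.out.println("Magic   : " + new String(magic)); // "VLnk"
            System.out.println("Version : " + buf.getInt());
            System.out.println("Length  : " + buf.getInt());      // e.g. 16381
            System.out.println("Used    : " + buf.getInt());
            byte[] salt = new byte[8];
            buf.get(salt);
            System.out.println("Salt    : " + toHex(salt));
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}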

Hexinator fans? Anybody? If not, then try it. It's fun :D

Well, I created a simple grammar just for the header (range: 0x00 - 0x18). Save the following code in a file named "visitedlinks.grammar".

Code: (xml) [Select]
<?xml version="1.0" encoding="UTF-8"?>
<ufwb version="1.13">
    <grammar name="Visited Links Grammar" start="id:5" author="Psycho_Coder" email="psychocoder@outlook.com">
        <description>Grammar to parse the Visited Links file of Google Chrome</description>
        <structure name="VLnk File" id="5" encoding="ISO_8859-1:1987" endian="big" signed="no">
            <structure name="Header" id="6" length="24">
                <string name="Magic Header" id="7" fillcolor="55AAFF" type="fixed-length" length="4" encoding="Adobe-Standard-Encoding"/>
                <number name="Version" id="8" fillcolor="FFAAFF" type="integer" length="1"/>
                <binary name="&lt;Binary Fill Bytes&gt;" id="9" unused="yes" length="3"/>
                <number name="Length" id="10" fillcolor="AAAAFF" type="integer" length="2"/>
                <binary name="&lt;Binary Fill Bytes-1&gt;" id="11" unused="yes" length="2"/>
                <binary name="Used" id="12" fillcolor="A2FF27" length="4"/>
                <binary name="Salt" id="13" fillcolor="FF973C" length="8"/>
            </structure>
        </structure>
    </grammar>
</ufwb>

If you have installed Hexinator, open the grammar file and then open a "Visited Links" file; it will prompt you to apply the grammar to the newly opened file. Apply it and you will see something like below:

[Image: Hexinator with the grammar applied to a Visited Links file]

So much for the basics. Now we move on to understanding the URL digest mechanism.

URL digest mechanism

As mentioned before, the URL is hashed with MD5 along with the salt, but only the first 8 bytes of the hash are stored in this file. The general recipe for calculating a URL fingerprint is: MD5(salt + url), then take the first 8 bytes of the digest. If we look at Chromium's source, we get this info from here:

[Image: the fingerprint computation in Chromium's source]

One last thing I should mention before proceeding to the next topic: this file does not store any URL data when a user browses the web in In-Porn Mode oops I made a mistake, it's Incognito mode ( 8) ). See here:

[Image: the Incognito check in Chromium's source]

So now we have understood the internals of the Visited Links file, and it's finally time to write a little parser for it. Those who are interested can dig into the Chromium project and find more cool stuff; if you do, share it with us (especially if you find something interesting with sessions). I wrote a little parser in Java, and the code is pretty readable (at least I hope so).


Important variable declarations:

Code: (java) [Select]
private final HashSet<String> fingerprints;
private static final byte[] VLNK_MAGIC_HEADER = "VLnk".getBytes();
private byte[] salt;

private static final int HEADER_SALT_OFFSET = 0x10;
private static final int HEADER_SALT_LENGTH = 8;
private static final int URL_FINGERPRINT_LENGTH = 8;

Below is the method that parses the file: it verifies the file signature, retrieves the salt used to hash the URLs, and stores all the fingerprints in a Set. Note: refer to the verifyFileHeader method in the complete class further below for the signature-verification code.

Code: (java) [Select]
    /**
     * Parses the
     * <pre>Visited Links</pre> file and stores the fingerprints in a HashSet
     *
     * @return Returns 1 if parsing was successful else returns -1
     */
    public int parse() {
        if (Utils.verifyFileHeader(vLnkFile, VLNK_MAGIC_HEADER)) {
            salt = new byte[HEADER_SALT_LENGTH];
            byte[] bytes = new byte[URL_FINGERPRINT_LENGTH];
            try (RandomAccessFile raf = new RandomAccessFile(vLnkFile, "r")) {
                int val;
                // The 8-byte salt sits at offset 0x10; the hash table
                // slots begin right after it, at offset 0x18.
                raf.seek(HEADER_SALT_OFFSET);
                raf.readFully(salt);
                // Empty slots are zero-filled; a nonzero byte marks the
                // start of an 8-byte fingerprint, so back up one byte
                // and read the whole fingerprint.
                while ((val = raf.read()) != -1) {
                    if (val != 0) {
                        raf.seek(raf.getFilePointer() - 1);
                        raf.readFully(bytes, 0, URL_FINGERPRINT_LENGTH);
                        fingerprints.add(Utils.byteArrayToHex(bytes));
                    }
                }
            } catch (IOException ex) {
                // Also covers FileNotFoundException and EOFException.
                System.err.println(ex.getMessage());
            }
        } else {
            return -1;
        }
        return 1;
    }

Now, to compute the fingerprint for any custom URL, refer to the following:

Code: (java) [Select]
    /**
     * Calculates the URL fingerprint for any custom URL.
     *
     * @param salt Salt used during hashing.
     * @param data Array of bytes for the URL whose fingerprint will be
     * calculated.
     * @return Returns the first 8 bytes of the digest, hex encoded.
     */
    public String getUrlFingerprint(byte[] salt, byte[] data) {
        byte[] mdBytes = null;

        try {
            // MD5 is guaranteed to be available on every Java platform.
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(salt);
            md.update(data);
            mdBytes = md.digest();
        } catch (NoSuchAlgorithmException ex) {
            System.err.println("Couldn't determine the hashing algorithm. "
                    + ex.getMessage());
        }

        return Utils.byteArrayToHex(Arrays.copyOf(mdBytes, URL_FINGERPRINT_LENGTH));
    }

Finally, to check whether a URL was visited, that is, whether it is present in this file, we can directly call the following method:

Code: (java) [Select]
    public boolean isVisited(String url) {
        return fingerprints.contains(getUrlFingerprint(salt, url.getBytes()));
    }
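
A hypothetical usage sketch, relying on the VlinkParser class given in full further below (the path and the URL are examples only):

Code: (java) [Select]
import java.io.File;

public class VisitedCheck {
    public static void main(String[] args) {
        // args[0]: path to a "Visited Links" file
        VlinkParser parser = new VlinkParser(new File(args[0]));
        parser.parse();
        // True only if this exact URL string hashes to a stored fingerprint.
        System.out.println(parser.isVisited("https://evilzone.org/"));
    }
}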

You can have a look at the complete class here: https://github.com/AnimeshShaw/ChromeForensics/blob/master/src/main/java/net/letshackit/chromeforensics/core/artefacts/VisitedLinkParser.java

I put everything above into a single file with a main method, and it includes a practical example as well. You will need the sqlite-jdbc library on the classpath to execute the code. You can get it from here

Code: (java) [Select]

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.HashSet;

/**
 * Class that parses the
 * <pre>Visited Links</pre> file and gathers the information contained
 *
 * @author Psycho_Coder
 */
public class VlinkParser {

    private final File vLnkFile;

    private final HashSet<String> fingerprints;
    private static final byte[] VLNK_MAGIC_HEADER = "VLnk".getBytes();
    private byte[] salt;

    private static final int HEADER_SALT_OFFSET = 0x10;
    private static final int HEADER_SALT_LENGTH = 8;
    private static final int URL_FINGERPRINT_LENGTH = 8;

    public VlinkParser(File vLnkFile) {
        this.vLnkFile = vLnkFile;
        fingerprints = new HashSet<>();
    }

    public boolean verifyFileHeader(File file, byte[] magicNumber) {
        try (FileInputStream fis = new FileInputStream(file)) {
            byte[] buffer = new byte[magicNumber.length];
            // Only accept the header if all magic bytes were read and match.
            if (fis.read(buffer) == magicNumber.length) {
                return Arrays.equals(magicNumber, buffer);
            }
        } catch (IOException ex) {
            System.err.println(ex.getMessage());
        }
        return false;
    }

    public void parse() {
        if (verifyFileHeader(vLnkFile, VLNK_MAGIC_HEADER)) {
            salt = new byte[HEADER_SALT_LENGTH];
            byte[] bytes = new byte[URL_FINGERPRINT_LENGTH];
            try (RandomAccessFile raf = new RandomAccessFile(vLnkFile, "r")) {
                int val;
                raf.seek(HEADER_SALT_OFFSET);
                raf.readFully(salt);
                // Empty hash-table slots are zero-filled; a nonzero byte
                // marks the start of an 8-byte fingerprint, so back up
                // one byte and read the whole fingerprint.
                while ((val = raf.read()) != -1) {
                    if (val != 0) {
                        raf.seek(raf.getFilePointer() - 1);
                        raf.readFully(bytes, 0, URL_FINGERPRINT_LENGTH);
                        fingerprints.add(byteArrayToHex(bytes));
                    }
                }
            } catch (IOException ex) {
                System.err.println(ex.getMessage());
            }
        }
    }

    /**
     * Converts array of bytes to hex string.
     *
     * @param bytes Byte Array to be converted to Hex String.
     * @return Returns the hex string for {@code bytes} array.
     */
    public static String byteArrayToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            sb.append(Integer.toString((bytes[i] & 0xff) + 0x100, 16).
                    substring(1));
        }
        return sb.toString();
    }

    public static String getUrlFingerprint(byte[] salt, byte[] data) {
        byte[] mdBytes = null;
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(salt);
            md.update(data);
            mdBytes = md.digest();
        } catch (NoSuchAlgorithmException ex) {
            System.err.println("Couldn't determine the hashing algorithm. "
                    + ex.getMessage());
        }
        return byteArrayToHex(Arrays.copyOf(mdBytes, URL_FINGERPRINT_LENGTH));
    }

    public boolean isVisited(String url) {
        return fingerprints.contains(getUrlFingerprint(salt, url.getBytes()));
    }

    private byte[] getSalt() {
        return salt;
    }

    public static void main(String[] args) throws SQLException {
        Connection c = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            VlinkParser parser = new VlinkParser(new File("C:\\Users\\Psycho\\AppData\\Local\\Google\\Chrome\\User Data\\"
                    + "Default\\Visited Links"));
            parser.parse();

            String hisPath = "C:\\Users\\Psycho\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\History";
            Class.forName("org.sqlite.JDBC");
            c = DriverManager.getConnection("jdbc:sqlite:" + hisPath);
            stmt = c.createStatement();
            rs = stmt.executeQuery("SELECT urls.url, datetime((visits.visit_time / 1000000) - 11644473600, 'unixepoch') "
                    + "FROM urls, visits WHERE urls.id = visits.url;");

            Path savePath = Paths.get(System.getProperty("user.dir"), "Url-Visits.tsv");
           
            try (BufferedWriter writer = Files.newBufferedWriter(savePath, StandardCharsets.UTF_8)) {
                StringBuilder sb;
                writer.append("URL\tVISITED TIME\tHAS VISITED?\n");
                while (rs.next()) {
                    sb = new StringBuilder();
                    sb.append(rs.getString(1));
                    sb.append("\t");
                    sb.append(rs.getString(2));
                    sb.append("\t");
                    sb.append(parser.isVisited(rs.getString(1)));
                    writer.append(sb.toString());
                    writer.newLine();
                }
            } catch (IOException ex) {
                System.err.println(ex.getMessage());
            }
        } catch (ClassNotFoundException | SQLException ex) {
            System.err.println(ex.getMessage());
        } finally {
            if (rs != null) {
                rs.close();
            }
            if (stmt != null) {
                stmt.close();
            }
            if (c != null) {
                c.close();
            }
        }

    }
}

So what does the above code do?

It opens the SQLite database file History (which I explained earlier), reads the two tables urls and visits, and fetches the visit time of each URL. It then checks whether there is a record of the URL in the Visited Links file. Finally, it writes the details to a TSV file.


There are two important things that I haven't spelled out, but I have given hints and left them as exercises for the reader. Investigate and let me know. With this I finally conclude my paper. If you managed to read all the way to this point, I really appreciate it and hope you liked it. For any clarifications, please reply below.


Thanking you,
Sincerely,
Psycho_Coder
"Don't do anything by half. If you love someone, love them with all your soul. When you hate someone, hate them until it hurts."--- Henry Rollins

Offline Deque

Re: [Paper] Reversing and Forensic importance of Chrome's Visited Links file
« Reply #1 on: February 27, 2016, 01:34:19 pm »
I finally took the time to read through your paper. Thank you for your writeup!

Quote
this file does not store any URL data when a user browses the web in In-Porn Mode oops I made a mistake, it's Incognito mode ( 8) )

Now I know what you do in your free time.  8)

Btw, do you have more grammars for Hexinator that are worth a share?

Offline Psycho_Coder

Re: [Paper] Reversing and Forensic importance of Chrome's Visited Links file
« Reply #2 on: February 27, 2016, 02:44:09 pm »
Quote
I finally took the time to read through your paper. Thank you for your writeup!

Now I know what you do in your free time.  8)

Btw, do you have more grammars for Hexinator that are worth a share?

Thanks for finally reading it. I appreciate it. I wish I had more free time :D hahaha.

Synalysis maintains a list of all the Hexinator grammars that people have created. You can find them here: https://www.synalysis.net/formats.xml

There are a few file formats related to browser forensics on which there is little or no information available on the web. Currently I am trying to reverse engineer them; when that's done I will release the grammars for those files. Beyond that I don't have any, I'm sorry, but if I come across something that might be of use in your line of work or interest, I will surely share it with you.

Thanks again for your time; it's always inspiring.
"Don't do anything by half. If you love someone, love them with all your soul. When you hate someone, hate them until it hurts."--- Henry Rollins