Hello EZ,
Today I will be sharing how to create a parser for the "Visited Links" file created and used by the Google Chrome browser. I will also discuss why it is forensically important.
Another learning outcome is understanding how to reverse or recreate a feature from available source code to suit your own needs. Didn't get what I mean? Read ahead.
Prerequisites/Reader's Note
1. Familiarity with any high-level programming language (like Java).
2. You know about number systems.
3. Know a little C++, or have the ability to follow programming procedures (even if you don't know the syntax of that language), and some SQL.
4. This topic is not for beginners; I assume a certain level of technical maturity.
Let's begin.
Introduction
All modern web browsers, like Google Chrome, Mozilla Firefox, or Safari, store user data in various files at particular locations: user details, browser-specific settings, bookmarks, passwords, and more, mostly to enhance the user experience. Depending on the operating system and the browser version, these locations change, but most of the files and their formats remain the same. Even so, it is by no means certain they will stay the same in future versions. For example, Firefox used to save passwords in a file named signons.txt but later switched to an SQLite database to store user-saved passwords; for more details, click here. Similarly, the locations where these browsers store data vary too.
Q. So what's with this tutorial anyway? What should I expect to find if I read ahead?
Ans. I will discuss a particular forensically important artifact for Chrome forensics, namely the "Visited Links" file.
This is not a tutorial on browser forensics in general, which is beyond its scope; I will cover that subject in greater detail later. Here I will show how, using open source code, we can reverse engineer a way to interpret a file for which no official format is defined.
About "Visited Links" fileChrome stores all its data in certain folders which are OS specific like:
In WindowsC:\Users\%username%\AppData\Local\Google\Chrome\User Data\Default
In LinuxIn case of Linux we can have two options like if the user uses Chromium (Open Source version of Google Chrome) then the storage location is:
/home/$USER/.config/chromium/Default
But again if the user uses Google Chrome then the storage location would be
/home/$USER/.config/google-chrome/Default/
Again, for Mac, it's different.
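For convenience, here is a minimal sketch that resolves the Default profile folder for the current OS, based on the locations above. One assumption on my part: the macOS path used below, ~/Library/Application Support/Google/Chrome/Default, is the standard location, though I don't cover Mac further in this post.

import java.nio.file.Path;
import java.nio.file.Paths;

public class ProfileLocator {

    /** Resolves the default profile folder for Chrome (or Chromium on Linux). */
    public static Path defaultProfileDir(boolean chromium) {
        String os = System.getProperty("os.name").toLowerCase();
        String home = System.getProperty("user.home");
        if (os.contains("win")) {
            // C:\Users\%username%\AppData\Local\Google\Chrome\User Data\Default
            return Paths.get(System.getenv("LOCALAPPDATA"),
                    "Google", "Chrome", "User Data", "Default");
        } else if (os.contains("mac")) {
            return Paths.get(home, "Library", "Application Support",
                    "Google", "Chrome", "Default");
        }
        // Linux: chromium vs. google-chrome, as noted above
        return Paths.get(home, ".config",
                chromium ? "chromium" : "google-chrome", "Default");
    }
}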
Inside this Default folder we find several files (quite a lot of them are SQLite databases, and some are JSON) which store different types of data. Many of them are forensically important, but I will discuss just one: "Visited Links". This file is neither an SQLite database nor JSON nor any known file format with an official definition. We can find some details about its format here:
http://www.forensicswiki.org/wiki/Google_Chrome#Visited_Link, but is that all?
Google Chrome/Chromium stores the links that a user has visited in this "Visited Links" file, and the next time you visit the same webpage, the links you already visited are rendered in purple. But are they stored here only? No; there are discrepancies between the different places where URLs are stored. Browsing-history details are stored in several tables of an SQLite database named History. The most interesting tables are urls and visits. If you open this database file with an SQLite data browser and browse the urls table, you will see all the URLs you have visited to date, along with additional details.
In conjunction with another table, visits, we can also retrieve the first visited time. The details that we get from these tables are very important and carry greater weight. The following shows the visits table (if you observe well, you will understand why the filter panel contains 151; I leave it as an exercise for the reader):
Please make sure that you note the column names well since we will be using them later.
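One detail worth noting about those columns: visits.visit_time is stored as microseconds since 1601-01-01 (the Windows/WebKit epoch), not the Unix epoch, which is why the full program at the end of this post subtracts 11644473600 seconds in its SQL query. A minimal conversion sketch (the sample value in main is hypothetical, for illustration only):

import java.time.Instant;

public class ChromeTime {

    // Seconds between 1601-01-01 and 1970-01-01; the same constant appears
    // in the SQL query of the full program at the end of this post.
    private static final long EPOCH_DELTA_SECONDS = 11644473600L;

    /** Converts a Chrome visit_time (microseconds since 1601) to a java.time.Instant. */
    public static Instant fromVisitTime(long visitTimeMicros) {
        long unixSeconds = visitTimeMicros / 1_000_000L - EPOCH_DELTA_SECONDS;
        long remainderMicros = visitTimeMicros % 1_000_000L;
        return Instant.ofEpochSecond(unixSeconds, remainderMicros * 1_000L);
    }

    public static void main(String[] args) {
        // Hypothetical sample value, for illustration only.
        System.out.println(fromVisitTime(13100000000000000L));
    }
}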
Now let's get to the real point; everything above was the groundwork needed for what comes next. When I started this topic I said I would discuss the Visited Links file, and we will get to it now. The links stored in the urls table are stored in this file as well, but in a different format: neither a database nor a JSON file, but a hash table by design. The URLs stored in this file are unrecoverable, since only their MD5 digests are stored, and a salt is used when creating each digest to avoid collisions. But if we can reverse engineer the way a URL is hashed, we can test any URL against the fingerprints stored in this file.
Obvious questions that might arise

Q. Why store the information in a hash table when the digests could also be stored in JSON or an SQLite database?
Ans. Performance. A hash table performs a search in O(1) in the average case; more precisely, the expected search cost is O(1) + O(load_factor) = O(1 + load_factor), where load_factor = n/k for n entries spread over k buckets. In the worst case all entries collide into a single bucket, the load factor degenerates to n, and the search takes O(n).
More info: https://en.wikipedia.org/wiki/Hash_table
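The parser later in this post leans on the same property: it keeps the fingerprints in a java.util.HashSet, so checking whether a URL's fingerprint is present costs O(1) on average, no matter how many links the profile holds. A trivial sketch (the fingerprint value is hypothetical):

import java.util.HashSet;
import java.util.Set;

public class LookupDemo {
    public static void main(String[] args) {
        Set<String> fingerprints = new HashSet<>();
        fingerprints.add("a1b2c3d4e5f60718"); // hypothetical hex-encoded 8-byte fingerprint
        // Average-case O(1) membership test, independent of the set's size:
        System.out.println(fingerprints.contains("a1b2c3d4e5f60718")); // true
    }
}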
Q. Any specific file identifiers for the Visited Links file?
Ans. Yes. The file signature is "VLnk". Size: 4 bytes. Starting offset: 0x00.

Q. How do we recover the salt you mentioned, used to hash a URL?
Ans. The salt is recoverable; its size is 8 bytes and its starting offset is 0x10.

Q. Any other details given aside from the magic and the salt?
Ans. Yes, but I will discuss them in the next section.
Understanding the "Visited Links" file formatNow finally its time to dig into the file format. So how do we understand it. Most of Chrome's source is made open source to another project named Chromium. We will dig into its code to understand the format. We will look into the following two locations:
1.
https://code.google.com/p/chromium/codesearch#chromium/src/components/visitedlink/browser/&cl=GROK2.
https://code.google.com/p/chromium/codesearch#chromium/src/components/visitedlink/common/&cl=GROKNot all the files but some. Let's first examine the visited links file under a hex editor. We can see something like:
Note that I have also annotated what each group of bytes means; this info comes from a file named visitedlinks_master.cc. In the image above we can see a lot of details, which I have mapped onto the hex dump: the version, and the default hash table size (length), which is hard-coded to 16381. We also see that the salt offset is set to 16, as I have mentioned before. Let's locate it:
In the second-to-last image we can see where the salt size, "LINK_SALT_LENGTH", is used; its value is defined here.
Hexinator fans? Anybody? If not, then try it. It's fun.
Well, I created a simple grammar just for the header (range: 0x00 - 0x18). Save the following code in a file named "visitedlinks.grammar".
<?xml version="1.0" encoding="UTF-8"?>
<ufwb version="1.13">
<grammar name="Visited Links Grammer" start="id:5" author="Psycho_Coder" email="psychocoder@outlook.com">
<description>Grammer to parse Visited Links file of Google Chrome</description>
<structure name="VLnk File" id="5" encoding="ISO_8859-1:1987" endian="big" signed="no">
<structure name="Header" id="6" length="24">
<string name="Magic Header" id="7" fillcolor="55AAFF" type="fixed-length" length="4" encoding="Adobe-Standard-Encoding"/>
<number name="Version" id="8" fillcolor="FFAAFF" type="integer" length="1"/>
<binary name="<Binary Fill Bytes>" id="9" unused="yes" length="3"/>
<number name="Length" id="10" fillcolor="AAAAFF" type="integer" length="2"/>
<binary name="<Binary Fill Bytes-1>" id="11" unused="yes" length="2"/>
<binary name="Used" id="12" fillcolor="A2FF27" length="4"/>
<binary name="Salt" id="13" fillcolor="FF973C" length="8"/>
</structure>
</structure>
</grammar>
</ufwb>
If you have installed Hexinator, open the grammar file and then open the "Visited Links" file; it will prompt you to apply the grammar to the newly opened file. Apply it and you will see something like below:
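If you'd rather verify the header in code than in a hex editor, here is a minimal Java sketch that reads the same fields. Treat the field widths as my assumption: I read the version, table length, and used count as 4-byte little-endian integers, which matches the values discussed above (e.g., the default table size 16381) but is not an official specification.

import java.io.IOException;
import java.io.RandomAccessFile;

public class HeaderDump {

    // Usage: java HeaderDump "Visited Links"
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r")) {
            byte[] magic = new byte[4];
            raf.readFully(magic);                               // "VLnk" at 0x00
            int version = Integer.reverseBytes(raf.readInt());  // assumed 4-byte LE int
            int length = Integer.reverseBytes(raf.readInt());   // table size, 16381 by default
            int used = Integer.reverseBytes(raf.readInt());     // used slot count
            byte[] salt = new byte[8];
            raf.readFully(salt);                                // salt at 0x10
            System.out.printf("magic=%s version=%d length=%d used=%d%n",
                    new String(magic), version, length, used);
        }
    }
}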
So, that's all about the basics. Now we move on to understanding the URL digest mechanism.
URL digest mechanism
As mentioned before, the URL is hashed with MD5 along with the salt, but only the first 8 bytes of the digest are stored in this file. The general recipe for calculating the URL fingerprint is: MD5(salt + url), then take the first 8 bytes. If we look at Chromium's source, we get this info from here.
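To make that recipe concrete before the full parser, here is a tiny standalone sketch. The salt here is a hypothetical placeholder; a real one comes from offset 0x10 of the file, as covered earlier.

import java.security.MessageDigest;
import java.util.Arrays;

public class FingerprintDemo {
    public static void main(String[] args) throws Exception {
        byte[] salt = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08}; // hypothetical salt
        byte[] url = "http://example.com/".getBytes();

        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(salt);   // salt first...
        md.update(url);    // ...then the URL bytes
        byte[] digest = md.digest();

        // Only the first 8 bytes of the 16-byte MD5 digest form the fingerprint.
        byte[] fingerprint = Arrays.copyOf(digest, 8);
        for (byte b : fingerprint) {
            System.out.printf("%02x", b);
        }
        System.out.println();
    }
}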
One last thing I should mention before proceeding to the next topic: this file does not store any URL data when a user browses the web in In-Porn Mode... oops, I made a mistake, it's Incognito mode ( ). See here.

So now that we have understood the internals of the Visited Links file, it's finally time to write a little parser for it. People who are interested can dig into the Chromium project and find some cool stuff; if you do, share it with us (especially if you find something interesting with sessions). I wrote a little parser in Java, and the code is pretty much readable (at least I hope so).
Important variable declarations:
private final HashSet<String> fingerprints;
private final byte[] VLNK_MAGIC_HEADER = "VLnk".getBytes();
private byte[] salt;
private final int HEADER_SALT_OFFSET = 0x10;
private final int HEADER_SALT_LENGTH = 8;
private final int URL_FINGERPRINT_LENGTH = 8;
Method that parses the file, verifies the file signature, retrieves the salt used to hash the URLs, and stores all the fingerprints in a Set.
Note: Refer to the complete class below for the code that verifies file signatures.
/**
 * Parses the
 * <pre>Visited Links</pre> file and stores the fingerprints in a HashSet.
 *
 * @return Returns 1 if parsing was successful, else returns -1
 */
public int parse() {
    if (Utils.verifyFileHeader(vLnkFile, VLNK_MAGIC_HEADER)) {
        salt = new byte[HEADER_SALT_LENGTH];
        byte[] bytes = new byte[URL_FINGERPRINT_LENGTH];
        try (RandomAccessFile raf = new RandomAccessFile(vLnkFile, "r")) {
            int val;
            // The 8-byte salt sits at offset 0x10 in the header.
            raf.seek(HEADER_SALT_OFFSET);
            raf.readFully(salt);
            while ((val = raf.read()) != -1) {
                // Empty hash table slots are zero-filled; a non-zero byte
                // marks the start of an 8-byte fingerprint.
                if (val != 0) {
                    raf.seek(raf.getFilePointer() - 1);
                    raf.read(bytes, 0, URL_FINGERPRINT_LENGTH);
                    fingerprints.add(Utils.byteArrayToHex(bytes));
                }
            }
        } catch (IOException ex) { // also covers FileNotFoundException
            System.err.println(ex.getMessage());
        }
    } else {
        return -1;
    }
    return 1;
}
Now, to compute the fingerprint for any custom URL, refer to the following:
/**
 * Calculates the URL fingerprint for any custom URL.
 *
 * @param salt Salt used during hashing.
 * @param data Array of bytes for the URL whose fingerprint will be
 *             calculated.
 * @return Returns the first 8 bytes of the digest after hex encoding them.
 */
public String getUrlFingerprint(byte[] salt, byte[] data) {
    byte[] mdBytes = null;
    try {
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(salt);  // salt first...
        md.update(data);  // ...then the URL bytes
        mdBytes = md.digest();
    } catch (NoSuchAlgorithmException ex) {
        System.err.println("Couldn't determine the hashing algorithm. "
                + ex.getMessage());
    }
    return Utils.byteArrayToHex(Arrays.copyOf(mdBytes, URL_FINGERPRINT_LENGTH));
}
Finally, to see whether a URL was visited (i.e., whether it is present in this file), we can directly call the following method:
public boolean isVisited(String url) {
    return fingerprints.contains(getUrlFingerprint(salt, url.getBytes()));
}
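Putting the pieces together, a quick usage sketch, using the VlinkParser class given in full below (the path is illustrative, and the URL must match the stored form exactly, trailing slash included):

VlinkParser parser = new VlinkParser(new File("Visited Links")); // illustrative path
parser.parse();
System.out.println(parser.isVisited("http://example.com/")); // true if its fingerprint is in the file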
You can have a look at the complete class here:
https://github.com/AnimeshShaw/ChromeForensics/blob/master/src/main/java/net/letshackit/chromeforensics/core/artefacts/VisitedLinkParser.java

I put everything above into a single file with a main method, and it includes a practical example as well. You will need the sqlite-jdbc library to execute the code. You can get it from here.
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.HashSet;
/**
 * Class that parses the
 * <pre>Visited Links</pre> file and gathers the information contained.
 *
 * @author Psycho_Coder
 */
public class VlinkParser {

    private final File vLnkFile;
    private final HashSet<String> fingerprints;
    private final byte[] VLNK_MAGIC_HEADER = "VLnk".getBytes();
    private byte[] salt;

    private final int HEADER_SALT_OFFSET = 0x10;
    private final int HEADER_SALT_LENGTH = 8;
    private final int URL_FINGERPRINT_LENGTH = 8;

    public VlinkParser(File vLnkFile) {
        this.vLnkFile = vLnkFile;
        fingerprints = new HashSet<>();
    }

    public boolean verifyFileHeader(File file, byte[] magicNumber) {
        try (FileInputStream fis = new FileInputStream(file)) {
            byte[] buffer = new byte[magicNumber.length];
            // Make sure the full magic number was actually read.
            if (fis.read(buffer) == magicNumber.length) {
                return Arrays.equals(magicNumber, buffer);
            }
        } catch (IOException ex) {
            System.err.println(ex.getMessage());
        }
        return false;
    }

    public void parse() {
        if (verifyFileHeader(vLnkFile, VLNK_MAGIC_HEADER)) {
            salt = new byte[HEADER_SALT_LENGTH];
            byte[] bytes = new byte[URL_FINGERPRINT_LENGTH];
            try (RandomAccessFile raf = new RandomAccessFile(vLnkFile, "r")) {
                int val;
                // The 8-byte salt sits at offset 0x10 in the header.
                raf.seek(HEADER_SALT_OFFSET);
                raf.readFully(salt);
                while ((val = raf.read()) != -1) {
                    // Empty slots are zero-filled; a non-zero byte marks
                    // the start of an 8-byte fingerprint.
                    if (val != 0) {
                        raf.seek(raf.getFilePointer() - 1);
                        raf.read(bytes, 0, URL_FINGERPRINT_LENGTH);
                        fingerprints.add(byteArrayToHex(bytes));
                    }
                }
            } catch (IOException ex) {
                System.err.println(ex.getMessage());
            }
        }
    }

    /**
     * Converts an array of bytes to a hex string.
     *
     * @param bytes Byte array to be converted to a hex string.
     * @return Returns the hex string for the {@code bytes} array.
     */
    public static String byteArrayToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            sb.append(Integer.toString((bytes[i] & 0xff) + 0x100, 16).substring(1));
        }
        return sb.toString();
    }

    public static String getUrlFingerprint(byte[] salt, byte[] data) {
        byte[] mdBytes = null;
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(salt);  // salt first...
            md.update(data);  // ...then the URL bytes
            mdBytes = md.digest();
        } catch (NoSuchAlgorithmException ex) {
            System.err.println("Couldn't determine the hashing algorithm. "
                    + ex.getMessage());
        }
        // Only the first 8 bytes of the digest form the fingerprint.
        return byteArrayToHex(Arrays.copyOf(mdBytes, 8));
    }

    public boolean isVisited(String url) {
        return fingerprints.contains(getUrlFingerprint(salt, url.getBytes()));
    }

    private byte[] getSalt() {
        return salt;
    }

    public static void main(String[] args) throws SQLException {
        Connection c = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            VlinkParser parser = new VlinkParser(new File("C:\\Users\\Psycho\\AppData\\Local\\Google\\Chrome\\User Data\\"
                    + "Default\\Visited Links"));
            parser.parse();

            String hisPath = "C:\\Users\\Psycho\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\History";
            Class.forName("org.sqlite.JDBC");
            c = DriverManager.getConnection("jdbc:sqlite:" + hisPath);
            stmt = c.createStatement();
            // Note the space before FROM; without it the query is a syntax error.
            rs = stmt.executeQuery("SELECT urls.url, datetime(((visits.visit_time/1000000)-11644473600), \"unixepoch\") "
                    + "FROM urls, visits WHERE urls.id = visits.url;");

            Path savePath = Paths.get(System.getProperty("user.dir"), "Url-Visits.tsv");
            try (BufferedWriter writer = Files.newBufferedWriter(savePath, StandardCharsets.UTF_8)) {
                StringBuilder sb;
                writer.append("URL\tVISITED TIME\tHAS VISITED?\n");
                while (rs.next()) {
                    sb = new StringBuilder();
                    sb.append(rs.getString(1));
                    sb.append("\t");
                    sb.append(rs.getString(2));
                    sb.append("\t");
                    sb.append(parser.isVisited(rs.getString(1)));
                    writer.append(sb.toString());
                    writer.newLine();
                }
            } catch (IOException ex) {
                System.err.println(ex.getMessage());
            }
        } catch (ClassNotFoundException | SQLException ex) {
            System.err.println(ex.getMessage());
        } finally {
            if (rs != null) {
                rs.close();
            }
            if (stmt != null) {
                stmt.close();
            }
            if (c != null) {
                c.close();
            }
        }
    }
}
So what does the above code do? It opens the SQLite database file History (which I explained earlier), reads the two tables urls and visits, and fetches the visit time of each URL. It then checks whether there is a record of each URL in the Visited Links file. Finally, it writes the details to a TSV file.
There are two important things that I haven't mentioned explicitly, but I have given hints for them and left them as exercises for the reader. Investigate and let me know what you find. With this I finally conclude my paper. If you made it all the way here, I really appreciate it and I hope you liked it. For any clarifications, please reply below.
Thanking you,
Sincerely,
Psycho_Coder