Author Topic: Metadata: What you might miss. (Read 1869 times)

kenjoe41 · « **on:** September 06, 2014, 07:51:57 pm »

Every file that touches a filesystem carries a tremendous amount of artifacts with it.
One of these is metadata, which can belong to a specific file or come from the filesystem.
Analysis of a file can be done through content identification and metadata extraction.
Content identification is the process of determining or verifying what a specific file actually is, based on its contents.
Metadata extraction is the retrieval of any embedded metadata that may be present in a given file, and this is going to be our focus in this tutorial.

METADATA EXTRACTION:
Metadata is information stored within the file itself that provides some possibly interesting but otherwise nonessential information about the file. Many tools (exiftool, exiv2, hachoir-metadata, et al.) can be used to view it.
All files can also have filesystem MAC times [modified, accessed, and changed or created, depending on the filesystem].
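
To look at the MAC times without any extra tools, something like the following rough Python sketch should do. The function name mac_times is made up for this example, and the times are printed in UTC to keep the output predictable.

```python
import os
import time


def mac_times(path):
    """Return the filesystem timestamps of path as UTC strings."""
    st = os.stat(path)

    def fmt(ts):
        return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(ts))

    return {
        "modified": fmt(st.st_mtime),
        "accessed": fmt(st.st_atime),
        # On Unix-like systems st_ctime is the inode change time,
        # on Windows it is the creation time.
        "changed_or_created": fmt(st.st_ctime),
    }
```

Call it as mac_times("file.zip") and compare the result with what the embedded metadata claims.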

IMAGES:
These are files that contain data to be rendered as graphics.
There are 3 types of metadata in image files:
-EXIF (Exchangeable Image File Format) was developed to embed information about the device capturing the image (typically a camera) into the image itself.
-IPTC was originally designed to embed information about images used by newspapers and news agencies.
-XMP is the XML-based "eXtensible Metadata Platform" developed by Adobe in 2001.
JPEG: Is rich with all the forms of metadata above plus the JFIF metadata. Other image formats such as GIF and PNG usually carry less metadata since they are mostly computer generated. TIFF is an exception, since it can hold EXIF, IPTC and XMP data too.
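
To get a quick idea of which of these metadata containers a JPEG carries, you can walk its segments and look at the APPn signatures. This is only a minimal sketch: the function name jpeg_metadata_kinds is invented here, it stops at the first start-of-scan marker, and it does not decode the tags themselves (exiftool does that far better).

```python
import struct

# (marker byte, payload signature, label)
SIGNATURES = [
    (0xE0, b"JFIF\x00", "JFIF"),
    (0xE1, b"Exif\x00\x00", "EXIF"),
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00", "XMP"),
    # IPTC records usually live inside the Photoshop APP13 block.
    (0xED, b"Photoshop 3.0\x00", "APP13/Photoshop (usual home of IPTC)"),
]


def jpeg_metadata_kinds(data):
    """List the metadata containers found before the image data starts."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    found = []
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        for wanted, signature, label in SIGNATURES:
            if marker == wanted and payload.startswith(signature):
                if label not in found:
                    found.append(label)
        pos += 2 + length
    return found
```

Feed it the raw bytes, e.g. jpeg_metadata_kinds(open("pic.jpg", "rb").read()), where pic.jpg is just a placeholder name.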

AUDIO:
Data that imparts sound when decoded properly.
1.WAV (Waveform Audio File Format): WAV audio is stored inside a RIFF container, which supports INFO chunks with various metadata and can also contain XMP tags. Take note of the app used to create the file, the date of creation and the author.
2.MP3 (MPEG-1/MPEG-2 Audio Layer III, from the Moving Picture Experts Group): MP3 contains two metadata types: ID3v1 and ID3v2. ID3v1 is 128 bytes of metadata appended to the end of the file, and the extended tag adds another 227 bytes in front of it. The size of ID3v2 is not fixed. For both ID3 versions, the id3v2 tool, exiftool and hachoir-metadata can extract them.
3.MPEG-4 Audio (AAC/M4A) and ASF/WMA: These can also contain ID3 tags and a modest amount of metadata, respectively.
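
Since ID3v1 is just a fixed 128-byte block at the end of the file, it is easy to pull out by hand. Here is a small sketch; read_id3v1 is a name I made up, it assumes Latin-1 text, and it ignores the extended tag and ID3v2 completely.

```python
def read_id3v1(data):
    """Parse the 128-byte ID3v1 tag at the end of an MP3, or return None."""
    tag = data[-128:]
    if len(tag) < 128 or not tag.startswith(b"TAG"):
        return None

    def text(raw):
        # Fields are fixed width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # numeric genre code
    }
```

Use it on the raw bytes of the file, e.g. read_id3v1(open("song.mp3", "rb").read()), with song.mp3 standing in for your own file.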

VIDEO:
Data that decode into a sequence of moving images.
Video formats can contain XMP tags, INFO chunks and other video-specific metadata, plus of course the MAC times.

ARCHIVES:
These are container files designed to hold other files, and to apply compression and sometimes encryption to the contained files. Some archive types may retain information from their system of origin, including UID and GID information from Unix-like systems.
1.ZIP: We can use hachoir-urwid or the unzip command to retrieve information about the contents of a ZIP archive without actually extracting it.

Code: (sh) [Select]

$ unzip -v file.zip

This can be good to examine file modification dates embedded in the archive.
2.RAR: We can examine the contents of RAR archives using the RAR plugin to 7zip.

Code: (sh) [Select]

$ 7z l file.rar

Some RAR archives may contain comments or "NFO" files.
3.TAR, GZIP, BZIP2: Tarballs retain the owner and group information of the system they were created on, plus the time stamps like other formats. So as not to lose fidelity, they are better analysed in layers. First the compression layer with

Code: (sh) [Select]

$ gunzip --list --verbose file.tar.gz

Then the archive itself with

Code: (sh) [Select]

$ tar --list --verbose --gunzip --file file.tar.gz

And you will be surprised.
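
If you would rather script the archive checks, the Python standard library can read the same fields. The sketch below is only an illustration with function names I picked myself: gzip_header_mtime looks at the compression layer, tar_members at the owner/group info of the archive layer, and zip_members at the modification dates stored in a ZIP.

```python
import io
import struct
import tarfile
import zipfile


def gzip_header_mtime(data):
    """Compression layer: return the MTIME field of a gzip header."""
    if data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    # Bytes 4-7 hold a little-endian 32-bit Unix timestamp.
    (mtime,) = struct.unpack("<I", data[4:8])
    return mtime


def tar_members(fileobj):
    """Archive layer: owner, group and mtime recorded for each member."""
    # Mode "r:*" lets tarfile detect gzip or bzip2 compression itself.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tf:
        return [(m.name, m.uid, m.gid, m.uname, m.gname, m.mtime)
                for m in tf.getmembers()]


def zip_members(fileobj):
    """Names and embedded modification dates, without extracting anything."""
    with zipfile.ZipFile(fileobj) as zf:
        return [(info.filename, info.date_time) for info in zf.infolist()]
```

For example, open file.tar.gz in binary mode and pass the file object to tar_members, or pass the first few bytes of it to gzip_header_mtime.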

DOCUMENTS:
A filetype containing text, images and rendering information. These come with a lot of metadata, ranging from authorship information and document revision histories to internal timestamps and information about the system(s) used to edit the file.
1.OLE Compound Files: Documents created using the Microsoft Office 1997–2003 binary formats, e.g. PowerPoint presentations (PPT), Word documents (DOC), and Excel spreadsheets (XLS). These use a portable filesystem-like format, with storage objects acting as (sub-)directories and stream objects being sequences of sectors allocated for a discrete piece of data. So imagine how much metadata can live in this.
2.OpenDocument Format and Office Open XML like DOCX: Good news is that these are ZIP containers, so all the techniques for ZIP files work here. Care should be taken to analyse every individual object inside them so that none is advertising you to the world.

Code: (sh) [Select]

$ unzip -l kendocs.docx    # or kendocs.odt

3.Rich Text Format: Metadata is contained in the body of the document itself and can be viewed directly with no additional tools. So good luck analysing a really big RTF doc.
4.PDF: Basically, it is a container file that holds a sequence of PostScript layout instructions and embedded fonts and graphics. These contain two different types of metadata: the Document Information Dictionary, which contains key/value pairs with authorship information, document title, and creation/modification time stamps, and XMP.
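
Because a DOCX is just a ZIP, the authorship details can be read straight out of it. In Office Open XML they normally sit in docProps/core.xml. A minimal sketch follows, with the function name docx_core_properties being my own invention. It handles only that one part, so the other objects in the container (and meta.xml in an ODT) still need checking separately.

```python
import zipfile
import xml.etree.ElementTree as ET


def docx_core_properties(fileobj):
    """Return the core properties (creator, lastModifiedBy, ...) of a DOCX."""
    with zipfile.ZipFile(fileobj) as zf:
        try:
            xml_bytes = zf.read("docProps/core.xml")
        except KeyError:
            return {}  # this part is optional and may be absent
    props = {}
    for child in ET.fromstring(xml_bytes):
        # Drop the "{namespace}" prefix ElementTree puts before tag names.
        name = child.tag.rsplit("}", 1)[-1]
        props[name] = (child.text or "").strip()
    return props
```

Something like docx_core_properties(open("kendocs.docx", "rb")) then shows who the document says wrote and last saved it.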

I wish I could have made this shorter, but be careful when uploading anything to the interwebs; a small thing forgotten might be your undoing.
Best of luck!

proxx · « **Reply #1 on:** September 07, 2014, 04:54:40 am »

I had a good read, thanks kenjoe41

M1lak0 · « **Reply #2 on:** September 07, 2014, 06:11:43 pm »

Is there any way to remove the metadata of a file completely?
What I did to remove the metadata of an image is:
I uploaded it to Facebook and downloaded it again!

OK, what can be removed is:
Where the pic was taken (geolocation),
Which cam was used, etc.
but this doesn't work for other files like ppt or doc

I want to know, is there any other way to remove metadata using any kind of tool or manually?
I know once you remove metadata it generates again, because once you open it on your own system it'll be modified and the dates will change, but that's not the issue.

So any way?

Architect · « **Reply #3 on:** September 07, 2014, 11:29:02 pm »

There are several ways to go about removing complete sets of metadata from files such as a PDF.

These can be tricky sometimes if you're manually doing it. Otherwise you could use Adobe Acrobat X Pro to accomplish the task. Besides that, you can try sed, awk, pdftk, etc. The method I use usually for PDFs is:

Code: [Select]

pdftk x.pdf dump_data output <outputfile>; vim <outputfile>; pdftk x.pdf update_info <outputfile> output y.pdf

Code: [Select]

sed -i 's/iText 2\.1\.7 by 1T3XT//' y.pdf

Code: [Select]

sed -i -E 's/PdfID1: [0-9a-f]{32}[ ]{2}.*$//' y.pdf

M1lak0 · « **Reply #4 on:** September 08, 2014, 09:54:57 am »

ummm only pdf??

Why not ppt or exe or something else?

Kulverstukas · « **Reply #5 on:** September 08, 2014, 02:25:24 pm »

Welp, Deque created a really nice tool to strip EXIF data from images: https://evilzone.org/evilzone-releases/%28java%29-exifremover-0-1/
Works great.
