...

Introduction to Metadata

Get acquainted with PDF metadata and its subtleties.

Introduction

A PDF document is intrinsically rich in metadata artifacts, which can be a valuable source of information during a digital forensic investigation. While there are multiple ways to extract this metadata from a PDF file, these techniques are either manual or do not cover all the metadata artifacts.

What is metadata?

Simply put, metadata is data about data. The metadata of digital objects is generally divided into two categories:

  • The file system metadata

File system metadata refers to data elements that are associated with the file as stored by the file system (for example, its timestamps and permissions) and are not part of the byte sequence that makes up the file's binary structure.

  • The application metadata

Application metadata consists of elements that are intrinsic to the file and form part of its byte sequence.
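To make the distinction concrete, the following Python sketch writes a small byte payload to a temporary file, reads its file system metadata with `os.stat`, and checks that an application-metadata entry lives in the bytes themselves. The payload is a hypothetical stand-in, not a valid PDF; it only carries an `/Author` entry for illustration.

```python
import os
import tempfile

# Hypothetical payload standing in for a PDF's bytes (not a valid PDF).
payload = b"%PDF-1.4\n1 0 obj\n<< /Author (Jane Doe) >>\nendobj\n%%EOF\n"

with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # File system metadata: maintained by the OS, outside the file's bytes.
    st = os.stat(path)
    fs_meta = {"size": st.st_size, "mtime": st.st_mtime, "mode": oct(st.st_mode)}

    # Application metadata: part of the byte sequence itself.
    with open(path, "rb") as fh:
        data = fh.read()
    app_meta_present = b"/Author (Jane Doe)" in data

    # Resetting the timestamps changes file system metadata only.
    os.utime(path, (0, 0))
    with open(path, "rb") as fh:
        unchanged = fh.read() == data
    new_mtime = os.stat(path).st_mtime
finally:
    os.remove(path)

print(fs_meta["size"], app_meta_present, unchanged, new_mtime)
```

Resetting the timestamps with `os.utime` alters the file system metadata, while the byte content, and therefore any application metadata, stays the same.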

In this course, we focus on manipulating application metadata.

Application metadata types

Within a PDF file, application metadata is stored in a document information dictionary object, a metadata stream object, or both.
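Because a file may carry a document information dictionary, a metadata stream, or both, a quick first check is to look for each container in the raw bytes. The Python sketch below does this with two regular expressions. It assumes the dictionaries it looks for appear as plain text in the file, and the sample bytes and the helper name `metadata_containers` are hypothetical.

```python
import re

# The Info dictionary is referenced from the trailer, e.g. "/Info 3 0 R".
INFO_REF = re.compile(rb"/Info\s+\d+\s+\d+\s+R")
# A metadata stream's dictionary declares "/Type /Metadata".
XMP_STREAM = re.compile(rb"/Type\s*/Metadata\b")


def metadata_containers(pdf_bytes: bytes) -> dict:
    """Report which application-metadata containers appear in raw PDF bytes."""
    return {
        "info_dictionary": bool(INFO_REF.search(pdf_bytes)),
        "metadata_stream": bool(XMP_STREAM.search(pdf_bytes)),
    }


# Hypothetical, simplified PDF fragment containing both containers.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /Metadata 4 0 R >>\nendobj\n"
    b"3 0 obj\n<< /Title (Report) /Author (Jane Doe) >>\nendobj\n"
    b"4 0 obj\n<< /Type /Metadata /Subtype /XML /Length 0 >>\nstream\n\nendstream\nendobj\n"
    b"trailer\n<< /Root 1 0 R /Info 3 0 R >>\n%%EOF\n"
)
print(metadata_containers(sample))
```

A byte-level search like this is only a triage step; it does not validate the objects it finds.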

The document information dictionary (DID) has been part of the PDF format since version 1.0. It records general information about a PDF file as pairs of data objects, each consisting of a key and a matching value. Metadata streams, available since PDF 1.4 (2001), provide a more elaborate mechanism for embedding comprehensive metadata attributes in a PDF document. The contents of a metadata stream are expressed in Extensible Markup Language (XML), following Adobe's Extensible Metadata Platform (XMP), and may describe the entire PDF as well as specific components within it.
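A minimal Python sketch of both representations follows. It assumes an uncompressed Info dictionary whose values are simple literal strings, plus a hand-written XMP packet; the sample values and the helper name `parse_info_dict` are hypothetical. The DID key/value pairs are extracted with a regular expression, and the XMP fields are read with `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET

# DID: /Key (literal string) pairs. Assumption: no nested, unescaped
# parentheses inside values and no hex strings such as <FEFF...>.
PAIR = re.compile(rb"/(\w+)\s*\(((?:\\.|[^\\)])*)\)")


def parse_info_dict(raw: bytes) -> dict:
    """Return key/value pairs from a raw, uncompressed Info dictionary."""
    # Latin-1 only approximates PDFDocEncoding; escape sequences are kept as-is.
    return {k.decode("ascii"): v.decode("latin-1") for k, v in PAIR.findall(raw)}


info_raw = b"<< /Title (Quarterly Report) /Author (Jane Doe) /CreationDate (D:20230101120000Z) >>"
info = parse_info_dict(info_raw)

# Metadata stream: an XMP packet, i.e. XML (real packets are usually
# wrapped in <?xpacket ...?> processing instructions).
XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Report</rdf:li></rdf:Alt></dc:title>
   <pdf:Producer>Example Producer</pdf:Producer>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}

root = ET.fromstring(XMP)
# dc:title is a language alternative; take its first rdf:li entry.
title = root.find(".//dc:title/rdf:Alt/rdf:li", NS).text
producer = root.find(".//pdf:Producer", NS).text

print(info)
print(title, producer)
```

A regular expression is enough for this illustration, but real files may compress streams, use hex or UTF-16 strings, or nest parentheses, so a proper PDF parser is the safer choice in practice.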

  • The document information dictionary

There are nine standard attributes (keys) defined for the DID, listed below:

Key Name Data Type Value Description
/Title Text
...