
Metadata Treatment

Learn how to gather, modify, and delete the various types of metadata embedded in a PDF file.

We'll cover the following...

Introduction
Scope
Prerequisites

PyPDF4
Pikepdf

Let’s start coding
Let’s try our utility

Scenario 1: Collecting the DID metadata attributes
Scenario 2: Updating the DID metadata attributes
Scenario 3: Managing the XMP metadata attributes

Conclusion

Introduction

Metadata is typically populated by PDF conversion applications. It includes common fields such as the document version, the creation date, and the program that created the file. Some often-overlooked attributes merit a closer look if you want to dive into PDF analysis.

Scope

The objective of this lesson is to show how to extract, update, and delete the metadata of a PDF file using the Python programming language.

Prerequisites

We need two libraries for metadata manipulation:

PyPDF4

It is a pure-Python PDF library best suited to splitting, merging, cropping, and transforming the pages of a PDF file. It can also retrieve text and metadata from PDFs.
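As a minimal sketch of the metadata retrieval mentioned above, the snippet below reads the document information dictionary with PyPDF4's `PdfFileReader.getDocumentInfo()`. The helper names `normalize_info` and `read_document_info` are hypothetical, chosen for illustration only, and the sketch assumes an unencrypted file.

```python
from typing import Dict, Mapping


def normalize_info(info: Mapping) -> Dict[str, str]:
    """Turn raw entries such as '/Title' into plain 'Title' -> text pairs."""
    # Indexing with info[key] lets PyPDF4 resolve indirect objects before
    # the value is converted to text; it works on a plain dict as well.
    return {str(key).lstrip("/"): str(info[key]) for key in info}


def read_document_info(pdf_path: str) -> Dict[str, str]:
    """Return the document information dictionary of a PDF as a plain dict."""
    # Imported lazily so normalize_info stays usable without PyPDF4 installed.
    from PyPDF4 import PdfFileReader

    with open(pdf_path, "rb") as handle:
        reader = PdfFileReader(handle)
        info = reader.getDocumentInfo()  # None when the file has no such dictionary
        return normalize_info(info) if info is not None else {}
```

Calling `read_document_info("sample.pdf")` (a placeholder file name) would return whatever entries the file carries, for example its title, author, or producer.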

Pikepdf

It is a library intended for developers who need to create, manipulate, and parse PDF files. It supports reading and writing PDFs, including creating them from scratch.
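The reading and writing described above can be sketched for the document information dictionary as follows. This is an illustration under stated assumptions rather than the lesson's own utility: `pdf_date` and `rewrite_docinfo` are hypothetical names, and the sketch relies on pikepdf's `Pdf.docinfo` dictionary, whose keys carry a leading slash.

```python
from datetime import datetime
from typing import Iterable, Mapping


def pdf_date(moment: datetime) -> str:
    """Format a datetime as a PDF date string, e.g. D:20240102030405."""
    return moment.strftime("D:%Y%m%d%H%M%S")


def rewrite_docinfo(src_path: str, dst_path: str,
                    updates: Mapping[str, str],
                    removals: Iterable[str] = ()) -> None:
    """Set and delete document information entries, then save a new file."""
    # Imported lazily so pdf_date can be used without pikepdf installed.
    import pikepdf

    with pikepdf.open(src_path) as pdf:
        for key, value in updates.items():
            pdf.docinfo[key] = value  # keys look like "/Title" or "/Author"
        for key in removals:
            if key in pdf.docinfo:
                del pdf.docinfo[key]
        # Record when the metadata was last touched.
        pdf.docinfo["/ModDate"] = pdf_date(datetime.now())
        # Write to a new path rather than over the input file.
        pdf.save(dst_path)
```

For example, `rewrite_docinfo("in.pdf", "out.pdf", {"/Title": "Quarterly report"}, ["/Author"])` would set the title and drop the author entry; both file names are placeholders.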

Library	Version
PyPDF4	1.27.0
Pikepdf	3.0.0

The Pikepdf library allows PDF XMP metadata editing in contrast to ...
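A minimal sketch of XMP editing with pikepdf, assuming its `Pdf.open_metadata()` context manager, which exposes the XMP packet as a mapping keyed by names such as `dc:title`. The helpers `apply_xmp_edits` and `edit_xmp` are hypothetical names introduced for illustration.

```python
from typing import Iterable, List, Mapping, MutableMapping


def apply_xmp_edits(meta: MutableMapping,
                    updates: Mapping[str, str],
                    removals: Iterable[str] = ()) -> List[str]:
    """Set the given entries, delete the listed keys, and report what was removed."""
    for key, value in updates.items():
        meta[key] = value
    removed = []
    for key in removals:
        if key in meta:  # skip keys the document does not carry
            del meta[key]
            removed.append(key)
    return removed


def edit_xmp(src_path: str, dst_path: str,
             updates: Mapping[str, str],
             removals: Iterable[str] = ()) -> List[str]:
    """Edit the XMP metadata of a PDF and save the result to a new file."""
    # Imported lazily so apply_xmp_edits can be used without pikepdf installed.
    import pikepdf

    with pikepdf.open(src_path) as pdf:
        # Changes are written back into the PDF when the inner block exits.
        with pdf.open_metadata() as meta:
            removed = apply_xmp_edits(meta, updates, removals)
        pdf.save(dst_path)
    return removed
```

A call such as `edit_xmp("in.pdf", "out.pdf", {"dc:title": "Final report"}, ["xmp:CreatorTool"])` would update the title and remove the creator tool entry if it is present; the file names are placeholders.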
