How to Read and Add Comments to a PDF Document

Explore how to read and add comments to specific keywords within PDF documents using Python and the PyMuPDF library. This lesson helps you efficiently insert, view, and manage comments directly in PDFs, which supports better collaboration and review processes in document handling.

Introduction

Comments act like revision tools for people reviewing and exchanging PDF documents.

By convention, comments added to a document raise questions, ideas, or concerns about a particular section or keyword within that document.

In general, you can add comments to any PDF document, unless security constraints have been applied to the document that prohibit commenting.
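
To illustrate the point about security constraints: a PDF stores its permissions as a bit field, and in the PDF specification the right to add or modify annotations corresponds to bit 6 (value 32). The sketch below assumes you already have the permission flags as an integer. The helper name `can_annotate` is hypothetical, and the constant is defined locally rather than taken from any library.

```python
# Bit 6 of the PDF permission flags (value 32): add or modify annotations.
PDF_PERM_ANNOTATE = 1 << 5


def can_annotate(permissions: int) -> bool:
    """Return True if the permission flags allow adding comments."""
    return bool(permissions & PDF_PERM_ANNOTATE)


# -4 has every permission bit set; -36 is the same value with bit 6 cleared.
unrestricted = can_annotate(-4)
no_commenting = can_annotate(-36)
```

With PyMuPDF, the flags should be available as the `permissions` attribute of an opened document, but it is worth confirming this against the version you have installed.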

Objective

When we need to discuss the content of a PDF document with colleagues, it is more straightforward to insert our comments in the PDF itself than to write them up and send them by email or through another communication channel.

This lesson lays out the steps required to read and add comments on a specific keyword in a PDF document, using a command-line utility written in Python.

Requirements

We need the following libraries to add comments to a PDF document:

PyMuPDF

This third-party Python library, developed by Artifex Software Inc., provides Python bindings for MuPDF, a lightweight PDF viewer and toolkit. It runs on multiple platforms, including Windows, Linux, and macOS.

Filetype

A dependency-free Python library that infers the type, as well as the MIME type, of a file or a buffer by checking its signature (the first bytes of its content).
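
To make "checking signatures" concrete: a PDF file normally begins with the bytes `%PDF-`, so a signature check amounts to reading the first few bytes and comparing them. The sketch below uses only the standard library. Both helper names are hypothetical and are not part of the Filetype API, and the check is stricter than some PDF readers, which tolerate a few junk bytes before the header.

```python
PDF_SIGNATURE = b"%PDF-"


def has_pdf_signature(data: bytes) -> bool:
    """Return True if the buffer starts with the PDF signature."""
    return data.startswith(PDF_SIGNATURE)


def looks_like_pdf(path: str) -> bool:
    """Return True if the file at `path` starts with the PDF signature."""
    with open(path, "rb") as handle:
        # Only the first few bytes are needed to recognise the format.
        return has_pdf_signature(handle.read(len(PDF_SIGNATURE)))
```

The Filetype library generalises this idea to many formats: `filetype.guess(path)` returns an object whose `mime` attribute can be compared with `application/pdf`.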

Library     Version
PyMuPDF     1.18.9
Filetype    1.0.7

Code explanation

The comment_pdf function is the core of our utility. It performs the following steps:

  1. Open the selected PDF document (Line 11).
  2. Iterate through its pages, skipping any page that was not selected (Lines 14-19).
  3. Using the function searchFor, search for the keyword (parameter search_text) to comment on. This function returns a list of rectangles giving the positions of the found instances of the keyword (Line 22).
  4. Loop through the found instances of the keyword (Line 27).
  5. Enclose each found instance in a dashed blue bounding box (Lines 29-31).
  6. Add the supplied comment title and text to the found instance (Lines 34-38).
  7. Apply the changes to the annotation (Line 40).
  8. Save the processed document to the output file (Line 43).
  9. Close the document.
  10. Display a summary of the executed process.
Python 3.5
def comment_pdf(input_file:str
              , search_text:str
              , comment_title:str
              , comment_info:str
              , output_file:str
              , pages:list=None
              ):
    """
    Search for a particular string value in a PDF file and add comments to it.
    """
    pdfIn = fitz.open(input_file)
    found_matches = 0
    # Iterate through the document pages
    for pg,page in enumerate(pdfIn):
        pageID = pg+1
        # Skip the page if a selection was given and it is not part of it
        if pages:
            if pageID not in pages:
                continue

        # Use the searchFor function to find the text
        matched_values = page.searchFor(search_text,hit_max=20)
        found_matches += len(matched_values) if matched_values else 0

        # Loop through the matched values
        # item contains the coordinates of the found text
        for item in matched_values:
            # Enclose the found text with a bounding box
            annot = page.addRectAnnot(item)
            annot.setBorder({"dashes":[2],"width":0.2})
            annot.setColors({"stroke":BLUE_COLOR})

            # Add the comment to the found match
            info = annot.info
            info["title"] = comment_title
            info["content"] = comment_info
            #info["subject"] = "Educative subject"
            annot.setInfo(info)

            annot.update()

    # Save to the output file
    pdfIn.save(output_file,garbage=3,deflate=True)
    pdfIn.close()

    # Process summary
    summary = {
        "Input File": input_file
      , "Matching Instances": found_matches
      , "Output File": output_file
      , "Comment Title": comment_title
      , "Comment Info": comment_info
    }

    # Print the process summary
    print("## Summary ########################################################")
    print("\n".join("{}:{}".format(i, j) for i, j in summary.items()))
    print("###################################################################")
The complete code of our utility, including the helper that parses page selections, is given in the test scenario below.
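
The comment_pdf function takes all of its inputs as parameters, so driving it from the command line only needs a thin argument parser. The following is a minimal sketch of what such a front end might look like. The option names and the `build_parser` helper are assumptions chosen to mirror the function's parameters and do not come from the lesson's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose options mirror the parameters of comment_pdf."""
    parser = argparse.ArgumentParser(description="Add a comment to a keyword in a PDF.")
    parser.add_argument("-i", "--input_file", required=True, help="path of the PDF to process")
    parser.add_argument("-s", "--search_text", required=True, help="keyword to comment on")
    parser.add_argument("-t", "--comment_title", required=True, help="title of the comment")
    parser.add_argument("-c", "--comment_info", required=True, help="text of the comment")
    parser.add_argument("-o", "--output_file", required=True, help="path of the commented PDF")
    parser.add_argument("-p", "--pages", default=None, help="page selection such as '1,3-5'")
    return parser


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that the values come from the command line.
args = build_parser().parse_args([
    "-i", "Sample1.pdf",
    "-s", "COVID",
    "-t", "Pay Attention",
    "-c", "COVID is dangerous",
    "-o", "Sample1_commented.pdf",
    "-p", "1,2",
])
```

The parsed values can then be passed to comment_pdf, with `args.pages` converted to a list of page numbers first.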

Test scenario

Let’s put a comment on a specified keyword in a sample PDF document.

Execute the code snippet below and look into the output generated:

Python 3.5
import fitz,os,filetype,argparse,subprocess

BLUE_COLOR = (0,0,1)

def build_range(rangeval:str):
    """
    Build the list of pages described by the rangeval parameter, e.g. '1,3-5'
    """
    result=set()
    for part in rangeval.split(','):
        x=part.split('-')
        result.update(range(int(x[0]),int(x[-1])+1))
    return list(sorted(result))

def comment_pdf(input_file:str
              , search_text:str
              , comment_title:str
              , comment_info:str
              , output_file:str
              , pages:list=None
              ):
    """
    Search for a particular string value in a PDF file and add comments to it.
    """
    pdfIn = fitz.open(input_file)
    found_matches = 0
    # Iterate through the document pages
    for pg,page in enumerate(pdfIn):
        pageID = pg+1
        # Skip the page if a selection was given and it is not part of it
        if pages:
            if pageID not in pages:
                continue

        # Use the searchFor function to find the text
        matched_values = page.searchFor(search_text,hit_max=20)
        found_matches += len(matched_values) if matched_values else 0

        # Loop through the matched values
        # item contains the coordinates of the found text
        for item in matched_values:
            # Enclose the found text with a bounding box
            annot = page.addRectAnnot(item)
            annot.setBorder({"dashes":[2],"width":0.2})
            annot.setColors({"stroke":BLUE_COLOR})

            # Add the comment to the found match
            info = annot.info
            info["title"] = comment_title
            info["content"] = comment_info
            #info["subject"] = "Educative subject"
            annot.setInfo(info)

            annot.update()

    # Save to the output file
    pdfIn.save(output_file,garbage=3,deflate=True)
    pdfIn.close()

    # Process summary
    summary = {
        "Input File": input_file
      , "Matching Instances": found_matches
      , "Output File": output_file
      , "Comment Title": comment_title
      , "Comment Info": comment_info
    }

    # Print the process summary
    print("## Summary ########################################################")
    print("\n".join("{}:{}".format(i, j) for i, j in summary.items()))
    print("###################################################################")

if __name__ == "__main__":
    # Move to the project directory
    os.chdir('/usr/src/mypdftoolbox')
    # Input PDF file
    input_file_name = 'Sample1.pdf'
    # Output PDF file
    output_file_name = 'Sample1_commented.pdf'
    # Selected pages
    pages = '1,2'
    pages_list = build_range(pages) if pages else None

    # Comment the document
    comment_pdf(input_file = os.path.join('./static',input_file_name)
              , search_text = 'COVID'
              , comment_title = 'Pay Attention'
              , comment_info = 'COVID is dangerous'
              , output_file = os.path.join('/usercode/output',output_file_name)
              , pages = pages_list
              )

    # Copy the input file to the output folder so that it can be downloaded too
    subprocess.call(['cp'
                   , os.path.join('./static',input_file_name)
                   , os.path.join('/usercode/output',input_file_name)
                   ])

In the output file, each occurrence of the keyword on the selected pages is enclosed in a dashed blue box, and hovering the mouse over a box displays the comment we specified.
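
The comments can also be read back programmatically rather than by hovering over them. The sketch below is one possible way to do it, assuming an already opened PyMuPDF document whose pages expose `annots()` and whose annotations expose an `info` dictionary with `title` and `content` keys. The `read_comments` helper is hypothetical and is not part of the utility above.

```python
def read_comments(doc):
    """Collect (page_number, title, content) for every annotation in a document.

    `doc` is expected to behave like an opened PyMuPDF document: iterating
    over it yields pages, and each page offers an annots() iterator.
    """
    comments = []
    for page_number, page in enumerate(doc, start=1):
        for annot in page.annots():
            info = annot.info
            comments.append((page_number, info.get("title", ""), info.get("content", "")))
    return comments
```

With PyMuPDF installed, passing the result of `fitz.open('Sample1_commented.pdf')` to this function should list one entry per dashed box, each carrying the title and text supplied to comment_pdf.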

We highly encourage you to change the code snippet and develop your own test cases as well.
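
A convenient place to start experimenting is the page selection, since build_range can be exercised without any PDF file. The snippet below repeats the function from the utility and tries a few inputs. The input strings are illustrative choices, not cases from the lesson.

```python
def build_range(rangeval: str):
    """Build the sorted list of page numbers described by a string such as '1,3-5'."""
    result = set()
    for part in rangeval.split(','):
        x = part.split('-')
        result.update(range(int(x[0]), int(x[-1]) + 1))
    return list(sorted(result))


print(build_range("1,2"))    # [1, 2]
print(build_range("1,3-5"))  # [1, 3, 4, 5]
print(build_range("2-4,3"))  # [2, 3, 4] (duplicates are removed)
print(build_range("5-3"))    # [] (a reversed range selects nothing)
```

The last case is worth noting: a reversed range is silently ignored, so comment_pdf receives an empty list, treats it as "no selection", and processes every page.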

Conclusion

Comments become a necessity when developing documents collaboratively. In short, they highlight the areas that need attention.