How to convert a pdf to an image using Python
Python provides us with multiple libraries that provide the functionality to convert each page of a PDF into an image of different formats like PNG, JPG, and JPEG. In this Answer, we will explore the pdf2image library to convert a PDF into an image.
Install pdf2image
pdf2image converts a PDF into a
pip install pdf2image
Convert a PDF into an image
Once we are done installing the pdf2image library, we can write the Python code to convert our PDF into images. The code is given below:
from pdf2image import convert_from_path
pdf_images = convert_from_path('my_pdf.pdf')
for idx in range(len(pdf_images)):
pdf_images[idx].save('pdf_page_'+ str(idx+1) +'.png', 'PNG')
print("Successfully converted PDF to images")As we run the code, the PDF is successfully converted to images, which are stored in the project folder. The explanation of the code is given below.
Line 1: We import
convert_from_pathfrom thepdf2imagelibrary.Line 3: We use the
convert_from_pathfunction and pass the path to the PDF as a string. The function converts the PDF images into a PIL image list, which we store in thepdf_imagesvariable.Lines 5 and 6: We define a for loop that iterates through the number of PIL image objects stored in the
pdf_imagesvariable. On each iteration, we use thesave()function of the PIL image objects and provide the name (first parameter) and image format (second parameter). In our case, we are saving each image with the namepdf_page_appended with a number starting from 1 – to keep the names unique).
Conclusion
pdf2image library makes it a seamless process to convert a PDF into images which may be used in OCR (optical character recognition) tasks and manipulating images.
Free Resources