How to compress data using Python

A ZIP file is a compressed archive type that lets us bundle and compress several files or directories into one file with a .zip extension. To simplify transferring or storing various files, the main goal of producing a ZIP file is to lower the overall file size. Here are a few benefits of making ZIP files:

  • File compression: ZIP file uses compression algorithms to reduce the size of files included in that bundle. This compression makes transferring the files over the internet more convenient, as smaller files will take less bandwidth.

  • File organization: A ZIP file will create a single bundle of all the files, making it easy to maintain a tidy but organized file structure.

  • Backup: Creating ZIP files is a convenient way to create backups of multiple files. It simplifies the backup process by consolidating multiple files into a single archive, making it easier to manage and store.

  • Reducing storage space: Compressing files into ZIP archives can save disk space, especially when dealing with many files. This is particularly relevant for archival purposes.

It would be best if we first built a ZipFile object (notice the capital letters Z and F) to read the contents of a ZIP file. Similar in concept to the File objects returned by the open() function, ZipFile objects are values that allow the program to communicate with the file. The zipfile module in Python is a standard library designed for working with ZIP files. This file format is commonly used in industry regarding digital data compression and archiving. It can be used to group multiple relevant files. To compress the file, use the following process:

  1. Call the ZipFile() function from the zipfile module to create an empty ZipFile Object. This object will accept the following parameters:

    1. file: This parameter accepts the file name as a string.

    2. mode: This parameter accepts the following character tags:

      1. r: This tag is used to read the existing file.

      2. w: This tag is used to truncate and write a new file.

      3. a: This tag is used to append an existing file.

      4. x: This tag is used to exclusively create and write a new file.

  2. Use the write() method from ZipFile the object to add and compress the files.

  3. Use the getinfo() method to get the information of a compressed file. This method will accept the file name as a parameter.

Code example

So, let’s start by creating a .zip file using the Python programming language. For this code, we will create a .zip file containing a text file and then check the file size before and after compression.

main.py
file.txt
import zipfile
# Creating a new zip file
newZip = zipfile.ZipFile('new.zip', 'w')
# Writing a file to the zip file
newZip.write('file.txt', compress_type=zipfile.ZIP_DEFLATED)
# Getting the information of file from the zip file
fileInfo = newZip.getinfo('file.txt')
# Printing the stats
print("File actual size: ", str(fileInfo.file_size / 100) + "kB")
print("Compressed size of file: ", str(fileInfo.compress_size/1000) + "kB")
print('Compressed file is '+ str(round(fileInfo.file_size/fileInfo.compress_size,2)) +"x smaller!")
# Finally closing the zip file
newZip.close()

Code explanation

Let’s see the code in detail:

  • Line 1: This line is used to import the modules.

  • Line 3: This line uses the ZipFile() method to create an empty ZIP file in our current directory.

  • Line 5: This line uses the write() method to compress the file.txt file.

  • Line 7: This line uses the getinfo() method to get the information of the file.

  • Line 13: This line uses the close() method to finally close the opened ZIP file.

Note: To extract the data from the ZIP file using python, you might need to create the ZipFile object using the existing file and call the extractall() method to get all the files available in the ZIP file.

Copyright ©2024 Educative, Inc. All rights reserved