Search⌘ K
AI Features

Working with Binary Files

Explore how to handle binary files in Python by reading and writing raw bytes without decoding errors. Understand using binary modes, file cursor methods like seek and tell, and chunking techniques to process large files efficiently without exhausting memory.

So far, we have treated files as containers for text, i.e., streams of characters meant to be read by humans. However, most data in the digital world is not textual. Images, audio files, compiled programs, and compressed archives are all stored as precise sequences of bytes.

Binary formats represent data such as pixel values, audio samples, or machine instructions. If a binary file (such as a JPEG image) is opened in text mode, Python may produce unreadable characters or raise decoding errors because the data does not correspond to a valid text encoding. To process these files correctly, we must read and write the data as raw bytes instead of text.

Text vs. binary mode

When we open a file in text mode (the default 'r' or 'w'), Python automatically performs a translation step. It reads raw bytes from disk and decodes them into a str object using a character encoding such as UTF-8. This behavior is ideal for human-readable text like essays, configuration files, or logs.

For binary data, however, this translation is destructive. In formats such as images or executables, each byte has a precise meaning. A value like 0xFF represents data, not a character, and attempting to decode it can corrupt the file or raise errors. ...