JSON and Serialization

Explore how to serialize Python objects using JSON for readable, cross-language data storage and pickle for Python-specific binary serialization. Understand when to use each method, how to handle nested data, and the security considerations of unpickling data from untrusted sources.

So far, we have focused on reading and writing raw text, but real-world applications rarely work with simple, flat strings. In practice, we often need to persist complex data structures, such as dictionaries of user settings, lists of high scores, or the state of an ongoing computation. If we were to convert these objects to text using str(), restoring them later would require writing custom parsing logic to rebuild the original structures. This approach is fragile and error-prone.
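To make that fragility concrete, here is a minimal sketch. The `settings` dictionary and its keys are hypothetical, chosen only for illustration. It shows that str() produces text that looks like a dictionary but no longer behaves like one.

```python
# Hypothetical user settings, used only for illustration.
settings = {"theme": "dark", "volume": 7, "muted": False}

# str() gives us a textual picture of the dictionary...
text = str(settings)
print(text)  # {'theme': 'dark', 'volume': 7, 'muted': False}

# ...but what we get back from storage is just a string, not a dictionary.
try:
    text["theme"]
except TypeError as error:
    print("Cannot look up a key in a plain string:", error)
```

Recovering the original dictionary from `text` would mean writing our own code to parse quotes, commas, nested brackets, and value types.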

To solve this, we need a reliable way to convert an in-memory object into a standardized format that can be stored on disk or transmitted over a network, and later convert that format back into a live Python object. This process is called serialization.

What is serialization?

Serialization (also known as marshaling or encoding) is the process of converting a data structure or object state into a format that can be stored in a file or transmitted across a network. Deserialization is the reverse process: reconstructing the object from the stored format.
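As a rough illustration of the round trip, the sketch below uses the standard library json module, one of the tools this lesson covers. The `high_scores` data is a made-up example, not part of the lesson's own material.

```python
import json

# Hypothetical data: a dictionary containing nested lists.
high_scores = {"alice": [120, 95], "bob": [88]}

# Serialization: live Python object -> text that can be stored or sent.
encoded = json.dumps(high_scores)

# Deserialization: stored text -> a new, equivalent Python object.
decoded = json.loads(encoded)

print(type(encoded).__name__)
print(decoded == high_scores)
```

The decoded object is equal to the original but is a separate object in memory, which is what we want when restoring state from a file or a network message.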

Python provides two primary tools for serialization, each designed for different use cases:

  • JSON (json): A text-based, human-readable format. It is ideal for configuration files and for exchanging data with other programming languages such as JavaScript, Java, ...