Search⌘ K
AI Features

Introduction to YAML

Understand what YAML is, how it works, and why it has become essential in modern software development.

Welcome to the YAML learning journey

Welcome! This course is an interactive version of my best-selling book — Introduction to YAML: Demystifying YAML Data Serialization. Whether you’re configuring Docker containers, setting up CI/CD pipelines, or managing Kubernetes clusters, YAML skills will be very crucial.

Let’s begin by answering the most fundamental question: What is YAML?

What is YAML?

YAML stands for YAML Ain’t Markup Language (pronounced like “camel”).

Originally, it stood for Yet Another Markup Language. It was later changed to emphasize that the language is intended for data and not documents.

Key characteristics of YAML

  • A straightforward, text-based format for exchanging data between people and computers.

  • Not a programming language. It is mainly used for storing configuration information. It stores data, not executable commands or instructions.

  • Human-readable. It is designed to be easy for humans to read and write.

  • Data serialization language similar to XML or JSON. However, it is much more concise.

What is data serialization?

In case you are wondering what data serialization is, here is the definition.

Data serialization is the process of converting data objects, or object states present in complex data structures, into a stream of bytes for storage, transfer, and distribution in a form that can allow recovery of their original structure.

Where is YAML used?

  • YAML can represent any data structure that can be represented using either XML or JSON format.

  • YAML can store and manage any textual data in Unicode format, like configuration or log files. This makes it software and hardware platform independent for storing, transporting, and sharing data. It is essential to understand this format because any changes to configuration files can cause your application to work incorrectly.

YAML is used in configuration files, log files, object persistence, caching, and messaging (sending information across various software components or applications).

Before we go further, let’s clarify a few important terms:

Messaging is sending information across various software components or applications.

Persistence is the process of preserving the state of an object longer than the lifespan or duration of the process of creating the object. This is achieved by storing the object’s state in non-volatile memory, such as a hard disk, rather than volatile memory, such as RAM.

Caching is a technique of storing a copy of a repeatedly requested resource and serving it back from the copy itself instead of from the remote server. This makes the application more responsive.

Benefits of YAML

YAML gained widespread adoption because of the following benefits:

  • Easy to implement and use - It is easy to implement and use.
  • Unambiguous - YAML unambiguously specifies the data structures of the serialized data, so there’s no need to rely on comments or documentation. Because the data structures are unambiguous, it makes it easier to use automated tools (scripts) for reading and writing YAML. For example, you could write a script that reads a YAML-formatted data structure from a file and converts it to another format (such as JSON or XML).
  • Version control friendly - YAML stores plain text so that it can be added to a version control system, such as a Git repository, without any issues.
  • Portable across programming languages - YAML supports representing sequences as lists and mappings as dictionaries (hashes in some languages) in a language-independent manner.
  • Powerful - It is more powerful than JSON when it comes to specifying complex data structures. It’s a superset of JSON, meaning all valid JSON documents are also valid YAML.
  • Matches native data structures of modern programming languages such as Python, Ruby, and JavaScript. There are multiple YAML parsers in different languages, so you can use the same language for both generating and parsing YAML.
  • Consistent data model to support generic tools. It has powerful tools, such as PyYAML.
  • Supports one-pass or one-direction processing - Parsing YAML is linear. There are no forward or backward pointers to deal with, so there are fewer parsing ambiguities.
  • Expressive and extensible - By extensibility, we mean existing applications continue to work even when new data is added (or removed).
  • Fast - YAML is fast to load and easy to process in memory.
  • Secure - Many security issues in programming languages are related to parsing untrusted input (such as JSON). Python, Ruby, Java, JavaScript, PHP, and more make it possible for attackers to exploit these vulnerabilities by bypassing the parser’s unexpected input handling. YAML is designed to prevent these types of exploits by specifying the data types for each part of the YAML stream.

Due to the above benefits, numerous modern tools and applications rely on YAML.

What makes YAML different?

Unlike markup languages (HTML, XML) that focus on document structure, YAML focuses on representing data structures in the most human-readable way possible.

YAML’s philosophy

  • Simple and clean syntax: YAML is easy to learn and simple to read. It can easily express a wide variety of different data structures.
  • Strict: The YAML specification has minimal leeway for flexibility, which increases its robustness.
  • Human-readable: YAML is very much human-readable. It allows you to represent complex data structures in a human-readable manner. To prove this point, even the homepage of YAML’s official site (https://yaml.org) is displayed as a YAML document.
  • Indentation conveys structure: Instead of using curly braces or angle brackets, YAML shows parent-child relationships through indentation, just like an outline or a table of contents.

Understanding YAML is essential because any changes to the files that store configuration data in YAML format can cause an application that uses it to work incorrectly.

History of YAML

The following are different versions of the YAML language

  • 1.0,
  • 1.1,
  • 1.2,
  • 1.2.2

Timeline of YAML’s evolution

  • In early 2004, Clark Evans, Oren Ben-Kiki, and Ingy döt Net released the YAML 1.0 specification, which was its first release.

  • In 2005, the YAML 1.1 standard was released. By this time JSON became popular. Accidentally, JSON was a subset of YAML (both syntactically and semantically).

  • YAML 1.2 was released in 2009 to make sure it is a proper superset of JSON. YAML 1.2 has been updated to include more security features, making it more resistant to attacks.

  • YAML 1.2.2 is the latest revision of the YAML specification. It was released on 1st October 2021. It mainly focuses on improving clarity, readability, and removal of ambiguity in the specification.

Today, YAML is used by millions of developers and has become the de facto standard for cloud-native configuration.

Summary

  • YAML stands for YAML Ain’t Markup Language.
  • YAML is a very simple data format for exchanging data.
  • It is similar to XML or JSON.
  • YAML files usually store configuration data.
  • YAML is a human-readable, clean, simple, easy to implement, consistent, portable, expressive, and extensible data serialization format.

Quiz about YAML fundamentals

1.

YAML stands for

A.

Yet Another Markup Language

B.

YAML Ain’t Markup Language

C.

Your Another Markup Language

D.

Your Awesome Markup Language


1 / 3