Search⌘ K
AI Features

The Abstract Syntax Tree

Explore the structure and role of the Abstract Syntax Tree in Elixir metaprogramming. Understand how Elixir code is represented as nested tuples, enabling efficient macro writing and problem-solving. This lesson prepares you to manipulate and generate code by mastering AST details.

Understanding the AST is essential as you get into advanced metaprogramming. Once we uncover the nuances, we’ll find that Elixir code is much closer to the AST than we might have imagined.

Understanding ASTs will change the way we think about solving problems and drive our macro design decisions in the future. After reviewing the finer details of the AST, we’ll be ready to begin the metaprogramming exercises.

%0 node_1 + node_2 * node_1->node_2 node_3 * node_1->node_3 node_1626433959104 a node_2->node_1626433959104 node_1626433905717 b node_2->node_1626433905717 node_1626433894386 * node_3->node_1626433894386 node_1626433906280 c node_3->node_1626433906280 node_1626433923111 d node_1626433894386->node_1626433923111 node_1626433901090 a node_1626433894386->node_1626433901090
The Structure of the Abstract Syntax Tree

The structure of the AST

Every expression we write in Elixir breaks down to a three-element tuple in the AST. We often rely on this uniform breakdown when pattern matching arguments in macros.

Since we know that an expression like 5 + 2 turns into the tuple {:+, [...], [5, 2]}, we pattern matched directly against the AST to determine the meaning of each calculation.

Let’s quote a couple more complex expressions to see how entire Elixir programs are structured in the AST.

Try the command quote do: (5 * 2) - 1 + 7 in the terminal given below:

Terminal 1
Terminal
Loading...

Let’s try another example. Execute the following lines of code in the iex terminal provided above.

Erlang
quote do
defmodule MyModule do
def hello, do: "World"
end
end

From the output, we can see that a stacking tuple is produced from each quoted expression. The result of this example shows how a simple AST represents an entire Elixir module.

All along, the code we’ve written in Elixir is represented by this simple uniform structure. Understanding this structure just requires a few simple rules.

All Elixir code is represented as a series of three-element tuples with the following format:

  • The first element is an atom denoting the function call, or another tuple, representing a nested node in the AST.

  • The second element represents metadata about the expression.

  • The third element is a list of arguments for the function call

The output of the above program is given for reference:

iex(1)> quote do
...(1)>   defmodule MyModule do
...(1)>     def hello, do: "World"
...(1)>   end
...(1)> end
# Output:
# {:defmodule, [context: Elixir, import: Kernel],
# [
#   {:__aliases__, [alias: false], [:MyModule]},
#   [
#     do: {:def, [context: Elixir, import: Kernel],
#     [{:hello, [context: Elixir], Elixir}, [do: "World"]]}
#  ]
# ]}

Exploring the AST

Now let’s break down the AST of (5 * 2) - 1 + 7 to understand the example in a better way.

Let’s start from the end of the AST and work our way up.

The root AST node is the + operator, and its arguments are the number 7 combined with another nested node in the tree. We can see that the nested nodes contain our (5 * 2) expression, whose results are applied to the - 1 branch. You also might recall that 5 * 2 in Elixir is just syntactic sugar for Kernel.*(5, 2). This makes our quoted results even easier to decode. The atom :* is the function call to perform, and the metadata tells us that it was imported from Kernel.

The last element [5, 2] is the list of arguments for the Kernel.*/2 function. Entire programs are represented in this way as a simple tree of Elixir tuples.

High level syntax vs. low level AST

To understand the design decisions behind Elixir’s syntax and the AST, we can compare it to other languages where the AST takes center stage. In some languages, such as flavors of Lisp, the source is written directly as an AST, using parentheses to form expressions. If we look closely, we can see how Elixir operates at a layer just above this format.

Lisp

(+ (* 2 3) 1)

Elixir

quote do: 2 * 3 + 1

{:+, _, [{:*, _, [2, 3]}, 1]}

If we compare the Elixir AST with Lisp source code, we can notice that the structure is nearly identical if we replaced brackets with parentheses.

The beauty of Elixir is that the transformation from high-level source to low-level AST requires only a simple quote invocation. With Lisp, you have all the power of a programmable AST at the cost of a less natural and flexible syntax.

Elixir creator José Valim’s revolutionary insight was to separate the syntax from the AST. In Elixir, we get the best of both worlds: a programmable AST with a high-level syntax to perform all the work.

AST literals

When we begin to explore how AST represents the Elixir source, sometimes the results of quoted expressions can be confusing and appear irregular.

We can avoid confusion by realizing that several literals in Elixir have the same representation within the AST and high-level source. This includes atoms, integers, floats, lists, strings, and any two-element tuples containing the former types.

For example, try the following commands in the given terminal. We’ll see that all of the following literals return themselves when quoted:

quote do: :atom
quote do: 123
quote do: [1,2,3]
quote do: "string"
quote do: {:ok,1}
quote do: {:ok,[1,2,3]}
Terminal 1
Terminal
Loading...

If we pass any of the previous examples to a macro, the macro receives the literal arguments instead of an abstract representation.

If we quote other types, we can see that an abstract form is returned. Now try these commands in the terminal:

quote do: %{a: 1, b: 2}
quote do: Enum

Our quoted results show the two different ways. Elixir types are represented in the AST. Some values are passed through as it is, while more complex types are returned as a quoted expression.

When writing macros, it’s helpful to keep AST literals in mind to avoid confusion about whether our arguments are received in abstract form.

Now that we’ve laid a foundation by uncovering the structure of the AST, it’s time to move on to code-generation exercises and apply our new insight. Next, we’ll explore how to transform the AST using Elixir’s macro system.