Grammar of Graphics in R

Explore the grammar of graphics concept fundamental to ggplot2 in R. Understand how data maps to visual elements through layered components and aesthetics. Learn to build and customize meaningful plots by combining data, geometric objects, and statistical transformations in a structured, iterative manner.

What is the grammar of graphics?

First, let’s think about what a graphic is. How do we describe a graphic concisely, and can we build a representation of the description? The answers to these questions can help us understand the concepts involved in the grammar of graphics.

If we look at any graphic, we are likely to find a mapping of certain variables from the dataset to their visual representation in the graphic. The variables or data properties are typically numerical or categorical and are mapped visually as points (x and y coordinates), line colors, different markers, heights of bars, etc.

We know that the grammar for a language is a set of rules describing the correct and acceptable usage of words. These words can be combined in a predefined, logical way, to form meaningful sentences. Similarly, graphics grammar offers principles for arranging mathematical and aesthetic aspects into a meaningful graph, implying that graphics are constructed on an underlying grammar.

There are two important principles here:

  • Different layers of grammatical elements are used to create graphics.
  • Plots are built with appropriate aesthetic mappings to make them meaningful.

In short, the grammar states that a statistical graphic is a mapping from data to aesthetic attributes (color, shape, and size) of geometric objects (points, lines, bars). Additionally, the plot might include statistical transformations of the data, and it is drawn in a particular coordinate system.
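The definition above can be sketched in a few lines of ggplot2. This is a minimal illustration, assuming the ggplot2 package is installed and using the built-in `mtcars` dataset, which the lesson itself does not mention; the choice of columns is arbitrary.

```r
library(ggplot2)

# Data: mtcars (ships with R). Aesthetic mappings: weight -> x position,
# fuel economy -> y position, cylinder count -> color, horsepower -> size.
# Geometric object: points. Coordinate system: Cartesian (the default,
# written out here only to make it visible).
scatter <- ggplot(data = mtcars,
                  mapping = aes(x = wt, y = mpg,
                                colour = factor(cyl), size = hp)) +
  geom_point() +
  coord_cartesian()

# A statistical transformation: geom_bar() counts the rows in each
# cylinder group and maps that count to bar height.
bars <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

print(scatter)
print(bars)
```

In the second plot, the bar heights are not a column in the data. They are computed by the layer's default statistical transformation, which is the sense in which a plot "might include statistical transformations".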

Building better plots with grammar of graphics

The ggplot2 package is a plotting framework whose theoretical basis is the layered grammar of graphics. The layered grammar proposes constructing a graphic from multiple layers, each of which can draw on its own data. It differs from Wilkinson’s original grammar of graphics in how its components are arranged, in its hierarchy of default values, and in that it is embedded within another programming language (R).

This layered grammar adds many enhancements that help it to be more expressive and fit seamlessly into the R environment.

The grammar makes it easy to update a plot iteratively by altering a single characteristic at a time. It is also beneficial because it identifies the high-level components of a plot that may be adjusted, providing us with a framework to think about graphics and, ideally, decreasing the gap between mind and paper. It also encourages using visuals tailored to a specific problem rather than depending on predefined charts.
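As a rough sketch of this iterative workflow, the snippet below starts from one base plot and changes a single component per step. It assumes ggplot2 is installed and uses the built-in `mtcars` dataset; the variable names are illustrative only.

```r
library(ggplot2)

# Starting point: one dataset, two position mappings, one point layer.
base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Each step changes exactly one characteristic and returns a new plot;
# the earlier objects are left untouched and can still be printed.
coloured <- base + aes(colour = factor(cyl))   # add one aesthetic mapping
rescaled <- coloured + scale_x_log10()         # change only the x scale
flipped  <- rescaled + coord_flip()            # change only the coordinates

print(flipped)
```

Because every intermediate result is an ordinary R object, it is easy to compare `base`, `coloured`, and `rescaled` side by side before settling on a final design.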

Once we understand the underlying concepts of the grammar of graphics and how the components fit together, we can easily create a variety of visualizations and customize them for our project requirements.

How is a plot built?

So far, we have learned that:

  • Grammar defines a set of rules.
  • Grammar of graphics provides rules for constructing visualizations.
  • The ggplot2 package is based on the grammar of graphics.
  • The ggplot2 package follows a layered approach to describe and build graphics in a structured way.
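The layered approach summarized above can also be observed in code. The following is a small sketch, assuming ggplot2 is installed and using the built-in `mtcars` dataset; the object names are hypothetical and chosen only for readability.

```r
library(ggplot2)

# Start with data and aesthetic mappings only: no layers yet.
canvas <- ggplot(mtcars, aes(x = wt, y = mpg))

# Add a layer of points.
with_points <- canvas + geom_point()

# Add a second layer: a linear trend line computed by a statistical
# transformation (a least-squares fit of y on x).
with_trend <- with_points +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Labels and themes adjust the appearance but do not add layers.
finished <- with_trend +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_minimal()

# Number of layers stored in each plot object.
layer_counts <- sapply(list(canvas, with_points, with_trend, finished),
                       function(p) length(p$layers))
print(layer_counts)

print(finished)
```

Only the geometric objects count as layers here; the labels and the theme refine how those layers are presented.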

Now, it’s time to visualize what we have learned. The image below shows how different components, such as grids, axes, data points, and legends, combine perfectly to form a plot. We can visualize the different components added on top of each other to create a plot as shown in steps 1 to 4: