Introducing Altair

Vega-Altair, or Altair for short, is a declarative Python library for statistical visualization. It relies on the Vega and Vega-Lite visualization grammars, which describe the visual appearance and interactivity of visualizations in JSON.
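
To make the idea of a JSON grammar concrete, the following sketch writes a small Vega-Lite-style specification by hand as a Python dictionary and serializes it to JSON. The column names and the two data rows come from the Christmas trees dataset used later in this lesson. The specification is a simplified illustration, not the exact output that Altair generates.

```python
import json

# A hand-written, simplified Vega-Lite-style specification (illustrative only).
spec = {
    "data": {"values": [
        {"Year": 2004, "RealTree": 27100000},
        {"Year": 2016, "RealTree": 27400000},
    ]},
    "mark": "line",
    "encoding": {
        "x": {"field": "Year", "type": "ordinal"},
        "y": {"field": "RealTree", "type": "quantitative"},
    },
}

# Serialize the specification to JSON text.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

With Altair, we describe the same three parts (data, mark, and encoding) in Python, and the library produces the JSON for us.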

Declarative vs. imperative libraries

There are two types of visualization libraries:

  • Imperative libraries focus on how to build a visualization, such as manually specifying the steps to build the visualization (axis, size, legend, and labels). Matplotlib is an example of an imperative library.

  • Declarative libraries focus on what we want to see. We specify the data and type of visualization we want to see. The library will do the manipulations to create the visualization for us automatically. Altair is an example of a declarative library.

The following table shows the differences between imperative and declarative libraries.

Imperative vs. Declarative Libraries

Imperative Library | Declarative Library
Specifies explicit instructions to build the visualization | Describes the output
Only provides the tools; we perform the steps manually | Performs everything automatically

Main elements of an Altair chart

Every Altair chart comprises three main elements: the data, the mark, and the encoding.

The Chart object

A Chart object is the entry point in Altair. Its main argument is the dataset to visualize.

To start creating a chart, import the Altair library and then create the chart.

import altair as alt
alt.Chart(dataset)

The dataset can be in one of the following formats:

  • pandas DataFrame

  • Data or related object (i.e., UrlData, InlineData, NamedData)

  • URL pointing to a JSON or CSV file

  • Object supporting the __geo_interface__ protocol (e.g., a GeoPandas GeoDataFrame)

The mark property

A mark defines how to represent data visually. Examples of marks include bars, lines, areas, and many more. To specify a mark, call the corresponding mark method on the Chart object.

import altair as alt
alt.Chart(dataset).mark_bar()

The example tells Altair to draw a bar chart. In general, the name of each mark method is mark_<type_of_graph>(). The following table shows some of the most common marks in Altair:

Common Marks in Altair

Name | Description
mark_bar() | A bar chart
mark_line() | A line chart
mark_point() | A scatter plot with configurable point shapes
mark_circle() | A scatter plot with filled circles

Encodings

Encodings specify how data columns map to visual properties of the mark, such as position, size, and color. To define an encoding, we call the encode() method on the chart.

import altair as alt
alt.Chart(dataset).mark_bar().encode()

Example

Consider the Christmas trees dataset.

Christmas Tree Dataset

Year | RealTree | FakeTree
2004 | 27100000 | 9000000
... | ... | ...
2016 | 27400000 | 18600000

We load it as a pandas DataFrame and then draw a simple line chart in Altair of the RealTree column versus the Year column.

import altair as alt
import pandas as pd
import os
df = pd.read_csv('/data/christmas_trees.csv')
chart = alt.Chart(df).mark_line().encode(
    x='Year:O',      # O for ordinal data
    y='RealTree:Q',  # Q for quantitative data
)
chart.save('chart.html')
os.system('cat chart.html')

The example uses the mark_line() method to draw a line and specifies the x and y axes in the encode() method. For each column, we also specify the data type (O for Year and Q for RealTree). The last two statements save the chart as an HTML file and print its contents so that the platform can render the chart.

Click the “Run” button to see the produced chart.

Let’s practice!

Now, we’ll play with the previous snippet of code:

  • We’ll change the mark method to mark_bar() or mark_point().

  • We’ll represent the FakeTree column on the y-axis instead of the RealTree column.

Drawing multiple lines

The previous chart draws only a single line, representing a single dataset column. There are two strategies to show multiple lines.

The first strategy uses Altair’s layering, which overlays charts on top of each other. We’ll build a base chart with the shared encoding for the x-axis, and then use the alt.layer() function to draw two separate lines.

# Reuses df, the DataFrame loaded in the previous example.
base = alt.Chart(df).encode(x='Year:O')
chart = alt.layer(
    base.mark_line(color='blue').encode(y='FakeTree:Q'),  # FakeTree in blue
    base.mark_line(color='red').encode(y='RealTree:Q'),   # RealTree in red
)

The second strategy reshapes the original pandas DataFrame from wide to long format with the melt() method and then plots the melted DataFrame in Altair.

The following figure shows how melt('Year') works.

The melt operation

The melt() method receives the identifier column to keep as is (Year) and stacks the remaining columns into two new columns: variable, which holds the name of the original column, and value, which holds the corresponding value.
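
To see the reshaping on a small scale, the following sketch applies melt() to the first and last rows of the dataset. The names TreeType and Sold in the last statement are hypothetical and only show that the new columns can be renamed.

```python
import pandas as pd

# Wide format: one column per tree type.
wide = pd.DataFrame({
    "Year": [2004, 2016],
    "RealTree": [27100000, 27400000],
    "FakeTree": [9000000, 18600000],
})

# Long format: one row per (Year, tree type) pair.
long_df = wide.melt("Year")
print(long_df)

# Optionally rename the generated columns (hypothetical names).
named = wide.melt("Year", var_name="TreeType", value_name="Sold")
print(list(named.columns))
```

The two-row wide table becomes a four-row long table, one row for each combination of year and original column.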

import altair as alt
import pandas as pd
import os
df = pd.read_csv('/data/christmas_trees.csv')
data = df.melt('Year')
chart = alt.Chart(data).mark_line().encode(
    x='Year',
    y='value',
    color='variable'
)
chart.save('chart.html')
os.system('cat chart.html')

This example adds a new encoding channel, called color, which maps a column to the color of the mark. Here, each distinct value of the variable column gets its own line color.