Describing a Series
Explore how to describe a data series using SQL. Learn to count rows, find minimum and maximum values, calculate averages, variance, standard deviation, and robustly determine medians and percentiles using SQL aggregate functions for effective data analysis.
We'll cover the following...
Overview
When we get a fresh data set, the first thing we usually want to do is get familiar with it. Some call this exploratory data analysis (EDA). SQL provides functions to produce descriptive statistics.
A series is a one-dimensional list of values. For example, we can use the VALUES to produce a simple series:
This query uses VALUES to generate a list of numbers: 1, 2, and 3.
Another way of thinking of a series is as a column in a table. For example, if we have a “sales” table, a “sale_date” column can be thought of as a series of timestamps. A “charged_amount” column can be thought of as a series of numbers.
In this lesson, we’ll learn about aggregate functions that can help us get familiar with the data in a series.
Counting rows
The first thing we always want to know is how many rows are there. In SQL, we can answer this question with the aggregate function COUNT:
To count the number of rows in a series, we can use the aggregate function COUNT(*).
Minimum and maximum values
When working on a series of data, it is useful to know the range of the values. For example, if we have a series of dates, we want to know the first and last dates in the series. For a series of ...