Search⌘ K
AI Features

Querying Delta Tables with SQL

Explore how to query Delta tables using SQL within Databricks notebooks. Learn to retrieve, filter, aggregate, group, and sort data with SQL commands. Understand the use of the %sql magic command and how to run SQL queries through Python for flexible data analysis. This lesson guides you to efficiently analyze data stored in Delta Lake using familiar SQL methods.

In the previous lesson, we created a Delta table called sales with two columns: name (the employee's name) and sales (the amount sold). Now we will learn how to query that same table using SQL. SQL is widely used by data analysts and data engineers to explore and analyze datasets, and Databricks makes it easy to run SQL directly inside a notebook using a special command called SQL magic.

Why SQL is useful in Databricks

SQL is one of the most common languages used in data analytics. Even when data is stored using Spark or Delta Lake, SQL is often the preferred tool for exploring and analyzing it because its syntax is close to plain English and its results are easy to read.

In Databricks, SQL queries run directly on Delta tables. This means you can analyze large datasets using simple, familiar commands without writing any PySpark code.

Some common SQL tasks include:

  • Viewing the entire dataset

  • Filtering specific rows

  • Aggregating data

  • Grouping values

  • Performing calculations

We will cover each of these in this lesson, using the sales table as our working example.

The sales table must already exist in your workspace from the previous lesson. If the table is missing, you will need to rerun the code that created it.

Running SQL inside a Databricks notebook

Databricks notebooks support multiple languages: Python, SQL, Scala, and R. When you are working in a Python notebook, you can still run SQL by starting a cell with the %sql magic command. A magic command is a special instruction that begins with % and changes how a single notebook cell behaves. In this case, %sql tells Databricks to interpret everything in that cell as SQL instead of Python.

%sql must be the very first line of the cell. If you put anything above it, including a comment, the magic command will not work.

Let's run our first SQL query to display the entire sales table. Before running the SQL query, open a ...