Create and Run Your First Notebook
Explore how to create your first Databricks notebook, write and run Python code with PySpark, and understand the serverless compute environment. This lesson guides you through building, inspecting, and transforming data step-by-step using notebook cells, setting a foundation for working efficiently in Databricks.
Notebooks are the main place where you work in Databricks. This is where you write code, run it, view results, and explore data. Databricks notebooks are designed to be simple, interactive, and beginner-friendly, especially in the Free Edition.
Unlike traditional development tools, Databricks notebooks let you write code and see results immediately, which makes learning data engineering and analytics much easier.
Creating a notebook in the Workspace
To create a notebook, go to the "Workspace" tab in the left sidebar. This is where Databricks stores all notebooks and folders. Click the "Create" button and choose "Notebook." Databricks will ask you to give the notebook a name and select a language.
Choose "Python" as the default language. Python is widely used in Databricks, and PySpark works naturally with it.
Once the notebook opens, you will see an empty cell where you can start typing code. In the Free Edition, notebooks automatically use serverless compute: Databricks handles all compute setup for you in the background, so you do not need to attach or configure a cluster.
Understanding the notebook layout
When your notebook opens, you will notice a clean and simple layout.
At the top, you see the notebook title and menu options. Below that, the notebook is divided into cells. Each cell can contain code or text. You can run a cell by clicking the "Run" button or using a keyboard shortcut (Shift+Enter runs the cell and moves to the next one).
The output of a cell appears directly below it, which makes learning very visual and easy to follow.
This layout is one of the reasons Databricks notebooks are popular for learning and experimentation.
Understanding Serverless Compute
In the Free Edition, you do not create or manage clusters manually. When you run a notebook cell, Databricks automatically starts the required compute resources and runs your code. This is called serverless compute. It removes the complexity of cluster configuration and allows beginners to focus on learning the platform instead of infrastructure.
Even though compute is invisible in the Free Edition, Spark still runs your code in a distributed way behind the scenes.
You may notice a short delay the first time you run code. This happens because Databricks is preparing compute resources automatically. Classic clusters can take several minutes to start because they provision cloud VMs from scratch; serverless avoids this by drawing from a pre-warmed compute pool managed by Databricks, so your code starts in seconds.
Warning: If you stop interacting with the notebook for some time, serverless compute may shut down automatically to save resources. This means variables and data loaded in memory will be lost. You will need to rerun your cells from the beginning.
Running your first code cell
Let’s start with a very simple example to confirm that your notebook is working correctly. In the following example, you will create a small dataset with names and scores, and convert it into a Spark DataFrame.
This example shows how Databricks executes code and displays results interactively.
What this example demonstrates
The code first imports the Row class, which helps create structured records for Spark. It then creates a small dataset in memory with names and scores. The spark.createDataFrame() function converts this data into a DataFrame that Databricks can process. The show() function displays the results in a table directly inside the notebook.
When you run this cell, Databricks automatically uses serverless compute to execute the code.
Even though you only wrote a few lines of code, Apache Spark is still working behind the scenes, preparing the execution engine and handling data processing for you.
Notebook cells and execution basics
As already mentioned, Databricks notebooks are made of cells. Each cell can contain code, text, or visual output. You can run a cell by clicking the "Run" button or using the keyboard shortcut.
Cells run independently, but they share the same session state: variables and DataFrames defined in one cell are available in every cell you run afterwards.
Using multiple cells effectively
You do not need to write all the code in one cell. In fact, it is better to break code into small, logical steps.
For example, one cell can load data, another can inspect it, and another can transform it. This makes notebooks easier to read and debug.
This style is commonly used in professional Databricks projects as well.
Cell 1: Create sample data
In this first cell, we create a small dataset in memory. This cell focuses only on data creation.
Explanation:
This cell creates structured sales data using named fields, so columns are clearly defined.
The DataFrame is created using Spark, which Databricks provides automatically.
No output is shown yet because this cell is only preparing data.
Cell 2: Inspect the data
In the second cell, we focus on checking the data, not creating it again.
Explanation:
The show() function displays the rows in a table format inside the notebook. The printSchema() function shows column names and data types. Separating inspection into its own cell makes debugging easier if something looks wrong.
You can rerun this cell as many times as you want without recreating the data again.
Cell 3: Apply a simple transformation
In this cell, we perform a small transformation by adding a new column. This cell focuses only on changing the data.
Explanation:
This cell adds a new column without modifying the original DataFrame.
Each transformation creates a new DataFrame, which keeps your workflow safe.
Keeping transformations separate makes notebooks easier to understand and maintain.
Many production Databricks notebooks follow this same pattern: load data, inspect it, then transform it step by step.
Cell 4: Run cells selectively
You do not have to run the entire notebook every time. You can run only the cell you are working on. For example, if you change the transformation logic, you only need to rerun "Cell 3," not the entire notebook. This saves time and helps you experiment confidently.
Structuring your work this way has three practical advantages that go beyond this lesson:
Understand each step of your workflow clearly. Each cell has a single, focused purpose, making it easy to follow what is happening at every stage.
Fix errors without rerunning everything. If one cell fails, you fix and rerun only that cell while the rest of your work stays intact.
Read notebooks written by other people more easily. Well-structured notebooks are a shared convention in professional Databricks projects, not just tutorials; this is how real notebooks are structured in companies.
Common beginner questions
Below are some common issues that a beginner may face at the start:
What if a cell fails?
In Databricks notebooks, this is very safe. If a cell fails, Databricks simply shows an error message below it. You can fix the code and run it again without restarting the notebook.
What happens to my data if serverless compute shuts down?
If compute shuts down due to inactivity, your notebook file is saved, but any variables or DataFrames loaded in memory are lost. Simply rerun your cells from Cell 1 to restore your session state.
Can I use languages other than Python?
Yes. Databricks notebooks support Python, SQL, Scala, and R. This course focuses on Python because it is the most widely used language in Databricks workflows.
Errors in notebooks are part of learning. Databricks is designed to make experimentation safe and reversible.
You have now successfully created a Databricks notebook, written basic PySpark code, and run it using serverless compute. This is a major milestone because notebooks are the foundation of everything you will do in Databricks.
Key terms
Databricks notebook
The main environment where code is written, executed, and results are displayed.
Serverless compute
A feature that automatically provides compute resources without manual cluster setup.
Cell execution
The process of running a single block of code inside a notebook.
Workspace
The section where notebooks and folders are organized.