Create and Run Your First Notebook
Explore how to create your first Databricks notebook, write and run Python code with PySpark, and understand the serverless compute environment. This lesson guides you through building, inspecting, and transforming data step-by-step using notebook cells, setting a foundation for working efficiently in Databricks.
Notebooks are the main place where you work in Databricks. This is where you write code, run it, view results, and explore data. Databricks notebooks are designed to be simple, interactive, and beginner-friendly, especially in the Free Edition.
Unlike traditional development tools, Databricks notebooks let you write code and see results immediately, which makes learning data engineering and analytics much easier.
Creating a notebook in the Workspace
To create a notebook, go to the "Workspace" tab in the left sidebar. This is where Databricks stores all notebooks and folders. Click the "Create" button and choose "Notebook." Databricks will ask you to give the notebook a name and select a language.
Choose "Python" as the default language. Python is widely used in Databricks, and PySpark works naturally with it.
Once the notebook opens, you will see an empty cell where you can start typing code. In the Free Edition, notebooks automatically use serverless compute: Databricks handles all compute setup for you in the background, so you do not need to attach or configure a cluster.
Understanding the notebook layout
When your notebook opens, you will notice a clean and simple layout.
At the top, you see the notebook title and menu options. Below that, the notebook is divided into cells. Each cell can contain code or text. You can run a cell by clicking the "Run" button or using a keyboard shortcut (Shift+Enter runs the cell and moves to the next one).
The output of a cell appears directly below it, which makes learning very visual and easy to follow.
This layout is one of the reasons Databricks notebooks are popular for learning and experimentation.
Understanding Serverless Compute
In the Free Edition, you do not create or manage clusters manually. When you run a notebook cell, Databricks automatically starts the required compute resources and runs your code. This is called serverless compute. It removes the complexity of cluster configuration and allows beginners to focus on learning the platform instead of infrastructure.
Even though compute is invisible in the Free Edition, Spark still runs your code in a distributed way behind the scenes.
You may notice a short delay the first time you run code. This happens because Databricks is preparing compute resources automatically. Classic clusters can take several minutes to start because they provision cloud VMs from scratch; serverless avoids this by drawing from a pre-warmed compute pool managed by Databricks, so your code starts in seconds.
Warning: If you stop interacting with the notebook for some time, serverless compute may shut down automatically to save resources. This means variables and data loaded in memory will be lost. You will need to rerun your cells from the beginning.
Running your first code cell
Let’s start with a very simple example to confirm that your notebook is working correctly. In the following example, you will create a small dataset with names and scores, and convert it into a Spark DataFrame.
This example shows how Databricks executes code and displays results interactively.
What this example demonstrates
The code first imports the Row class, which helps create structured records for Spark. It then creates a small dataset in memory with names and scores. The spark.createDataFrame() function converts this data into a DataFrame that Databricks can process. The show() function displays the results in a table directly inside the notebook.
When you run this cell, Databricks automatically uses serverless compute to execute the code.
Even though you only wrote a few lines of code, Apache Spark is still working behind the scenes, preparing the execution engine and handling data processing for you.
Notebook cells and execution basics
As already mentioned, Databricks notebooks are made of cells. Each cell can contain code, text, or visual output. You can run a cell by clicking the "Run" button or using the keyboard shortcut.
Cells run independently, but they share the same session state: variables and DataFrames defined in one cell are available in every cell you run afterwards.
Using multiple cells effectively
You do not need to write all the code in one cell. In fact, it is better to break code into small, logical steps.
For example, one cell can load data, another can inspect it, and another can transform it. This makes notebooks easier to read and debug.
This style is commonly used in professional Databricks projects as well.
Cell 1: Create sample data
In this first cell, we create a small dataset in memory. This cell focuses only on data creation.
Explanation:
This cell creates structured sales data using named fields, so columns are clearly defined.
The DataFrame is created using Spark, which Databricks provides automatically.
No output is shown yet because this cell is only preparing data.
Cell 2: Inspect the data
In the second cell, we focus on checking the data, not creating it again.
Explanation:
The show() function displays the rows in a table format inside the notebook. The printSchema() function shows column names and data types. Separating inspection into its own cell makes debugging easier if something looks wrong.
You can rerun this cell as many times as you want without recreating the data again.
Cell 3: Apply a simple transformation
In this cell, we perform a small transformation by adding a new column. This cell focuses only on changing the data.
Explanation:
This cell adds a new column without modifying the original DataFrame.
Each transformation creates a new DataFrame, which keeps your workflow safe.
Keeping transformations separate makes notebooks easier to understand and maintain.
Many production Databricks notebooks follow this same pattern: load data, inspect it, then transform it step by step.
Cell 4: Run cells selectively
You do not have to run the entire notebook every time. You can run only the cell you are working on. For example, if you change the transformation logic, you only need to rerun "Cell 3," not the entire notebook. This saves time and helps you experiment confidently.
Structuring your work this way has three practical advantages that go beyond this lesson:
Understand each step of your workflow clearly. Each cell has a single, focused purpose, making it easy to follow what is happening at every stage.
Fix errors without rerunning everything. If one cell fails, you fix and rerun only that cell while the rest of your work stays intact.
Read notebooks written by other people more easily. Well-structured notebooks are a shared convention in professional Databricks projects, not just tutorials; this is how real notebooks are structured in companies.
Common beginner questions
Below are some common issues that a beginner may face at the start:
What if a cell fails?
In Databricks notebooks, this is very safe. If a cell fails, Databricks simply shows an error message below it. You can fix the code and run it again without restarting the notebook.
What happens to my data if serverless compute shuts down?
If compute shuts down due to inactivity, your notebook file is saved, but any variables or DataFrames loaded in memory are lost. Simply rerun your cells from Cell 1 to restore your session state.
Can I use languages other than Python?
Yes. Databricks notebooks support Python, SQL, Scala, and R. This course focuses on Python because it is the most widely used language in Databricks workflows.
Errors in notebooks are part of learning. Databricks is designed to make experimentation safe and reversible.
You have now successfully created a Databricks notebook, written basic PySpark code, and run it using serverless compute. This is a major milestone because notebooks are the foundation of everything you will do in Databricks.
Key terms
Databricks notebook
The main environment where code is written, executed, and results are displayed.
Serverless compute
A feature that automatically provides compute resources without manual cluster setup.
Cell execution
The process of running a single block of code inside a notebook.
Workspace
The section where notebooks and folders are organized.