
Databricks Community Edition

Explore how to set up and configure Databricks Community Edition to build scalable PySpark batch pipelines. Learn to create clusters, install essential libraries such as the BigQuery connector and deep learning packages, and run notebooks for distributed data processing and model building.

One of the quickest ways to get up and running with PySpark is to use a hosted notebook environment.

Databricks is the largest Spark vendor and provides a free tier for getting started called Community Edition. We’ll use this environment to get started with Spark and to build model pipelines on AWS and GCP.

Getting started

The first step is to create a login on the Databricks website for Community Edition. Once logged in, perform the following steps to spin up a test cluster:

  1. Click on “Clusters” on the left navigation bar.
  2. Click “Create Cluster”.
  3. Assign the cluster a name, such as “DSP”.
  4. Select
...