Introduction to Data Transformation

Explore essential data transformation operations in both PySpark and pandas to strengthen your data processing skills. Understand how to aggregate data, compute statistical summaries, work with dates and times, and perform SQL-like joins and pivots. Gain practical insights by applying PySpark's functions module to real datasets, so you can handle common transformation tasks confidently.

Overview

The native APIs of PySpark and pandas provide almost all of the commonly used data transformation techniques as functions or methods. Because the list of objects, functions, and methods in these APIs is so extensive, we'll only explore a few ...
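
To make these operations concrete, here is a minimal pandas sketch of an aggregation, a date extraction, a SQL-like join, and a pivot. The `sales` and `managers` tables, their column names, and their values are all hypothetical, invented only for illustration; they are not the datasets used in this lesson.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-02-10", "2024-01-20", "2024-02-15"]
    ),
    "amount": [100.0, 150.0, 200.0, 50.0],
})
managers = pd.DataFrame({
    "region": ["east", "west"],
    "manager": ["Ana", "Bo"],
})

# Aggregation: total and mean amount per region.
totals = sales.groupby("region")["amount"].agg(["sum", "mean"])

# Date and time: extract the month number from each order date.
sales["month"] = sales["order_date"].dt.month

# SQL-like join: attach each region's manager (left join on "region").
joined = sales.merge(managers, on="region", how="left")

# Pivot: one row per region, one column per month, summed amounts.
pivoted = sales.pivot_table(
    index="region", columns="month", values="amount", aggfunc="sum"
)

print(totals)
print(joined)
print(pivoted)
```

PySpark DataFrames offer analogous operations, such as `groupBy`, `join`, and `pivot` on grouped data, although the syntax and the execution model differ from pandas.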