A PySpark Primer
Explore PySpark fundamentals for building scalable batch model pipelines. Understand Spark dataframes, their lazy execution, and common operations such as data persistence and transformation. Gain practical experience using the NHL stats dataset in cloud-based environments to handle large-scale data processing.
We'll cover the following...
What is PySpark?
PySpark is the Python API for Apache Spark, and it is well suited to both exploratory analysis and building machine learning pipelines. The core data type in PySpark is the Spark dataframe, which is similar to a Pandas dataframe but is designed to execute in ...