INTERACTIVE COURSE

From Pandas to PySpark DataFrame

Intermediate

39 Lessons

3h 3min

Certificate of Completion

54 Playgrounds
27 Illustrations

Takeaway Skills

A working knowledge of Apache Spark and the PySpark library for Python

A strong understanding of the advantages of using PySpark instead of Pandas for processing large datasets

The ability to calculate metrics and produce aggregated analytics reports

The ability to write production code in PySpark

Course Overview

Pandas is a popular Python library for data manipulation, but it is limited in its ability to process large datasets. The Apache Spark analytics engine offers significant performance improvements. This course will help improve your Python-based data...

Course Content

1

Introduction

2 Lessons

2

Data Input/Output

10 Lessons

3

Data Transformation

16 Lessons

4

User Defined Function (UDF)

8 Lessons

5

Wrapping Up

1 Lesson

6

Appendix

2 Lessons

How You'll Learn

Hands-on Coding Environments

You don’t get better at swimming by watching others. Coding is no different. Practice as you learn with live code environments inside your browser.

2x Faster Than Videos

Videos are holding you back. The average video tutorial is spoken at 150 words per minute, while you can read at 250. That’s why our courses are text-based.

No Set-up Required

Start learning immediately instead of fiddling with SDKs and IDEs. It’s all in the cloud.

Progress You Can Show

Built-in assessments let you test your skills. Completion certificates let you show them off.