Pandas Cheat Sheet
Discover how to effectively use the pandas library to manipulate, clean, and analyze data in Python. Learn key concepts including data structures, file operations, selection, modifying data, handling missing values, aggregations, and combining datasets. Gain practical skills to preprocess and prepare data for machine learning workflows.
Pandas is one of the most popular tools for data analysis in Python. This open-source library is the backbone of many data projects and is used for data cleaning and data manipulation.
With pandas, you gain greater control over complex datasets. It’s an essential tool in the data analysis tool belt. If you’re not using pandas, you’re not making the most of your data.
In this lesson, we’ll explore a quick guide to the most essential operations and commands.
Getting started with pandas
Pandas is a powerful Python library that makes data manipulation, analysis, and preprocessing intuitive and efficient. It provides flexible data structures such as Series and DataFrames, which are essential for handling structured data in Python.
Pandas import convention
If pandas is not already available, install it with pip:
$ pip install pandas
Then import the library using the conventional alias pd for data manipulation.
import pandas as pd
Create and name a Series
A pandas Series is a one-dimensional labeled array that can hold any data type. We can create it using pd.Series() by passing a list of values, and optionally assign custom indices for each element.
# Create a Series with custom indices
series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
Assign a descriptive name to a Series for clarity.
# Assign a meaningful name to the Series
series1.name = "Monthly Sales"
Series name helps identify what the data represents when exporting, plotting, or combining with other data.
Next, name the index of a Series for better organization.
series1.index.name = "Product ID"
Index name makes it easier to understand and reference elements, especially when working with multiple Series.
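Putting these steps together, here is a minimal runnable sketch (the values and names are illustrative):

```python
import pandas as pd

# Build a Series with custom indices, then name the data and the index
sales = pd.Series([120, 95, 180], index=['P01', 'P02', 'P03'])
sales.name = "Monthly Sales"
sales.index.name = "Product ID"

# Both names survive conversion to a DataFrame:
# the column is labeled "Monthly Sales", the index "Product ID"
df = sales.to_frame()
```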
Create a DataFrame
A DataFrame is a 2D table-like structure in pandas, composed of rows and columns. By default, pandas creates numeric column names 0, 1, … and row indices 0, 1, ….
# Create a DataFrame from a 2D list
data = [[10, 20], [30, 40], [50, 60]]
df = pd.DataFrame(data)

# Output:
#     0   1
# 0  10  20
# 1  30  40
# 2  50  60
DataFrames are the core data structure for preprocessing features, analyzing patterns, and feeding data to ML models.
Reading and writing data
Pandas supports a wide range of file formats, making it versatile for different data sources. Learning how to efficiently read and write data ensures our workflows are reproducible and ready for analysis or modeling.
Read and write to CSV file
CSV files are commonly used to store datasets. Efficiently reading and writing CSVs allows smooth preprocessing, experimentation, and sharing of data.
We can specify a column to use as the DataFrame index when reading a CSV. The first column is numbered 0, but we can also use column names. If no index is specified, pandas will create a default numeric index starting from 0.
import pandas as pd

# Read CSV using the full file path
data = pd.read_csv('/path/to/file.csv')

# Read CSV without specifying an index (a default numeric index is created)
data = pd.read_csv('file.csv')  # File is in the current working directory

# Read CSV and use the first column as the index
data = pd.read_csv('data.csv', index_col=0)

# Read CSV and use a column by name as the index
data = pd.read_csv('data.csv', index_col='ID')

# Write DataFrame to CSV (exclude the index if not needed)
data.to_csv('output.csv', index=False)
index_col=0: uses the first column as the index.
index_col='ID': uses the column named 'ID' as the index.
If no index is specified, pandas generates a default numeric index 0, 1, 2….
This ensures our data is loaded with the correct indexing for downstream analysis and can be saved back to CSV efficiently.
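As a quick sanity check, a round-trip sketch that writes to a temporary file (the path and column names are illustrative):

```python
import os
import tempfile

import pandas as pd

# A tiny illustrative DataFrame
df = pd.DataFrame({'ID': [101, 102], 'score': [0.5, 0.9]})

# Write without the numeric index, then read back with 'ID' as the index
path = os.path.join(tempfile.gettempdir(), 'roundtrip.csv')
df.to_csv(path, index=False)
restored = pd.read_csv(path, index_col='ID')
```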
Read and write to an Excel file
Call the read_excel function to access an Excel file. Pass the name of the Excel file as an argument. Use to_excel to save a DataFrame back to an Excel file, optionally specifying the sheet name.
# Read an Excel file into a DataFrame
df = pd.read_excel('file.xlsx')

# Write the DataFrame to an Excel file, specifying the sheet name
df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet2')
Excel files are often used for small datasets and manual feature inspection before converting to CSV or DataFrame for ML tasks.
Read and write to SQL Query
Pandas can interact with SQL databases to load or save structured datasets, making it easier to feed data into ML pipelines.
from sqlalchemy import create_engine
import pandas as pd

# Create an in-memory SQLite database
engine = create_engine('sqlite:///:memory:')

# Read data from SQL
pd.read_sql("SELECT * FROM my_table;", engine)
pd.read_sql_table('my_table', engine)
pd.read_sql_query("SELECT * FROM my_table;", engine)

# Write a DataFrame to SQL
df.to_sql('myDf', engine)
Using pandas with SQL enables seamless data fetching, preprocessing, and saving directly from or to databases for ML workflows.
Selecting and accessing data
Selecting and accessing data efficiently is fundamental in pandas. It allows us to inspect and filter specific elements of the Series or DataFrame. Mastering these techniques is essential for preprocessing, analysis, and feature engineering in machine learning workflows.
Access elements in a Series
Retrieve elements by position or slice.
# First element (by position)
ser.iloc[0]

# First 5 elements
ser.iloc[:5]

# Last 5 elements
ser.iloc[-5:]
Selecting specific values helps in inspecting features, labels, or subsets of data during preprocessing or debugging.
Select a single value in DataFrame
Access individual elements by position or label using pandas indexing methods. This is useful for debugging, checking feature values, and validating preprocessing steps.
# By position
# Using iloc (row position, column position)
df.iloc[[0], [0]]

# Using iat (faster, returns a single value)
df.iat[0, 0]

# By label
# Using loc (row label, column label)
df.loc[[0], ['Label']]

# Using at (faster, returns a single value)
df.at[0, 'Label']
iloc: access elements by integer position; returns a DataFrame or Series.
iat: access a single element by integer position; faster than iloc.
loc: access elements by row and column labels; returns a DataFrame or Series.
at: access a single element by row and column labels; faster than loc.
Access a DataFrame with a boolean index
In boolean indexing, we filter data using a boolean condition on one or more columns.
import pandas as pd

# Dictionary of lists
data = {'name': ["name1", "name2", "name3", "name4"],
        'degree': ["degree1", "degree2", "degree3", "degree4"],
        'score': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# Boolean indexing: select rows where score > 2
filtered_df = df[df['score'] > 2]
Boolean indexing is key for filtering datasets, creating training/testing splits, and selecting rows based on conditions. The expression df['score'] > 2 generates a boolean vector [False, False, True, True], which is then used to filter the DataFrame.
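Conditions can be combined with & (and), | (or), and ~ (not); each condition needs its own parentheses. A sketch using the same df as above:

```python
import pandas as pd

data = {'name': ["name1", "name2", "name3", "name4"],
        'degree': ["degree1", "degree2", "degree3", "degree4"],
        'score': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# Rows where score is strictly between 1 and 4
mid = df[(df['score'] > 1) & (df['score'] < 4)]

# Rows where score is 1 OR the degree is "degree4"
either = df[(df['score'] == 1) | (df['degree'] == "degree4")]
```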
Modifying DataFrames
DataFrames can be easily modified to add, remove, or rename columns and rows. These operations are essential for feature engineering, data cleaning, and preparing datasets for machine learning.
Create a new column in a DataFrame
Add a new column with default or calculated values.
df['new_column'] = 0 # Set all rows in 'new_column' to 0
Adding columns allows feature engineering or tracking new derived features for ML models.
Specify columns and index in a DataFrame
Defining columns and indices clearly organizes the data, making it easier to access, manipulate, and interpret during analysis or modeling. Proper labeling also helps avoid errors when performing operations like merging, filtering, or feature engineering.
Specify only column names
This is the simplest way to organize the data. Use this approach when we want to create a DataFrame with default numeric indices but custom column names.
import pandas as pd

# Create a DataFrame and specify column names only
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  columns=['a', 'b', 'c'])
Specify both columns and index using dictionary (name: values)
This method allows us to define both columns and custom row labels by passing a dictionary of column names and their corresponding values. It’s useful when we want fully labeled data for easier access and readability.
# Create a DataFrame using dictionary form
df = pd.DataFrame({'a': [1, 4, 7],
                   'b': [2, 5, 8],
                   'c': [3, 6, 9]},
                  index=['row1', 'row2', 'row3'])
Specify both columns and index using columns= and index= with a 2D array
Here, we pass the data as a 2D array and explicitly define column and row labels using the columns= and index= parameters. This is flexible for structured datasets where both dimensions need clear labeling.
# Create a DataFrame using a 2D array, explicitly specifying columns and index
df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]],
                  index=['row1', 'row2', 'row3'],
                  columns=['a', 'b', 'c'])
Rename columns in a DataFrame
When working with machine learning datasets, column names are often generic or unclear. Renaming them improves readability and makes feature tracking easier during preprocessing or model training.
# Rename columns
df.columns = ['Age', 'Income', 'Purchased']
Drop rows
Remove specific rows (or elements in a Series) by label or index.
# For a Series: remove elements with labels 'a' and 'c'
s.drop(['a', 'c'])

# For a DataFrame: remove rows with index 0 and 2
df.drop([0, 2], axis=0)  # axis=0 tells pandas to drop rows
Dropping rows with irrelevant or missing data helps clean the dataset before model training.
Drop columns
Remove one or more entire columns from a DataFrame.
# Remove the column named 'City' from the DataFrame
df.drop('City', axis=1)  # axis=1 tells pandas to drop columns
Dropping unneeded features prevents irrelevant data from affecting model performance.
Set and reset index
We can set a column as the index of a DataFrame for easier access and organization, and later reset it back to the default numeric index if needed.
# Sample DataFrame
df = pd.DataFrame({'ID': [101, 102, 103],
                   'Name': ['Alice', 'Bob', 'Charlie'],
                   'Score': [85, 90, 78]})

# Set the 'ID' column as the index
df.set_index('ID', inplace=True)

# Reset the index back to the default numeric index
df.reset_index(inplace=True)
A set index makes data easier to access and reference by meaningful labels instead of numeric positions, while resetting the index is useful after filtering or sorting to restore a clean, default index for iteration or export.
Handle duplicates
Duplicate rows in a dataset can cause bias or redundancy in analysis and machine learning. Pandas makes it easy to remove duplicate rows.
# Remove duplicate rows
df_clean = df.drop_duplicates()
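drop_duplicates also accepts subset= (which columns define a duplicate) and keep= (which copy survives). A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2], 'value': [10, 10, 30]})

# Exact duplicate rows removed; the first occurrence is kept (the default)
clean = df.drop_duplicates()

# Treat rows as duplicates based on 'id' only, and keep the last occurrence
by_id = df.drop_duplicates(subset='id', keep='last')
```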
Sorting and ranking
Sorting and ranking data help arrange values in a meaningful sequence for analysis and comparison. These operations are commonly used to identify top or bottom records, prioritize features, and prepare data for reporting or modeling.
Sort Series by index
Sort a Series by its index labels. Returns a new Series sorted by index labels if inplace=False; otherwise, it updates the original Series and returns None.
# Sort a Series by its index labels
s.sort_index()

# Sort in descending order
s.sort_index(ascending=False)

# Place NaN index labels first instead of last
s.sort_index(na_position='first')

# Sort in place (modifies s and returns None)
s.sort_index(inplace=True)
Sorting by index ensures consistent ordering, which is important for data alignment, comparison, and visualization in ML pipelines.
Sort values along an axis
Sorting data by values can help in feature analysis, ranking, or identifying outliers. We can sort in ascending or descending order depending on the task.
# Sort DataFrame by the column named 'Sales'
df.sort_values(by='Sales')

# Sort DataFrame by 'Sales' in descending order
df.sort_values(by='Sales', ascending=False)
Adding ranks to particular entries
Ranking values is useful for feature engineering or for creating ordinal features. Pandas allows us to rank values directly.
# Input data
df = pd.DataFrame({'sales': [100, 200, 150],
                   'profit': [20, 30, 25]})

# Rank all numeric columns
df.rank()  # Ranks each numeric column independently

# Rank a specific column
df["sales"].rank()  # Ranks only the 'sales' column
When using df.rank(), pandas assigns ranks based on the ascending order of values in each numeric column. In the 'sales' column, the smallest value, 100, receives rank 1; 150 receives rank 2; and 200 receives rank 3. Similarly, in the 'profit' column, 20 receives rank 1, 25 receives rank 2, and 30 receives rank 3. If you rank a specific column, such as df["sales"].rank(), only that column is ranked in the same way, returning a Series of ranks.
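When values tie, rank() averages their positions by default; the method= parameter changes this behavior. A sketch with an illustrative Series:

```python
import pandas as pd

s = pd.Series([100, 200, 100])

# Default 'average': the two 100s share positions 1 and 2, so each gets 1.5
avg = s.rank()

# 'min': tied values all receive the lowest position in the tie
low = s.rank(method='min')

# 'dense': like 'min', but ranks increase by exactly 1 between groups
dense = s.rank(method='dense')
```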
Descriptive information
Pandas provides multiple ways to inspect the structure and summary of a dataset. Accessing descriptive information and metadata helps identify missing values, understand feature types, and plan preprocessing steps before modeling.
Retrieve rows and columns description
Before training a model, knowing the shape of the data helps verify preprocessing steps and ensure the dataset matches the expected input dimensions.
df.shape
Describe columns of the DataFrame
Checking column names is important to identify features, targets, or columns requiring transformation before modeling.
df.columns
Retrieve index description
Understanding the index helps in aligning datasets, especially when merging or joining multiple tables in ML preprocessing.
df.index
Get information on DataFrame
df.info() gives a quick overview of data types, null values, and memory usage, which is essential for cleaning and preprocessing data for ML.
df.info()
Handling missing values
Dealing with missing values is a crucial step in preprocessing, as null or NaN values can cause errors or bias in machine learning models. Pandas provides tools to detect, count, and handle missing values efficiently.
Retrieve the number of non-NA values
Counting non-missing values helps quickly identify which columns have incomplete data. This is useful for planning imputation or data cleaning strategies.
df.count()           # Non-NA count per column
df['sales'].count()  # Non-NA count for the 'sales' column
Detect missing values
Detecting missing values allows us to locate gaps in the dataset before deciding how to handle them.
df.isna()   # Returns True for missing values
df.notna()  # Returns True for non-missing values
Handle missing values
After detecting missing values, we can either fill them with a specific value or remove rows/columns containing nulls. This ensures the dataset is ready for modeling without introducing errors.
df.fillna(0)  # Replace missing values with 0
df.dropna()   # Remove rows with any missing values
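A common imputation strategy fills each column's missing values with that column's mean; a sketch with illustrative data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'sales': [10.0, np.nan, 30.0],
                   'profit': [1.0, 2.0, np.nan]})

# Fill each numeric column's NaNs with that column's mean
imputed = df.fillna(df.mean())
```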
Aggregations and mathematical operations
Pandas provides a variety of aggregation and mathematical operations to summarize, transform, and analyze datasets. These operations are essential for feature engineering, data inspection, trend tracking, and preparing inputs for machine learning models.
Get the sum of values
Calculating the sum of values provides a quick way to aggregate numeric data. It can be used to generate total metrics, monitor cumulative trends over time, or create new features for machine learning models, such as total sales, total profit, or combined scores.
df.sum()           # Sum of values for each numeric column in the DataFrame
df["sales"].sum()  # Sum of the 'sales' column

# Cumulative sum
df.cumsum()

# Example input
#    sales
# 0     10
# 1     20
# 2     15
# 3     25

# df.sum() result: 70
# df.cumsum() result:
#    sales
# 0     10
# 1     30
# 2     45
# 3     70
Add or subtract from all values
Adding or subtracting constants is often part of feature scaling or creating new derived features.
s.add(2)  # Add 2 to each element in the Series
s.sub(2)  # Subtract 2 from each element in the Series
Multiply or divide all values
Multiplying or dividing helps with normalization, scaling, or generating new features.
s.mul(2)  # Multiply each element by 2
s.div(2)  # Divide each element by 2
Find min/max values
Identifying extreme values is helpful for detecting outliers and understanding the range of features.
# DataFrame level
df.min()  # Returns the min of each numeric column
df.max()  # Returns the max of each numeric column

# Column level
df['sales'].min()
df['sales'].max()
Get min/max index values
Knowing which row contains the min or max is useful for data inspection or targeting specific samples.
# Index of minimum value per column (default axis=0)
df.idxmin()

# Index of maximum value per column (default axis=0)
df.idxmax()

# Column label of the minimum value per row
df.idxmin(axis=1)

# Column label of the maximum value per row
df.idxmax(axis=1)
Get the median or mean of the values
Calculating mean or median is essential for understanding feature distributions and for imputation strategies.
# DataFrame level
df.mean()    # Mean of each numeric column
df.median()  # Median of each numeric column

# Column level
df['profit'].mean()
df['profit'].median()
Value counts and unique values
Counting occurrences and finding unique values helps understand feature distribution and categorical variables.
df['category'].value_counts()  # Count of each unique value
df['category'].unique()        # Array of unique values
df['category'].nunique()       # Number of unique values
GroupBy operations
Grouping data enables aggregation, transformation, or applying custom functions to subsets. Useful for creating aggregated features.
# Sum of sales per region
df.groupby('region')['sales'].sum()

# Multiple aggregations: mean sales and max profit per region
df.groupby('region').agg({'sales': 'mean', 'profit': 'max'})
Describe a summary of data statistics
Provides descriptive statistics (mean, std, quartiles, min, max), which is key for understanding features and preprocessing decisions.
# DataFrame level
df.describe()  # Count, mean, std, min, 25%, 50%, 75%, max for all numeric columns

# Column level
df['sales'].describe()  # Summary stats for the 'sales' column only
Combining and merging data
Combining and merging data allows us to integrate multiple datasets into a single cohesive DataFrame. These operations are essential for preparing complete datasets for analysis or machine learning workflows.
Merge two DataFrames
Merging datasets is common when combining features from multiple sources before training a model. Pandas merge() works like SQL joins and allows us to combine data on a key column or index.
# Merge two DataFrames on the common column 'subject_id', similar to a SQL join
pd.merge(df1, df2, on='subject_id')

# Merge using a left join
merged_df = pd.merge(df1, df2, on='subject_id', how='left')

# Merge on different column names
merged_df = pd.merge(df1, df2, left_on='id_1', right_on='id_2')

# Merge using indices
merged_df = pd.merge(df1, df2, left_index=True, right_index=True)

# Validate the merge to ensure a one-to-one mapping
merged_df = pd.merge(df1, df2, on='subject_id', validate='one_to_one')
how: type of merge ('inner', 'left', 'right', or 'outer').
on, left_on, right_on: columns to merge on.
left_index, right_index: merge using the index instead of columns.
validate: checks the expected merge type, e.g. 'one_to_one' or 'many_to_one'.
Combine DataFrames
Concatenating DataFrames helps append additional data or stack multiple datasets for larger ML training sets.
print(pd.concat([df1, df2]))
merge combines DataFrames based on a common column or index, performing database-style joins, whereas concat simply appends or stacks them.
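Two concat options worth knowing: ignore_index=True rebuilds a clean row index, and axis=1 stacks frames side by side instead of vertically. A sketch with illustrative frames:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2]})
df2 = pd.DataFrame({'a': [3, 4]})

# Stack rows and rebuild the index as 0..3 instead of 0, 1, 0, 1
stacked = pd.concat([df1, df2], ignore_index=True)

# Place the frames side by side (aligned on the row index)
side = pd.concat([df1, df2.rename(columns={'a': 'b'})], axis=1)
```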
Pivot tables
Pivot tables help aggregate and summarize data across multiple dimensions, similar to Excel pivot tables. They are useful for analyzing trends, computing averages, sums, or other statistics grouped by categories.
# Create a pivot table showing average sales per region and product
pivot_df = df.pivot_table(values='sales', index='region',
                          columns='product', aggfunc='mean')

# Example output
# product      A      B
# region
# East     100.0  150.0
# West     120.0  130.0
Pivot tables are important for aggregating features, identifying patterns, and creating structured inputs for modeling.
Melt/reshape
The melt() function reshapes a DataFrame from wide to long format, making it easier to analyze, visualize, or prepare for machine learning models that require tidy data.
# Melt the DataFrame to long format
melted_df = df.melt(id_vars=['region'], value_vars=['sales', 'profit'],
                    var_name='metric', value_name='value')

# Example input
#   region  sales  profit
# 0   East    100      30
# 1   West    120      50

# Example output
#   region  metric  value
# 0   East   sales    100
# 1   West   sales    120
# 2   East  profit     30
# 3   West  profit     50
Melted data is especially useful when performing group operations, visualizations, or feeding data into ML pipelines.
Apply a function to a dataset
Applying functions allows flexible transformations or feature engineering across a DataFrame.
# Apply a function along rows or columns
f = lambda x: x * 2  # example function
df.apply(f)  # applies the function to each column (default axis=0)

# Apply a function element-wise to each value in the DataFrame
df.applymap(f)  # deprecated since pandas 2.1; use df.map(f) instead
apply() works along an axis (rows or columns) and is useful for column-wise or row-wise transformations. applymap() works element-wise and is useful for transforming every value individually.
This is especially useful in ML pipelines for feature scaling, encoding, or custom transformations.
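For example, min-max scaling each column with apply (an illustrative transformation, not the only way to scale features):

```python
import pandas as pd

df = pd.DataFrame({'age': [20, 30, 40], 'income': [1000, 2000, 4000]})

# Min-max scale every column to the [0, 1] range, column by column
scaled = df.apply(lambda col: (col - col.min()) / (col.max() - col.min()))
```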
Pandas is a versatile library that makes data analysis in Python efficient and intuitive. Mastering its key operations will give you the foundation to clean, manipulate, and explore datasets with confidence.