
Table Talk: Meet Pandas

Explore the essential pandas library by learning to create and manipulate Series and DataFrames. Understand how to access, filter, and transform data to prepare it for analysis. This lesson empowers you to handle messy datasets quickly, perform vectorized operations, and add or update columns, making your data analysis workflow more effective.

Let’s be real: raw data is messy. It’s like getting a pile of unsorted LEGO bricks dumped on the desk. Before we can build anything meaningful, like a chart, report, or insight, we need to introduce some level of order to that mess.

That’s where pandas becomes a data analyst’s best friend.

For analysts, it’s the go-to tool for quickly filtering, cleaning, reshaping, and exploring data, all in one place. Whether we’re preparing monthly reports, investigating customer behavior, or validating CSVs from different teams, pandas helps us work smarter and faster.

In this lesson, we’ll work with the two building blocks of pandas: Series and DataFrames. Think of these as the language and workspace where all data cleaning, transformation, and analysis happen.

Series vs. DataFrame

What is a Series?

A Series is a one-dimensional array-like structure with labels, also called indices, for each data element. We can think of it as a single column of data in a spreadsheet, where each cell has a label attached to it.

Python 3.10.4
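import pandas as pd
# Create a Series from a plain Python list
data = [100, 200, 300, 400]
s = pd.Series(data)
# Print the values alongside their default integer index
print(s)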

This is the simplest way to create a Series: pass in a list of values. Each item in the list becomes a data point, and pandas automatically assigns a default integer index (0, 1, 2, ...).

Informational note: By default, pandas assigns an integer index starting from 0, so each label matches the item's position. This is helpful when quickly scanning through unfamiliar data.
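To see the default index for ourselves, we can inspect it directly. The snippet below is a minimal sketch that reuses the same values as the example above; the name first_value is only illustrative.

```python
import pandas as pd

s = pd.Series([100, 200, 300, 400])

# The default index is a RangeIndex that counts up from 0
print(s.index)

# Converting it to a list shows the individual labels
print(list(s.index))   # [0, 1, 2, 3]

# With the default index, each label matches the item's position
first_value = s[0]
print(first_value)     # 100
```

Because the labels here are just 0, 1, 2, and 3, looking up s[0] returns the first item in the list.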

Custom index

Adding a custom index gives our data more meaning. Instead of using generic numbers, we can label the data points with names, IDs, or any other identifiers using the index parameter ...
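As a rough sketch of the idea, we can pass a list of labels to the index parameter of pd.Series. The quarter labels and the variable names sales and q3 below are hypothetical, invented only for illustration.

```python
import pandas as pd

# Hypothetical labels: the quarter names are made up for this example
sales = pd.Series([100, 200, 300, 400], index=["Q1", "Q2", "Q3", "Q4"])
print(sales)

# Look up a value by its label instead of its position
q3 = sales["Q3"]
print(q3)               # 300

# .loc makes label-based access explicit
print(sales.loc["Q1"])  # 100
```

The values are the same as before; only the labels have changed, which makes a lookup such as sales["Q3"] easier to read than a bare position.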