Table Talk: Meet Pandas
Learn to use pandas Series and DataFrames effectively for data analysis workflows.
Let’s be real: raw data is messy. It’s like getting a pile of unsorted LEGO bricks dumped on the desk. Before we can build anything meaningful, like a chart, report, or insight, we need to introduce some level of order to that mess.
That’s where pandas becomes a data analyst’s best friend.
For analysts, it’s the go-to tool for quickly filtering, cleaning, reshaping, and exploring data, all in one place. Whether we’re preparing monthly reports, investigating customer behavior, or validating CSVs from different teams, pandas helps us work smarter and faster.
In this lesson, we’ll work with the two building blocks of pandas: Series and DataFrames. Think of these as the language and workspace where all data cleaning, transformation, and analysis happen.
What is a Series?
A Series is a one-dimensional, array-like structure in which every data element has a label. Together, these labels form the index. We can think of a Series as a single column of data in a spreadsheet, where each cell has a label attached to it.
import pandas as pd
data = [100, 200, 300, 400]
s = pd.Series(data)
print(s)
This is the simplest way to create a Series: by passing in a list of values. Each item in the list becomes a data point, and pandas automatically assigns a default integer index (0, 1, 2, ...).
Note: The default index starts at 0, so each label matches the item's position. This is helpful when quickly scanning through unknown data.
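To see the default index in action, here is a minimal sketch that inspects it and reads values back. The variable names first and last are only illustrative and are not part of the lesson.

```python
import pandas as pd

s = pd.Series([100, 200, 300, 400])

# With no index argument, pandas builds a RangeIndex: 0, 1, 2, 3
print(s.index)
print(list(s.index))

# With the default index, a label and a position refer to the same item
first = s[0]        # lookup by label 0
last = s.iloc[-1]   # lookup by position (last item)
print(first, last)
```

Because the labels are just 0, 1, 2, and so on, looking up by label and by position gives the same result here. That stops being true once a custom index is introduced.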
Custom index
Adding a custom index gives our data more meaning. Instead of using generic numbers, we can label the data points with names, IDs, or any other identifiers using the index parameter of pd.Series().
import pandas as pd
s = pd.Series([100, 200, 300, 400], index=['a', 'b', 'c', 'd'])
print(s)
We create another Series using the ...
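One practical benefit of a custom index is that values can be looked up by label instead of by position. The sketch below assumes the same labels 'a' through 'd' used above. The variable names (by_label, by_loc, by_position, subset) are hypothetical and chosen only for illustration.

```python
import pandas as pd

s = pd.Series([100, 200, 300, 400], index=['a', 'b', 'c', 'd'])

# Look up a single value by its label
by_label = s['b']
by_loc = s.loc['b']       # .loc makes the label-based lookup explicit

# Positional access still works through .iloc
by_position = s.iloc[1]   # second item, same as label 'b'

# Passing a list of labels returns a smaller Series
subset = s[['a', 'd']]

print(by_label, by_loc, by_position)
print(subset)
```

Using .loc for labels and .iloc for positions keeps the intent clear, which matters most when the labels themselves are integers.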