Table Talk: Meet pandas

Learn to master pandas Series and DataFrames for data science workflows.

Raw data rarely comes in a form that’s easy to analyze. It’s messy, inconsistent, and often unstructured—full of missing values, mixed types, and confusing layouts. To make sense of this chaos, we must first convert it into a format that computers and humans can easily work with.

That’s where pandas comes in. By converting raw data into structured objects like Series and DataFrames, pandas allows us to organize, clean, and manipulate data efficiently. This transformation is the first step to extracting insights and answering complex questions.

In this lesson, we’ll dive into the fundamental building blocks of pandas: Series and DataFrames. Think of these as the language and workspace where all data cleaning, transformation, and analysis happen.

What is a Series?

A Series is a one-dimensional, array-like structure in which each data element has a label; together, these labels form the index. We can think of it as a single column of data in a spreadsheet, where each cell has a label attached to it.

import pandas as pd
# Build a Series from a plain Python list; pandas supplies the default index.
data = [100, 200, 300, 400]
s = pd.Series(data)
print(s)

This is the simplest way to create a Series: by passing in a list of values. Each item in the list becomes a data point, and pandas automatically assigns a default integer index (0, 1, 2, ...).
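To make the default index visible, here is a minimal sketch that inspects the index and values of the same Series. The variable name `first` is only for illustration.

```python
import pandas as pd

s = pd.Series([100, 200, 300, 400])

# With no index given, pandas builds a RangeIndex that starts at 0.
print(s.index)        # RangeIndex(start=0, stop=4, step=1)
print(list(s.index))  # [0, 1, 2, 3]

# The underlying values are available as a NumPy array.
print(s.to_numpy())   # [100 200 300 400]

# The default labels coincide with positions, so label 0 is the first item.
first = s[0]
print(first)          # 100
```

Here the labels and the positions happen to be the same numbers, which is why `s[0]` returns the first value.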

Custom index

Adding a custom index gives our data more meaning. Instead of using generic numbers, we can label the data points with names, IDs, or any other identifiers using the index parameter of pd.Series().

import pandas as pd
# Same values, but with explicit string labels as the index.
s = pd.Series([100, 200, 300, 400], index=['a', 'b', 'c', 'd'])
print(s)

We create another Series with the same values, but this time we specify custom labels through the index parameter. Instead of the default numeric index, each value is paired with one of the labels 'a', 'b', 'c', and 'd'. The output now displays the values alongside these labels, making it more meaningful and human-readable ...
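One practical payoff of custom labels is that values can be looked up by label. As a minimal sketch (the variable names `by_label`, `by_position`, and `subset` are illustrative and not part of the lesson), `.loc` selects by label and `.iloc` selects by position:

```python
import pandas as pd

s = pd.Series([100, 200, 300, 400], index=['a', 'b', 'c', 'd'])

# Look up a single value by its label.
by_label = s.loc['b']

# Look up the same value by its integer position (second item).
by_position = s.iloc[1]

# Passing a list of labels returns a smaller Series.
subset = s.loc[['a', 'd']]

print(by_label, by_position)  # 200 200
print(subset)
```

Using `.loc` and `.iloc` explicitly keeps label-based and position-based access apart, which avoids ambiguity once the index is no longer the default 0, 1, 2, ... sequence.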