Trusted answers to developer questions
Related Tags

python
data structures

Series vs. DataFrame in Pandas

Educative Answers Team

Pandas

pandas is a Python library that is popular with data scientists and analysts. It enables users to manipulate and analyze data using sophisticated data analysis tools.

pandas provides two data structures that shape data into a readable form:

  • Series
  • DataFrame

Series

A pandas series is a one-dimensional data structure made up of key-value pairs, where the keys/labels are the indices and the values are the data stored at those indices. It is similar to a Python dictionary, except that it provides more freedom to manipulate and edit the data.

Representation of a series data structure
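
To make the dictionary comparison concrete, here is a minimal sketch. The fruit prices are hypothetical values invented for illustration. It shows that a series supports dictionary-style lookup by key, and also element-wise arithmetic and filtering that a plain dictionary does not offer.

```python
import pandas as pd

# Hypothetical data: prices keyed by fruit name.
prices_dict = {"apples": 1.2, "oranges": 0.8, "bananas": 0.5}
prices = pd.Series(prices_dict)

# Dictionary-style lookup by key works on both structures.
same_lookup = prices["oranges"] == prices_dict["oranges"]

# Unlike a dict, a series supports element-wise arithmetic...
discounted = prices * 0.5

# ...and boolean filtering on the values.
cheap = prices[prices < 1.0]

print(discounted)
print(cheap)
```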

Syntax

Initializing a series

We use pandas.Series() to initialize a series object.

The syntax to initialize different series objects is shown below:

import pandas
##### INITIALIZATION #####
#STRING SERIES
fruits = pandas.Series(["apples", "oranges", "bananas"])
print("Fruit series:")
print(fruits)
#FLOAT SERIES
temperature = pandas.Series([32.6, 34.1, 28.0, 35.9])
print("\nTemperature series:")
print(temperature)
#INTEGER SERIES
factors_of_12 = pandas.Series([1,2,4,6,12])
print("\nFactors of 12 series:")
print(factors_of_12)
print("Type of this data structure is:", type(factors_of_12))
Initializing a Pandas series

In the code example above, three different series are initialized by passing a list to pandas.Series(). Every element in the series has a label/index. By default, the indices resemble array indices: they start at 0 and end at N - 1, where N is the number of elements in the list.

However, we can provide our own indices by using the index parameter of pandas.Series().

import pandas
# Integer indices
fruits = pandas.Series(["apples", "oranges", "bananas"], index=[4, 3, 2])
print("Fruit series:")
print(fruits)
# String indices
temperature = pandas.Series([32.6, 34.1, 28.0, 35.9], index=["one", "two", "three", "four"])
print("\nTemperature series:")
print(temperature)
# Non-unique index values
factors_of_12 = pandas.Series([1,2,4,6,12], index=[1, 1, 2, 2, 3])
print("\nFactors of 12 series:")
print(factors_of_12)
print("Type of this data structure is:", type(factors_of_12))
Initializing a Pandas series using user-provided indices

Index labels can be of any hashable data type, e.g., integers and strings. Index values don't have to be unique, as the code example above shows.
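
One consequence of non-unique labels is worth seeing in code. The sketch below reuses the factors example and assumes default pandas behavior: a label that appears once returns a single value, while a repeated label returns a smaller series containing every match.

```python
import pandas as pd

factors = pd.Series([1, 2, 4, 6, 12], index=[1, 1, 2, 2, 3])

# The index reports whether its labels are all distinct.
labels_are_unique = factors.index.is_unique

# A label that occurs once yields a single value.
unique_hit = factors.loc[3]

# A label that occurs more than once yields a series of all matches.
repeated_hit = factors.loc[1]

print(labels_are_unique)
print(unique_hit)
print(repeated_hit)
```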

Moreover, you can name your series by passing a string to the name argument in the pandas.Series() method:

import pandas
fruit = pandas.Series(["apples", "oranges", "bananas"], name = "fruit_series")
print("Fruit series:")
print(fruit)
Initializing a Pandas series and naming it using the name parameter
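
As a rough illustration of why the name matters, the sketch below reads the name back and converts the series to a DataFrame, where the name becomes the column label. The new name "fruits" is an arbitrary value chosen for the example.

```python
import pandas as pd

fruit = pd.Series(["apples", "oranges", "bananas"], name="fruit_series")

# The name is stored as an attribute of the series.
print(fruit.name)

# When the series is converted to a DataFrame, the name becomes the column label.
frame = fruit.to_frame()
print(list(frame.columns))

# rename() with a scalar returns a copy carrying a new name.
renamed = fruit.rename("fruits")
print(renamed.name)
```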

We can also initialize a series from a Python dictionary, whose keys become the index labels:

import pandas
data = {'a': 25, 'bb': 30, 'c': 50, 'za': 21, 2: 200}
fruit = pandas.Series(data)
print("Series:")
print(fruit)
Initializing a Pandas series with a Python dictionary.
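
The dictionary form can be combined with the index parameter. In the sketch below, which uses a shortened version of the dictionary above and a made-up label "zz", the index list selects and orders the keys to keep. A label missing from the dictionary is filled with NaN.

```python
import pandas as pd

data = {"a": 25, "bb": 30, "c": 50}

# Without index, the series keeps the dictionary's insertion order.
default_order = pd.Series(data)

# With index, only the listed keys are kept, in the listed order.
# "zz" is a hypothetical label that is not in the dictionary.
selected = pd.Series(data, index=["c", "a", "zz"])

print(default_order)
print(selected)
```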

Querying a series object

To query a series by integer position (0 for the first element, regardless of the labels), we use the .iloc[] indexer. To query by label, we use the .loc[] indexer or the bracket operator []. Passing an integer position to [] on a series with non-integer labels is deprecated in recent pandas versions, so the example uses .iloc[] for positions.

import pandas
fruits = pandas.Series(["apples", "oranges", "bananas"], index=['a', 'b', 'c'])
print("Fruit series:")
print(fruits)
##### ACCESSING DATA #####
# Using .iloc[] (integer position)
print("\n2nd fruit using .iloc[]: ", fruits.iloc[1])
# Using the bracket operator (label)
print("\nFruit at key \"b\" using []: ", fruits['b'])
# Using .loc[] (label)
print("\nFruit at key \"b\" using .loc[]: ", fruits.loc['b'])
Accessing data in a pandas series using .loc[], .iloc[], and the [] operator
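
The difference between the two indexers is easiest to see when the labels are themselves integers. The sketch below uses hypothetical series built for illustration. It also shows that label slices include the end label, while position slices exclude the end position.

```python
import pandas as pd

# Hypothetical series whose integer labels run opposite to the positions.
fruits = pd.Series(["apples", "oranges", "bananas"], index=[2, 1, 0])

by_label = fruits.loc[0]      # the element labelled 0
by_position = fruits.iloc[0]  # the first element

# Slicing: .loc includes the end label, .iloc excludes the end position.
letters = pd.Series([10, 20, 30], index=["a", "b", "c"])
label_slice = letters.loc["a":"b"]
position_slice = letters.iloc[0:1]

print(by_label, by_position)
print(label_slice)
print(position_slice)
```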

Note: A pandas series provides a vast range of functionality. To dig deeper into the different series methods, visit the official documentation.

DataFrame

A pandas DataFrame is a two-dimensional data structure that can be thought of as a spreadsheet. It can also be thought of as a collection of one or more series that share a common index, with each series forming a column.

Representation of a pandas DataFrame.
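
The "collection of series" view can be sketched directly. The example below builds a DataFrame from two named series. John's series is deliberately shortened (an assumption made for illustration) to show that the columns are aligned on the shared index and that missing entries become NaN.

```python
import pandas as pd

jack = pd.Series(["apples", "oranges", "bananas"], index=["a", "b", "c"])
# Deliberately missing the label "c" to show index alignment.
john = pd.Series(["guavas", "kiwis"], index=["a", "b"])

# Each series becomes one column; rows are matched by index label.
fruits = pd.DataFrame({"Jack's": jack, "John's": john})
print(fruits)

# Selecting a column gives back a series with the same index.
column = fruits["Jack's"]
print(type(column))
```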

Syntax

Initializing a DataFrame

To initialize a DataFrame, use pandas.DataFrame():

import pandas as pd
##### INITIALIZATION #####
fruits_jack = ["apples", "oranges", "bananas"]
fruits_john = ["guavas", "kiwis", "strawberries"]
index = ["a", "b", "c"]
all_fruits = {"Jack's": fruits_jack, "John's": fruits_john}
fruits_default_index = pd.DataFrame(all_fruits)
print("Dataframe with default indices:\n", fruits_default_index, "\n")
new_fruits = pd.DataFrame(all_fruits, index = index)
print("Dataframe with given indices:\n", new_fruits, "\n")
Initializing a DataFrame

In the code example above, a DataFrame is initialized using a dictionary with two key-value pairs. Every key in this dictionary represents a column in the resulting DataFrame and the value represents all the elements in this column.

The two lists of fruits are used as the values of a Python dictionary, which is then passed to pandas.DataFrame() to make a DataFrame.

For the second DataFrame, we passed a list of labels to the index argument of pandas.DataFrame() to use our custom indices.
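
A quick way to confirm how the dictionary was mapped is to inspect the result. The sketch below checks the shape, column names, and row labels. It also shows that, with a dictionary of lists, an index list of the wrong length is rejected with a ValueError.

```python
import pandas as pd

all_fruits = {
    "Jack's": ["apples", "oranges", "bananas"],
    "John's": ["guavas", "kiwis", "strawberries"],
}
fruits = pd.DataFrame(all_fruits, index=["a", "b", "c"])

# Dictionary keys became columns; the index list labels the rows.
print(fruits.shape)
print(list(fruits.columns))
print(list(fruits.index))

# The index must have one label per row of data.
try:
    pd.DataFrame(all_fruits, index=["a", "b"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(mismatch_error)
```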

Querying a DataFrame

The DataFrame can be queried in multiple ways.

  • .loc[] can be used to query the DataFrame by row and column labels.
  • .iloc[] can be used to query by integer position.
  • Bracket operator [] can be used to select columns by name (or a slice of rows).

We can also use chained queries to query a specific cell in the DataFrame. These queries return a series or a single value depending on the type of query: querying a single row or column returns a series, while querying a cell returns a single value.

import pandas as pd
##### INITIALIZATION #####
fruits_jack = ["apples", "oranges", "bananas"]
fruits_john = ["guavas", "kiwis", "strawberries"]
index = ["a", "b", "c"]
all_fruits = {"Jack's": fruits_jack, "John's": fruits_john}
fruits = pd.DataFrame(all_fruits, index = index)
print(fruits, "\n")
new_fruits = pd.DataFrame(all_fruits)
print(new_fruits, "\n")
##### QUERY #####
#USING INDEX
print("1st fruit:")
print(fruits.iloc[0], "\n")
#USING KEY
print("Fruits at key \"c\":")
print(fruits.loc["c"], "\n")
#USING COLUMN NAME
print("Jack's fruits: ")
print(fruits["Jack's"], "\n")
#CHAINED QUERY, querying a cell by column name, then by position
print("John's third fruit: ")
print(fruits["John's"].iloc[2], "\n")
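
The return types described above can be checked directly. The sketch below rebuilds the same DataFrame and also reads the cell with a single .loc[], .iloc[], or .at[] call. This is an alternative to the chained form, which performs two separate lookups.

```python
import pandas as pd

fruits = pd.DataFrame(
    {
        "Jack's": ["apples", "oranges", "bananas"],
        "John's": ["guavas", "kiwis", "strawberries"],
    },
    index=["a", "b", "c"],
)

row = fruits.loc["c"]        # one row    -> series indexed by column name
column = fruits["John's"]    # one column -> series indexed by row label

# One cell, read in a single lookup instead of a chained query.
cell_by_label = fruits.loc["c", "John's"]
cell_by_position = fruits.iloc[2, 1]
cell_fast = fruits.at["c", "John's"]  # .at[] is meant for single-value access

print(type(row), type(column))
print(cell_by_label, cell_by_position, cell_fast)
```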

Note: The pandas DataFrame equips you with numerous tools to manipulate and analyze large amounts of data. To dig deeper into the different DataFrame methods, visit the official documentation.
