Trusted answers to developer questions

Related Tags

python

How to split a DataFrame according to a boolean criterion

Abhilash

Overview

A DataFrame can be split according to a boolean criterion using a technique called boolean masking.

Boolean masking, also called boolean indexing, extracts a subset of a DataFrame's rows using a boolean vector: a row is kept where the vector is True and dropped where it is False.
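
As a minimal sketch of the idea, the snippet below builds a boolean vector from a comparison and uses it to index a DataFrame. The toy column names and values here are made up for illustration and are not part of the student dataset used in this answer.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the idea.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 5, 3]})

# Comparing a column to a value yields a boolean Series (the boolean vector).
mask = df["score"] > 2

# Indexing with the mask keeps only the rows where the mask is True.
subset = df[mask]

print(mask.tolist())            # [False, True, True]
print(subset["name"].tolist())  # ['b', 'c']
```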

Let’s understand this concept with an example.

DataFrame

Consider the following DataFrame.

import pandas as pd
records = [{"student_name":"Maya Wells","gpa":4.5,"country":"USA"},{"student_name":"Olympia Woods","gpa":5.9,"country":"Australia"},{"student_name":"Kenneth Oneal","gpa":8.5,"country":"Germany"},{"student_name":"Tobias Garcia","gpa":3.0,"country":"Ukraine"},{"student_name":"Micah Mcgee","gpa":9.0,"country":"Austria"},{"student_name":"John Mack","gpa":5.0,"country":"USA"},{"student_name":"Jack Daniels","gpa":6.7,"country":"Australia"},{"student_name":"Sarah Daniels","gpa":1.3,"country":"Australia"},{"student_name":"John Wick","gpa":10.0,"country":"USA"},{"student_name":"Zelensky","gpa":1.0,"country":"Ukraine"},{"student_name":"Jack Som","gpa":8.6,"country":"Austria"}]
df = pd.DataFrame(records)
print(df)
Implementation of DataFrame()

Explanation

  • Line 1: The pandas module is imported.
  • Line 2: Sample records for the DataFrame are defined.
  • Line 3: A pandas DataFrame is created from the sample records.
  • Line 4: The DataFrame is printed.

The dataset contains each student's name, GPA, and country.
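
To see how a list of dictionaries maps onto a DataFrame, here is a small sketch that uses only the first two sample records (shortened for brevity). Each dictionary becomes a row, and the dictionary keys become the column names.

```python
import pandas as pd

# The first two sample records, shortened for illustration.
records = [
    {"student_name": "Maya Wells", "gpa": 4.5, "country": "USA"},
    {"student_name": "Olympia Woods", "gpa": 5.9, "country": "Australia"},
]
df = pd.DataFrame(records)

print(df.shape)          # (2, 3): two rows, three columns
print(list(df.columns))  # ['student_name', 'gpa', 'country']
```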

Now, if we want to split the dataset into students from the USA and students not from the USA, we can use a boolean mask as follows:

mask = df['country'] == 'USA'

The mask above selects all students from the USA. To get all students not from the USA, we negate the mask, i.e., ~mask.
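
The following sketch, which assumes a shortened three-row version of the student records, shows what the mask and its negation look like, and checks that together they cover every row exactly once.

```python
import pandas as pd

# A shortened version of the student records, for illustration only.
df = pd.DataFrame({
    "student_name": ["Maya Wells", "Olympia Woods", "John Mack"],
    "country": ["USA", "Australia", "USA"],
})

mask = df["country"] == "USA"

print(mask.tolist())     # [True, False, True]
print((~mask).tolist())  # [False, True, False]

# Every row falls into exactly one of the two halves.
rows_in_both_halves = int(mask.sum()) + int((~mask).sum())
print(rows_in_both_halves == len(df))  # True
```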

Splitting a DataFrame

import pandas as pd
records = [{"student_name":"Maya Wells","gpa":4.5,"country":"USA"},{"student_name":"Olympia Woods","gpa":5.9,"country":"Australia"},{"student_name":"Kenneth Oneal","gpa":8.5,"country":"Germany"},{"student_name":"Tobias Garcia","gpa":3.0,"country":"Ukraine"},{"student_name":"Micah Mcgee","gpa":9.0,"country":"Austria"},{"student_name":"John Mack","gpa":5.0,"country":"USA"},{"student_name":"Jack Daniels","gpa":6.7,"country":"Australia"},{"student_name":"Sarah Daniels","gpa":1.3,"country":"Australia"},{"student_name":"John Wick","gpa":10.0,"country":"USA"},{"student_name":"Zelensky","gpa":1.0,"country":"Ukraine"},{"student_name":"Jack Som","gpa":8.6,"country":"Austria"}]
df = pd.DataFrame(records)
mask = df['country'] == 'USA'
students_from_usa = df[mask]
students_not_from_usa = df[~mask]
print("Students from USA\n", students_from_usa)
print("-" * 5)
print("Students not from USA\n", students_not_from_usa)
DataFrame with mask

Explanation

  • Line 1: The pandas module is imported.
  • Line 2: Sample records for the DataFrame are defined.
  • Line 3: A pandas DataFrame is created from the sample records.
  • Line 4: We define the mask as the rows where the country column equals USA.
  • Line 5: We get the students from the USA by indexing the DataFrame with the mask.
  • Line 6: We get the students not from the USA by indexing with the negated mask.
  • Lines 7 to 9: Both subsets are printed, separated by a line of dashes.
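
If the same split is needed for several criteria, the pattern above can be wrapped in a small function. The sketch below is one possible way to do this; `split_by_mask` is a hypothetical helper name, not a pandas function, and the data is a shortened version of the student records.

```python
import pandas as pd


def split_by_mask(df, mask):
    """Hypothetical helper: return (rows where mask is True, remaining rows)."""
    return df[mask], df[~mask]


# A shortened version of the student records, for illustration only.
df = pd.DataFrame({
    "student_name": ["Maya Wells", "Kenneth Oneal", "John Wick", "Jack Som"],
    "gpa": [4.5, 8.5, 10.0, 8.6],
    "country": ["USA", "Germany", "USA", "Austria"],
})

# Split on the country criterion used in this answer.
from_usa, not_from_usa = split_by_mask(df, df["country"] == "USA")

# The same helper works for any other boolean criterion, e.g. a GPA threshold.
high_gpa, other_gpa = split_by_mask(df, df["gpa"] >= 8.5)

print(from_usa["student_name"].tolist())  # ['Maya Wells', 'John Wick']
print(high_gpa["student_name"].tolist())  # ['Kenneth Oneal', 'John Wick', 'Jack Som']
```

Both halves keep the index labels of the original DataFrame. If a fresh 0-based index is preferred, calling reset_index(drop=True) on each half should do it.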

Copyright ©2022 Educative, Inc. All rights reserved
