How to use the where function on a dataframe in pandas

What is a dataframe?

A dataframe is a 2-dimensional, labeled data structure: a table with rows and columns. In Python, it is most commonly used through the pandas DataFrame object.

A sample dataframe

Dataframes require the pandas library, as shown below:

import pandas as pd

A dataframe can be created as shown in the example below. It builds a dataframe of countries, each assigned to a group and given an a_score and a b_score. Both scores are made-up values for the purpose of this example.

import pandas as pd
a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A','A','B','A','B','B','C','A','C','C']
df = pd.DataFrame({'group':groups, 'country':country, 'a_score':a_score, 'b_score':b_score})
print(df)

The where function

The where function keeps the values for which a specified condition is True and replaces the values for which it is False. It can be called on a single column (a Series) or on a whole dataframe.

Prototype

A typical call looks as follows, where new_column is a placeholder column name:

df['new_column'].where(df['new_column'] > 4 , 9)

All values greater than 4 are kept, and the remaining values are replaced by 9.
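
As a minimal runnable sketch of the call above: the column name new_column and its four values are hypothetical, chosen only so that the effect of the condition is easy to see.

```python
import pandas as pd

# Hypothetical column name and values, used only for illustration.
df = pd.DataFrame({'new_column': [2, 4, 6, 8]})

# Keep values where the condition is True; put 9 where it is False.
result = df['new_column'].where(df['new_column'] > 4, 9)
print(result.tolist())  # [9, 9, 6, 8]
```

Note that 4 itself is replaced, because the condition is strictly greater than 4.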

Parameters

  • The condition (cond): values are kept where it is True
  • The replacement value (other): used where the condition is False

If the replacement is not provided, the values that do not fulfill the condition are replaced by NaN.
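
A quick way to check the default behavior is to call where with only a condition. The sketch below uses a hypothetical three-value Series; the entries that fail the condition come back as NaN, which also turns an integer column into a float column.

```python
import pandas as pd

scores = pd.Series([1, 5, 10])  # hypothetical values

# No replacement given: entries that fail the condition become NaN.
masked = scores.where(scores > 4)
print(masked.isna().tolist())  # [True, False, False]
print(masked.dtype)            # float64, because NaN forces a float dtype
```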

Return value

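where returns a new object of the same type as the caller (a Series or a dataframe) with the replaced values. The original is left unchanged unless inplace=True is passed.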
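
The sketch below illustrates the return value, assuming a small hypothetical dataframe with only numeric columns. It calls where on the whole dataframe and then confirms that the original data is untouched.

```python
import pandas as pd

# Hypothetical numeric-only data, used only for illustration.
df = pd.DataFrame({'a_score': [4, 8], 'b_score': [1, 10]})

# Calling where on the whole dataframe returns a new dataframe.
result = df.where(df > 4, 0)

print(type(result).__name__)        # DataFrame
print(df['a_score'].tolist())       # [4, 8]  (original is unchanged)
print(result['a_score'].tolist())   # [0, 8]
print(result['b_score'].tolist())   # [0, 10]
```

To keep the result, assign it to a variable or back to a column, as done with result here.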

Code

The example below replaces all values in the replace column that are not greater than 0 with 0.

import pandas as pd
a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
replace = [-1, -2, -3, -4, -5, 1, 2, 3, 4, 5]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A','A','B','A','B','B','C','A','C','C']
df = pd.DataFrame({'group':groups, 'country':country, 'a_score':a_score, 'b_score':b_score, 'replace': replace})
# Keep the values of 'replace' that are greater than 0; set the rest to 0.
print(df['replace'].where(df['replace'] > 0, 0))
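
The same call shape handles a threshold on b_score. As a sketch, assuming the goal is to replace every b_score below 5 with 20, the condition has to describe the values to keep (5 or more), since where replaces the values for which the condition is False.

```python
import pandas as pd

b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
df = pd.DataFrame({'b_score': b_score})

# Keep scores of 5 or more; replace everything below 5 with 20.
df['b_score'] = df['b_score'].where(df['b_score'] >= 5, 20)
print(df['b_score'].tolist())  # [20, 20, 20, 20, 5, 6, 7, 10, 8, 9]
```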
