How to use the where function on a dataframe in pandas
What is a dataframe?
A dataframe is a two-dimensional, labeled data structure: a table with rows and columns. In Python, it is provided by the pandas library as the DataFrame object.
To work with dataframes, import the pandas library:
import pandas as pd
The example below creates a dataframe of countries, each assigned to a group and given an a_score and a b_score. Both scores are made-up values for the purpose of this example.
import pandas as pd
a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']
# Build the dataframe from the four lists, one column per list
df = pd.DataFrame({'group': groups, 'country': country, 'a_score': a_score, 'b_score': b_score})
print(df)
The where function
The where function keeps the values for which a specified condition is True and replaces all other values. It can be called on a single column (a Series) or on a whole dataframe.
Syntax
A typical call on a single column looks as follows, where new_column is a placeholder column name:
df['new_column'].where(df['new_column'] > 4, 9)
Values greater than 4 are kept, and all other values are replaced by 9.
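As a minimal sketch of this call, the snippet below applies it to the a_score values from the earlier example. Using a_score in place of new_column is an assumption made purely for illustration.

```python
import pandas as pd

# a_score values taken from the example dataframe above
df = pd.DataFrame({'a_score': [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]})

# Keep values greater than 4; replace every other value with 9
result = df['a_score'].where(df['a_score'] > 4, 9)
print(result.tolist())  # [9, 5, 7, 8, 9, 9, 9, 6, 9, 10]
```

Note that the 9 in the original data is kept because it satisfies the condition, while the other 9s are replacements.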
Parameters
- The condition (cond): values are kept wherever it is True
- The replacement value (other): used wherever the condition is False
If the replacement is not provided, the values that do not fulfill the condition are replaced by NaN.
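The default behavior can be checked with a short sketch. It assumes the b_score values from the earlier example and an illustrative condition of b_score > 4; the variable name kept is hypothetical.

```python
import pandas as pd

# b_score values taken from the example dataframe above
b_score = pd.Series([1, 2, 3, 4, 5, 6, 7, 10, 8, 9])

# No replacement value is passed, so entries that fail the condition become NaN
kept = b_score.where(b_score > 4)
print(kept)

# Count how many entries were replaced by NaN
print(kept.isna().sum())  # 4
```

Because NaN is a floating-point value, the resulting column is stored as floats even though the input held integers.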
Return value
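where returns a new object of the same shape as the original, with the replaced values: a Series when called on a column, and a dataframe when called on a dataframe. The original data is left unchanged unless the result is assigned back to it.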
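A small sketch of the return value, assuming a shortened, hypothetical dataframe with only numeric columns so that the condition df > 3 can be applied to every cell:

```python
import pandas as pd

# Shortened, illustrative data (not the full example dataframe)
df = pd.DataFrame({'a_score': [4, 5, 7], 'b_score': [1, 2, 3]})

# Called on a whole dataframe, where returns a new dataframe
replaced = df.where(df > 3, 0)

print(type(replaced).__name__)  # DataFrame
print(replaced)

# The original dataframe still holds its initial values
print(df)
```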
Code
The example below replaces every value in the replace column that is not greater than 0 with 0 and stores the result in a new column named replaced.
import pandas as pd
a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
replace = [-1, -2, -3, -4, -5, 1, 2, 3, 4, 5]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']
df = pd.DataFrame({'group': groups, 'country': country, 'a_score': a_score, 'b_score': b_score, 'replace': replace})
# Keep positive values of 'replace'; set every other value to 0
df['replaced'] = df['replace'].where(df['replace'] > 0, 0)
print(df)
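A related case is replacing every b_score below 5 with 20. The following is a minimal sketch, assuming the same b_score values as above. Because where keeps the values for which the condition is True, the condition is written as b_score >= 5 rather than b_score < 5.

```python
import pandas as pd

# b_score values taken from the example dataframe above
df = pd.DataFrame({'b_score': [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]})

# Keep scores of 5 or more; replace scores below 5 with 20
df['b_score'] = df['b_score'].where(df['b_score'] >= 5, 20)
print(df['b_score'].tolist())  # [20, 20, 20, 20, 5, 6, 7, 10, 8, 9]
```

Assigning the result back to df['b_score'] is what makes the change visible in the dataframe, since where does not modify the column in place by default.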