A dataframe is a commonly used two-dimensional data structure: a table with rows and columns. In Python, it is most often used as a pandas DataFrame object.
Dataframes require the pandas library, as shown below:
import pandas as pd
A dataframe can be created as shown in the example below. It contains countries that have been assigned to different groups and given an a_score and a b_score. Both scores are made-up values for the purpose of this example.
import pandas as pd

a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']

# Build the dataframe from a dictionary that maps column names to lists
df = pd.DataFrame({'group': groups, 'country': country, 'a_score': a_score, 'b_score': b_score})
print(df)
where function
The where function allows the replacement of values in rows or columns based on a specified condition.
The function is typically called as follows, with the condition as the first argument and the replacement value as the second:
df['new_column'].where(df['new_column'] > 4 , 9)
Values greater than 4 are kept, and the remaining values are replaced by 9.
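As a minimal sketch of such a call, the snippet below applies where to a small column. The frame name demo and its values are hypothetical and are used only for illustration.

```python
import pandas as pd

# Hypothetical data, made up for this illustration
demo = pd.DataFrame({'new_column': [2, 4, 6, 8]})

# Keep values greater than 4; replace every other value with 9
result = demo['new_column'].where(demo['new_column'] > 4, 9)
print(result.tolist())  # [9, 9, 6, 8]
```

Note that 4 itself is replaced, because the condition is strictly greater than 4.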
If the replacement is not provided, the values that do not fulfill the condition are replaced by NaN.
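When where is called with only a condition, entries for which the condition is False become NaN. The short sketch below shows this on a hypothetical Series named scores, whose values are assumed for illustration.

```python
import pandas as pd

# Hypothetical values, made up for this illustration
scores = pd.Series([1, 5, 10])

# No replacement given: entries where the condition is False become NaN
masked = scores.where(scores > 4)
print(masked)

# isna() marks the entries that were replaced
print(masked.isna().tolist())  # [True, False, False]
```

With the default integer dtype, the result is usually upcast to a float column, since NaN is a floating-point value.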
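where returns a new object with the replaced values (a Series when called on a column, a dataframe when called on a dataframe). By default, the original data is left unchanged.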
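Because the result comes back as a new object, it has to be assigned somewhere to be kept. The sketch below illustrates this with a hypothetical frame df_demo; the column values are assumed for the example.

```python
import pandas as pd

# Hypothetical values, made up for this illustration
df_demo = pd.DataFrame({'b_score': [1, 6, 3]})

# Keep values of at least 5; replace smaller values with 20
replaced = df_demo['b_score'].where(df_demo['b_score'] >= 5, 20)

# The original column has not been modified by the call
before = df_demo['b_score'].tolist()
print(before)             # [1, 6, 3]
print(replaced.tolist())  # [20, 6, 20]

# Assign the result back to make the change part of the dataframe
df_demo['b_score'] = replaced
```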
The example below replaces every value in the replace column that is not greater than 0 (here, the negative values) with 0 and stores the result in a new column named replaced.
import pandas as pd

a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
replace = [-1, -2, -3, -4, -5, 1, 2, 3, 4, 5]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']

df = pd.DataFrame({'group': groups, 'country': country, 'a_score': a_score, 'b_score': b_score, 'replace': replace})

# Keep values greater than 0 in the 'replace' column; set the rest to 0
df['replaced'] = df['replace'].where(df['replace'] > 0, 0)
print(df)