What is Pandas DataFrame.where() in Python?
Overview
The DataFrame.where() method in Pandas keeps the values of a DataFrame where a specified condition is true and replaces the values where the condition is false. By default, the replaced entries become NaN.
Note: Pandas DataFrame is a two-dimensional labeled data structure.
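To make the keep-or-replace behavior concrete, here is a minimal sketch. The DataFrame and its column names a and b are made up for illustration and do not come from the examples later in this article.

```python
import pandas as pd

# Hypothetical two-column DataFrame used only for this illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Values greater than 1 are kept; every other entry is replaced with NaN
masked = df.where(df > 1)

print(masked)
```

The result has the same shape as df. Only the first entry of column a fails the condition, so it is the only value that becomes NaN.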
Syntax
# Signature according to Pandas documentation
DataFrame.where(cond, other=NoDefault.no_default, inplace=False, axis=None, level=None, errors='raise', try_cast=NoDefault.no_default)
Parameters
This method takes the following argument values:
- cond (the condition): Boolean Series/DataFrame, array-like, or callable. It is the condition to check the DataFrame against. It can be a single condition or several conditions combined.
- other: Series, scalar, DataFrame, or callable. It provides the values used to replace entries where the condition is false.
- inplace: Boolean, default=False. It determines whether the operation is performed on the same data or on a copy.
- axis: Integer, default=None. It is the alignment axis (rows or columns), if alignment is needed.
- level: Integer, default=None. It is the alignment level, if alignment is needed.
- errors: String, default='raise'. With raise, the method is allowed to raise exceptions. With ignore, exceptions are suppressed. This parameter was removed in pandas 2.0.
- try_cast: Boolean, default=None. It tries to cast the result back to the input type. This parameter was removed in pandas 2.0.
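The other parameter is easier to understand with a small example. The sketch below assumes a one-column DataFrame of marks, and the 5-point bonus is a hypothetical rule invented for illustration. It shows a scalar replacement and a callable replacement.

```python
import pandas as pd

# Hypothetical marks used only for this illustration
marks = pd.DataFrame({"Marks": [78, 90, 88, 95, 72]})

# Scalar replacement: marks below 80 are replaced with 0
with_scalar = marks.where(marks >= 80, other=0)

# Callable replacement: the callable receives the DataFrame and returns
# the replacement values, here each mark plus a hypothetical 5-point bonus
with_callable = marks.where(marks >= 80, other=lambda frame: frame + 5)

print(with_scalar)
print(with_callable)
```

In both calls, the entries that satisfy the condition are left untouched. Only the entries where the condition is false take their value from other.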
Return value
This method returns an object of the same type as the caller. When inplace is True, it modifies the caller directly and returns None.
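The difference between the two return values can be checked with a short sketch. The DataFrame below is a made-up example, and other=0 is used only to keep the output easy to read.

```python
import pandas as pd

# Hypothetical marks used only for this illustration
df = pd.DataFrame({"Marks": [78, 90, 88]})

# Default (inplace=False): a new DataFrame is returned and df is unchanged
copy_result = df.where(df >= 80, other=0)
marks_before = df["Marks"].tolist()

# inplace=True: df itself is modified and the call returns None
inplace_result = df.where(df >= 80, other=0, inplace=True)
marks_after = df["Marks"].tolist()

print(copy_result)
print(inplace_result)
print(marks_before, marks_after)
```

Because the in-place call returns None, assigning its result back to df would discard the data.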
Code
Let's look at how we can create a DataFrame and filter its data on specified conditions using where().
Single condition operation
# Importing the Pandas package
import pandas as pd
# Nested lists of data
data = [['Julia', 'Grade 10', 78],
        ['Butller', 'Grade 12', 90],
        ['Monitosh', 'Grade 11', 88],
        ['Butller', 'Grade 5', 95],
        ['vyohi', 'Grade 7', 72]]
# Creating a DataFrame
df = pd.DataFrame(data, columns=['Name', 'Class', 'Marks'])
# Creating a Boolean series for the name Butller
_filter = df["Name"] == "Butller"
# Filtering the extracted data
results = df.where(_filter, inplace=False)
# Showing the data on the console
print(results)
Explanation
- Lines 4–8: We create a nested list of five observations to convert into a DataFrame.
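- Line 10: We call the DataFrame() constructor from the Pandas package to convert this nested list into a DataFrame with the columns Name, Class, and Marks.
- Line 12: We create a Boolean series that is True for students with the name Butller.
- Lines 14–16: We call the df.where() function to filter the data of the student Butller and print the result. Rows that do not match keep their position, but all of their values are replaced with NaN.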
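Since where() masks non-matching rows instead of removing them, it can help to compare it with approaches that return only the matching rows. The sketch below reuses the data from the example above. Chaining dropna() and using Boolean indexing are alternatives added here for comparison, not part of the original example.

```python
import pandas as pd

data = [['Julia', 'Grade 10', 78],
        ['Butller', 'Grade 12', 90],
        ['Monitosh', 'Grade 11', 88],
        ['Butller', 'Grade 5', 95],
        ['vyohi', 'Grade 7', 72]]
df = pd.DataFrame(data, columns=['Name', 'Class', 'Marks'])
name_filter = df["Name"] == "Butller"

# where() keeps all five rows; non-matching rows are filled with NaN
masked = df.where(name_filter)

# Dropping the NaN rows leaves only the matching rows
matching_rows = masked.dropna()

# Boolean indexing selects the matching rows directly
selected = df[name_filter]

print(masked)
print(matching_rows)
print(selected)
```

where() is the better fit when the result must keep the original shape and index. Boolean indexing is simpler when only the matching rows are needed.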
Code
Here, the code is the same as above except for the filtering condition. Instead of a single condition, we combine multiple conditions using logical operators such as &.
Multiple conditions operation
# Importing the Pandas package
import pandas as pd
# Nested lists of data
data = [['Julia', 'Grade 10', 78],
        ['Butller', 'Grade 12', 90],
        ['Monitosh', 'Grade 11', 88],
        ['Butller', 'Grade 5', 95],
        ['vyohi', 'Grade 7', 72]]
# Creating a DataFrame
df = pd.DataFrame(data, columns=['Name', 'Class', 'Marks'])
# Creating a Boolean series for the name Butller
_filter1 = df["Name"] == "Butller"
_filter2 = df["Class"] == "Grade 5"
# Filtering the extracted data
results = df.where(_filter1 & _filter2, inplace=False)
# Showing data on the console
print(results)
Explanation
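- Line 12: We create a Boolean series that is True for students with the name Butller.
- Line 13: We create a Boolean series that is True for students who study in Grade 5.
- Lines 15–17: We combine both series with & so that only the row of the student named Butller who studies in Grade 5 keeps its values, and then print the result. All other rows are replaced with NaN.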
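Besides &, conditions can also be combined with | (or) and negated with ~ (not). The sketch below reuses the same data. The specific conditions, marks above 89 or Grade 7, are chosen only for illustration.

```python
import pandas as pd

data = [['Julia', 'Grade 10', 78],
        ['Butller', 'Grade 12', 90],
        ['Monitosh', 'Grade 11', 88],
        ['Butller', 'Grade 5', 95],
        ['vyohi', 'Grade 7', 72]]
df = pd.DataFrame(data, columns=['Name', 'Class', 'Marks'])

# Each comparison is wrapped in parentheses because | and & bind
# more tightly than comparison operators such as > and ==
high_or_grade7 = (df["Marks"] > 89) | (df["Class"] == "Grade 7")
either = df.where(high_or_grade7)

# ~ inverts a Boolean series: keep everyone who is not named Butller
not_butller = df.where(~(df["Name"] == "Butller"))

print(either)
print(not_butller)
```

The Python keywords and, or, and not do not work element-wise on a Boolean series, which is why the operators &, |, and ~ are used here.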