How to filter pandas DataFrame by column value
Pandas is an open-source Python library mostly used in data sciences, data analysis, and machine learning projects. In machine learning tasks, it helps clean and prepare the data to pass them into machine learning models. The most common operations performed using pandas are filling in missing values, manipulating the data, loading data from files, and visualizing data.
In this Answer, we will explore how to select a DataFrame with specific values by applying a filter on the column values.
Filtering methods
We can filter DataFrame by column values using the following methods:
Boolean indexing: We can create a condition (Boolean expression). Thee DataFrame will then keep the rows where the condition was true and discard the rows where it was false.
filtered_df = df[df['Age'] > 18]
qeury()method: We can pass a query in string format in thequerymethod. It is mainly used for complex filtering.
filtered_df = df.query('Age > 25 and Country == "USA"')
loc[]accessor: We can access the column values of a DataFrame directly by applying conditions in theloc[]accessor.
filtered_df = df.loc[df['Age'] > 25]
isin()method: We can filter rows of a DataFrame based on whether the values in a specified column are present in a given list or array.
ages = [25, 30, 35]filtered_df = df[df['Age'].isin(ages)]
Example
To learn how we can apply a filter on the column values, let's first create a data example. Consider that we have the following 2-D data present in a pandas DataFrame. The data contains the person's name, country, and age:
Name | Country | Age |
Albert | USA | 12 |
James | USA | 14 |
Alex | USA | 20 |
Bob | Canada | 25 |
Sara | Canada | 29 |
Bill | Germanny | 45 |
Daniel | Germany | 32 |
John | Egypt | 60 |
In this example, we aim to infer two separate pieces of information from the data:
1. Information of persons living in The USA.
The DataFrame will provide us with the following person's information if we find the person living in the USA:
Name | Country | Age |
Albert | USA | 12 |
James | USA | 14 |
Alex | USA | 20 |
2. Information of persons older than 18 but younger than 40.
The DataFrame will provide us with the following person's information if we find the person older than 18 and younger than 40:
Name | Country | Age |
Alex | USA | 20 |
Bob | Canada | 25 |
Sara | Canada | 29 |
Daniel | Germany | 32 |
In the coming sections, we will load the example data in a pandas DataFrame and apply the filters to the data.
Coding example
First of all, let's create a pandas DataFrame and load it with some hard-coded data:
import pandas as pd
person_data = pd.DataFrame({
"Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
"Country": ["USA", "USA", "USA", "canada", "Canada", "Germany", "Germany", "Egypt"] ,
"Age": [12, 14, 20, 25, 29, 45, 32, 60]
})
print(person_data)Code explanation
Line 1: We import
pandasand give it the namepd.Lines 3–7: We create a DataFrame using
pd.DataFramefunction, store it in theperson_datavariable, and pass the person's information in the curly brackets{}.
Now that we have created the DataFrame, we can move towards applying filters on column values.
import pandas as pd
person_data = pd.DataFrame({
"Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
"Country": ["USA", "USA", "USA", "canada", "Canada", "Germany", "Germany", "Egypt"] ,
"Age": [12, 14, 20, 25, 29, 45, 32, 60]
})
age = [25,60]
age_greater_than_25 = person_data.loc[person_data['Age'] > 25]
germany_and_older_than_25 = person_data.query('Age > 25 and Country == "Germany"')
age_25_and_60 = person_data[person_data["Age"].isin(age)]
living_in_USA = person_data[person_data["Country"] == "USA"]
age_filter = person_data[(person_data["Age"] > 18) & (person_data["Age"] < 40)]
print("Age greater than 25:")
print(age_greater_than_25)
print("Germany citizen and age greater than 25:")
print(germany_and_older_than_25)
print("Age 25 and 60:")
print(age_25_and_60)
print("Person living in USA:")
print(living_in_USA)
print("Person older than 18 and younger than 40:")
print(age_filter)Code explanation
Line 11: We use the
locaccessor and pass the conditionperson_data['Age'] > 25in the square brackets[].Line 12: We use the
querymethod and pass the conditionAge > 25 and Country == "Germany"in the method.Line 13: We use the
isin()method and pass theagelist. The result of theisin()method is stored in theage_25_and_60variable.Line 14: We use the
person_datavariable and apply the filterperson_data["Country"] == "USA"in the square brackets[]. The result of applying the filter is stored in theliving_in_USAvariable.Line 15: We again use the
person_datavariable and apply the filter(person_data["Age"] > 18) & (person_data["Age"] < 40)inside the square brackets[]. The result of applying the filter is stored in theage_filtervariable.Lines 17–30: We use the
printstatement and print out the filtered data.
Conclusion
Pandas provide us with the capability of filtering out DataFrame values by applying filters on the column values. This can help us to extract information and get insights from the data.
Free Resources