Performing Filtering

Explore multiple techniques to filter pandas DataFrames in Python, including boolean indexing, the loc indexer, and the query() and isin() methods. Learn how to select rows based on conditions, such as filtering by country names, to prepare data effectively for analysis.

Method 1: Boolean indexing

Boolean indexing filters a DataFrame in two steps. First, we build a boolean array (in practice, a pandas Series of True/False values with one entry per row) that encodes the selection criteria. Then, we index the DataFrame with that array, which keeps only the rows marked True.

In the following example, we'll use boolean indexing to select only the rows in the DataFrame where the value in the COUNTRY column is UNITED KINGDOM.
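A minimal sketch of this example follows, written one statement per line so it lines up with the walkthrough below. To keep it self-contained, line 2 builds a small hypothetical DataFrame (invented NAME and COUNTRY values) instead of reading employees.csv; with the real file, line 2 would be employees_df = pd.read_csv('employees.csv').

```python
import pandas as pd
employees_df = pd.DataFrame({"NAME": ["Ana", "Ben", "Cy"], "COUNTRY": ["UNITED KINGDOM", "MEXICO", "UNITED KINGDOM"]})  # hypothetical sample data
pd.set_option('display.max_columns', None)
mask = employees_df["COUNTRY"] == "UNITED KINGDOM"  # True where COUNTRY matches, False elsewhere
filtered_df = employees_df[mask]  # keep only the rows where mask is True
print(filtered_df)
```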


Let’s review the code line by line:

  • Lines 1–3: We import the pandas library, load the dataset, and configure pandas to display all columns when printing a DataFrame.

  • Line 4: We create a boolean array called mask that is True for rows of employees_df whose COUNTRY value is UNITED KINGDOM and False for all other rows.

  • Line 5: We then use the boolean array to index the employees_df DataFrame and select only the rows that meet the criteria, storing them in a new DataFrame called filtered_df.

  • Line 6: We print the filtered_df DataFrame to see the filtered records.
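To make the mask from line 4 concrete, the sketch below inspects it on hypothetical sample data (the NAME values are invented). The mask is a pandas Series of booleans that shares the DataFrame's index, and indexing with it keeps exactly the rows labeled True.

```python
import pandas as pd

# Hypothetical sample data standing in for employees.csv
employees_df = pd.DataFrame({
    "NAME": ["Ana", "Ben", "Cy"],
    "COUNTRY": ["UNITED KINGDOM", "MEXICO", "UNITED KINGDOM"],
})

mask = employees_df["COUNTRY"] == "UNITED KINGDOM"
print(mask.dtype)                             # bool
print(mask.tolist())                          # [True, False, True]: one value per row
print(mask.index.equals(employees_df.index))  # True: the mask is aligned on the DataFrame's index
print(employees_df[mask]["NAME"].tolist())    # ['Ana', 'Cy']
```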

Method 2: The loc indexer

To filter a DataFrame with loc, we pass a row selector and, optionally, a column selector inside square brackets. The row selector can be a list of labels or, as here, a boolean condition. As before, we'll select the records whose COUNTRY value is UNITED KINGDOM.
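A minimal sketch of the loc version, again one statement per line and again with a hypothetical three-row DataFrame on line 2 standing in for employees.csv:

```python
import pandas as pd
employees_df = pd.DataFrame({"NAME": ["Ana", "Ben", "Cy"], "COUNTRY": ["UNITED KINGDOM", "MEXICO", "UNITED KINGDOM"]})  # hypothetical sample data
pd.set_option('display.max_columns', None)
filtered_df = employees_df.loc[employees_df["COUNTRY"] == "UNITED KINGDOM"]  # boolean row selector inside loc[]
print(filtered_df)
```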


Let’s review the code line by line:

  • Lines 1–3: We import the pandas library, load the dataset, and configure pandas to display all columns when printing a DataFrame.

  • Line 4: We pass loc a boolean condition on the COUNTRY column so that only rows whose value is UNITED KINGDOM are included, and store the resulting records in a new DataFrame called filtered_df.

  • Line 5: We print filtered_df to see the filtered records.
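Because loc also accepts a column selector after the comma, the same condition can trim the columns in one step. In this sketch the NAME and SALARY columns and their values are hypothetical; only COUNTRY comes from the lesson's dataset.

```python
import pandas as pd

# Hypothetical sample data; NAME and SALARY are invented for illustration
employees_df = pd.DataFrame({
    "NAME": ["Ana", "Ben", "Cy"],
    "COUNTRY": ["UNITED KINGDOM", "MEXICO", "UNITED KINGDOM"],
    "SALARY": [52000, 48000, 61000],
})

is_uk = employees_df["COUNTRY"] == "UNITED KINGDOM"

# loc[row_selector, column_selector]: boolean rows, a list of column labels
uk_names = employees_df.loc[is_uk, ["NAME", "SALARY"]]
print(uk_names)

# A single column label (not a list) returns a Series instead of a DataFrame
print(type(employees_df.loc[is_uk, "NAME"]).__name__)  # Series
```

Passing the columns as a list keeps the result a DataFrame, which is usually what you want when the filtered data feeds further DataFrame operations.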

Method 3: The query() method

To filter a DataFrame with the query() method, we pass it the filtering condition as a string expression, in which column names such as COUNTRY can be referenced directly.
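A minimal sketch of the query() version, one statement per line, with the same hypothetical stand-in data on line 2 in place of employees.csv. The string literal inside the condition uses double quotes so it can sit inside a single-quoted Python string:

```python
import pandas as pd
employees_df = pd.DataFrame({"NAME": ["Ana", "Ben", "Cy"], "COUNTRY": ["UNITED KINGDOM", "MEXICO", "UNITED KINGDOM"]})  # hypothetical sample data
pd.set_option('display.max_columns', None)
filtered_df = employees_df.query('COUNTRY == "UNITED KINGDOM"')  # condition written as a string
print(filtered_df)
```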


Let’s review the code line by line:

  • Lines 1–3: We import the pandas library, load the dataset, and configure pandas to display all columns when printing a DataFrame.

  • Line 4: We call query() on employees_df with a condition string that keeps only the rows whose COUNTRY value is UNITED KINGDOM, and store the filtered records in a new DataFrame called filtered_df.

  • Line 5: We print filtered_df to see the filtered records.
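For this filter, the first three methods are interchangeable. The sketch below, on hypothetical sample data, checks that boolean indexing, loc, and query() return identical DataFrames using DataFrame.equals, which compares the elements as well as the row and column labels.

```python
import pandas as pd

# Hypothetical sample data standing in for employees.csv
employees_df = pd.DataFrame({
    "NAME": ["Ana", "Ben", "Cy"],
    "COUNTRY": ["UNITED KINGDOM", "MEXICO", "UNITED KINGDOM"],
})

by_mask = employees_df[employees_df["COUNTRY"] == "UNITED KINGDOM"]
by_loc = employees_df.loc[employees_df["COUNTRY"] == "UNITED KINGDOM"]
by_query = employees_df.query('COUNTRY == "UNITED KINGDOM"')

# All three keep the same rows, with the same index and columns
print(by_mask.equals(by_loc), by_mask.equals(by_query))  # True True
```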

Method 4: The isin() method

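To filter with the isin() method, we call it on a column and pass it a list of the values to keep. It returns a boolean Series that is True wherever the column's value appears in the list, and we use that Series to index the DataFrame.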

```python
import pandas as pd
employees_df = pd.read_csv('employees.csv')
pd.set_option('display.max_columns', None)
filtered_df = employees_df[employees_df["COUNTRY"].isin(["UNITED KINGDOM", "MEXICO"])]
print(filtered_df)
```

Let’s review the code line by line:

  • Lines 1–3: We import the pandas library, load the dataset, and configure pandas to display all columns when printing a DataFrame.

  • Line 4: We call isin() on the COUNTRY column with the list ["UNITED KINGDOM", "MEXICO"], use the resulting boolean Series to index employees_df so that only rows whose COUNTRY value is UNITED KINGDOM or MEXICO are kept, and store them in a new DataFrame called filtered_df.

  • Line 5: We print filtered_df to see the filtered records.
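Conceptually, isin() builds the same mask as chaining one equality test per value with the element-wise OR operator |. The sketch below, on hypothetical sample data with an extra CANADA row that should be excluded, checks that equivalence:

```python
import pandas as pd

# Hypothetical sample data standing in for employees.csv
employees_df = pd.DataFrame({
    "NAME": ["Ana", "Ben", "Cy", "Dee"],
    "COUNTRY": ["UNITED KINGDOM", "MEXICO", "UNITED KINGDOM", "CANADA"],
})

countries = ["UNITED KINGDOM", "MEXICO"]
isin_mask = employees_df["COUNTRY"].isin(countries)

# The same mask built by OR-ing one equality test per value
or_mask = (employees_df["COUNTRY"] == "UNITED KINGDOM") | (employees_df["COUNTRY"] == "MEXICO")

print(isin_mask.equals(or_mask))                 # True
print(employees_df[isin_mask]["NAME"].tolist())  # ['Ana', 'Ben', 'Cy']
```

With isin(), the condition stays a single call however long the list of values grows, whereas the | version needs one parenthesized comparison per value.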